November 30, 2006

MySQL Wins “European Entrepreneur of the Year” Award

Mårten Mickos, CEO of MySQL AB, a Swedish
software enterprise, received the “European Entrepreneur of the Year
2006” award on Thursday night at a gala dinner in Geneva. The award was the
main prize handed out at the Audemars Piguet “Changing Times Award”, created
to recognize the European entrepreneur whose private company has had the
biggest impact on the largest number of people over the last three years.

Interesting MySQL and PostgreSQL Benchmarks

I found a pile of MySQL and PostgreSQL benchmarks on various platforms which I had not seen before. Very interesting reading.
They do not share much information about how MySQL or PostgreSQL was configured or about the queries. Furthermore, MySQL and PostgreSQL have slightly different implementations (i.e. subqueries are avoided for MySQL), so do not compare the results directly.

They also do not mention whether InnoDB or MyISAM tables are used; it turns out both are used in the benchmark. This is a CPU-bound benchmark with the working set fitting in memory.

MySQL and PostgreSQL Scalability on Xeon Woodcrest, Opteron and Niagara
It is pretty interesting to see how PostgreSQL scales just as systems should scale in theory: throughput rises gradually until the number of threads roughly matches the number of cores/threads, and then stays at that level at higher concurrency. MySQL with InnoDB shows its ugly face and drops pretty quickly as concurrency grows, with the peak at about the number of CPUs. I guess this is a lucky case, as InnoDB may well start to slow down before concurrency reaches the number of CPUs.

Yes, the InnoDB team has provided a fix for this scalability problem and it is merged into MySQL 5.0.30 “Enterprise”, but according to the tests I’ve done so far it is far from a full solution yet.

It is also interesting to see the CPU comparison in this test. Woodcrest has the best performance in this test (and in many other MySQL tests), Opteron comes second, and older Intel Xeons as well as Niagara trail behind.

Niagara scalability is one more interesting story. As you can see, MySQL 4.1 actually scaled pretty well on Niagara, suffering a slow regression with increased concurrency rather than a quick drop. In MySQL 5.0 this changed dramatically: it climbs to a higher peak but also drops very quickly as concurrency grows. This is much easier to see in the accompanying picture.

The Linux vs Solaris comparison is also pretty interesting. With MySQL, Linux has a higher peak but Solaris suffers less with increased concurrency.

Note: I have not validated these benchmarks and, as I already mentioned, they do not have full disclosure. They do, however, match my own experience with MySQL, so I tend to trust the PostgreSQL data points as well.

Pump ‘n’ dump spam

Laura Frieder and Jonathan Zittrain: “The average investor who buys a stock on the day it is most heavily touted and sells it 2 days after the touting ends will lose approximately 5.5%. For the top half of most thoroughly touted stocks, … a spammer who buys at the ask price on the day before unleashing touts and sells at the bid price on the day his or her touting is the heaviest will, on average, earn 5.79%.”

Leonard Richardson has a site which monitors daily stock spam activity. You can see from his charts that buying these scammy penny stocks is a guaranteed way to lose money.

Leonard, by the way, is the co-author of Ruby Cookbook, 906 pages of shiny, nearly perfect Ruby code with detailed explanations of everything. This book is a fantastic way to learn Ruby.

His wife works at Fog Creek Software; if you’ve emailed us lately and heard back from someone named Sumana, that’s her.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

November 29, 2006

Using source control tools on huge projects

Of all the things broken at Microsoft, the way they use source control on the Windows team is not one of them.

A young Windows engineer writes:

“… prior to the restart effort of Longhorn, there were about seven [branches], reverse-integrating into one main branch every two or three weeks perhaps. Now, imagine several thousand developers checking in directly into seven branches. This will lead to two things:

“1. you check in frequently, and there’s a very high chance of either breaking the build, or breaking functionality in the OS, or 2., as a counter-reaction, you don’t check in very often, which clearly is bad, since now you don’t have a good delta history of what you did.

“So this clearly didn’t scale. As part of the restart effort, we decided that each team would get its own feature branch, each feature area (multiple teams) would go up to an aggregation branch, and those would lead up to the final main branch. (As such there’s now north of a hundred branches in tiers, leading up to about six aggregation branches.) Teams were free to choose how many sub-feature branches they wanted, if any, and they were free to choose how often they wanted to push up their changes to the aggregation branch. As part of the reverse-integration (RI, i.e. pushing up) process, various quality gates had to pass, including performance tests. Due to how comprehensive those gates ended up being, this would take at least a day to run, plus perhaps a day or two to triage issues if any cropped up; so there was a possibly considerable cost to doing an RI in the first place. However, these gates were essential in upholding the quality of the main branch, and had they not existed, the OS would have never shipped. I suppose it’s one of those “what doesn’t kill you…” type deals.

“Some teams did manage to manufacture pathological cases for themselves where changes wouldn’t RI up for several months, but that’s the individual team’s fault (or their release management), and not the process. Generally, the more disciplined teams were about quality, the faster and more frequently they’d RI. From what you know about the varying levels of stability/quality of components of the OS, it’s pretty easy to map that back to RI velocity and so forth, since it all goes hand-in-hand pretty nicely.”

When you’re working with source control on a huge team, the best way to organize things is to create branches and sub-branches that correspond to your individual feature teams, down to a high level of granularity. If your tools support it, you can even have private branches for every developer, so they can check in as often as they want, only merging up when they feel that their code is stable. Your QA department owns the “junction points” above each merge. That is, as soon as a developer merges their private branch with their team branch, QA gets to look at it and they only merge it up if it meets their quality bar.
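
To make the "junction point" idea concrete, here is a minimal Python sketch of a tiered branch tree in which changes only move up when a gate function approves them. It is a toy model, not how any real source control tool works: the `Branch` class, the `no_broken_builds` gate, and the branch names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Branch:
    """One node in a tiered branch hierarchy (hypothetical model)."""
    name: str
    parent: Optional["Branch"] = None
    changes: List[str] = field(default_factory=list)

    def check_in(self, change: str) -> None:
        # Check-ins are cheap: they only touch this branch.
        self.changes.append(change)

    def merge_up(self, gate: Callable[[List[str]], bool]) -> bool:
        """Push pending changes to the parent only if the gate passes."""
        if self.parent is None or not self.changes:
            return False
        if not gate(self.changes):
            return False  # changes stay on this branch until they are fixed
        self.parent.changes.extend(self.changes)
        self.changes.clear()
        return True


def no_broken_builds(changes: List[str]) -> bool:
    # Stand-in for QA: reject any batch containing a change marked "broken".
    return all("broken" not in change for change in changes)


# Hypothetical hierarchy: private branch -> team -> aggregation -> main.
main = Branch("main")
aggregation = Branch("networking-aggregation", parent=main)
team = Branch("tcp-team", parent=aggregation)
dev = Branch("alice-private", parent=team)

dev.check_in("add retry logic")
dev.check_in("broken: half-finished refactor")
blocked = dev.merge_up(no_broken_builds)   # gate rejects the whole batch

dev.changes.remove("broken: half-finished refactor")
promoted = dev.merge_up(no_broken_builds)  # gate passes, team branch gets the change
```

The point of the sketch is that a bad check-in never gets past the junction above the branch it was made on, so the damage stays local while everyone else keeps working.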

The best way to imagine this is to look at this screenshot from Accurev. As you can see there are a lot of “leaf” branches but as things get merged up towards the trunk, they have to pass through QA which basically just checks that it’s OK and then merges it closer to the trunk. By the way, Accurev makes a nice source control system that is designed for this style of intensive branching and merging. The Windows team itself uses their own source control system which, it is rumoured, is just an old version of Perforce for which they bought a source license; Perforce has a reputation among developers for being expensive, solid, and extremely fast when working with extremely large source code bases.

How to Backup an SQL Server Database in a Shared Network Location

This workaround allows you to back up a SQL Server database to a shared network location.

BoardReader - Forum Search Engine

You may have noticed we have not been blogging much recently. This is because we were quite busy, mainly building BoardReader.com, a search engine which indexes tens of thousands of forums from all over the world. We built this project as a consulting project, so unfortunately we do not own it completely, but we’re still quite excited that it is live now. We did not work on the crawler in this project, only on the database backend and the full text search engine implementation. In this part it is a standard LAMPS application. I guess you know what LAMP is; the S stands for Sphinx, a full text search engine which we love to use where large-scale search is needed. At this point we have over 300 million posts indexed with only 3 search servers, and still counting. I guess we’ll have half a billion forum posts soon.

To share a few more technical details: it is implemented using a pretty standard “manual partitioning” scheme, with different forum sites mapped to different “table groups” and each server handling a bunch of these. This makes it easier to rebalance groups if needed as traffic grows, and it makes ALTER TABLE much less painful. The other technique, which I covered in some of my presentations, is double data storage with different partitioning. In our case we wanted to track links between sites. This is easy for the outgoing part, as we already cluster by site, but it is hard for incoming links, as they are scattered among many tables and servers. To address this problem we also store inbound links clustered by second-level domain, which allows us to get inbound links pretty efficiently. It turns out, however, that some domains still get way too many links, and we’ll likely redesign it in the future to use Sphinx instead (it can do an extremely fast parallel group-by on many servers, Google style).
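
As a rough illustration of the two ideas above, here is a small Python sketch of how a partitioning key could be routed to a table group and a server, with links stored a second time under a different key. This is not the actual BoardReader code: the group count, server names, table prefixes and the hash choice are all assumptions made up for the example.

```python
import zlib
from typing import Dict, Tuple

NUM_GROUPS = 8  # assumed number of table groups

# Assumed directory: which server currently hosts each table group.
group_to_server: Dict[int, str] = {g: f"db{g % 2 + 1}" for g in range(NUM_GROUPS)}


def table_group(key: str) -> int:
    # Stable hash, so a key always lands in the same group across runs.
    return zlib.crc32(key.encode("utf-8")) % NUM_GROUPS


def second_level_domain(host: str) -> str:
    # Naive: keeps the last two labels, so it mishandles suffixes like co.uk.
    return ".".join(host.lower().split(".")[-2:])


def locate(prefix: str, key: str) -> Tuple[str, str]:
    """Return (server, table) for a partitioning key."""
    group = table_group(key)
    return group_to_server[group], f"{prefix}_{group:02d}"


def posts_location(forum_site: str) -> Tuple[str, str]:
    # First copy: posts and their outgoing links, clustered by forum site.
    return locate("posts", forum_site)


def inbound_links_location(target_host: str) -> Tuple[str, str]:
    # Second copy: the same links, clustered by the target's second-level domain.
    return locate("inbound_links", second_level_domain(target_host))


def move_group(group: int, new_server: str) -> None:
    # Rebalancing moves a whole group; keys keep mapping to the same tables.
    group_to_server[group] = new_server


server, table = posts_location("forums.example.org")
```

Because the key-to-group mapping never changes, rebalancing only updates the small group-to-server directory, and an ALTER TABLE can be run one modest table at a time instead of on one giant table.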

A few features which I would like to highlight. First, you can use it to Search MySQL Forums. Notice the simple link structure: you can replace mysql.com in it with any other domain to search forums from that domain. For example, you can use this link to search our MySQL Performance Forums.

Second, note the graph, shown right in the search results, of how many results matched the search terms over time. It can show quite interesting data; for example, searching for Britney Divorce shows a huge spike when the news came out and a quick calm-down in about a week. You can click on a bar in the graph to get search results focused on that period. It can be quite fun.

Another nice feature is the domain profile. With it you can see how actively a domain is getting links, which pages on the domain are most frequently linked, and which pages and domains forum users tend to link to. So far the reporting period is restricted for performance reasons: there is too much data to group, and it is quite a hassle to build summary tables as we want to count uniques, but this should be fixed once we rewrite it using Sphinx. From that page you can also get to the inbound link report, which lets you see what recent links you have from forums to a whole web site or a particular URL.

I should also mention a couple of ratings we have implemented. The love for ratings probably comes from my SpyLOG background. At this point we have implemented a rating of YouTube videos and a rating of domains. In both cases we check how many links each of them is getting and from how many unique sites. For domains, we rate separately the domains which are getting normal links and the domains whose images are referenced.

There are still a lot of things to do and quite probably a lot of bugs to kill. We would welcome any feedback such as suggestions or bug reports. Also, if you know of a forum which is not indexed, please feel free to submit it.

November 28, 2006

Shutdown in OS/X

Arno Gourdol on the design of the off button in OS/X: “And finally, how often do you need to manually set your computer to Sleep? I just close the lid of my MacBook and it goes to sleep: a simple mechanical, physical interaction: no need for a software command.”

Right. Here’s what the current OS/X shut down dialog looks like:

PS: We still have a few hundred copies of Aardvark’d (the movie) in stock. That’s the documentary about summer internships at Fog Creek and the making of Copilot. They make great gifts.

Find the very best keywords for your web site with new powerful SEO tools

Finding the right keywords is an important step that you should take seriously if you want to be successful with your web site. New SEO tools help you to find the best keywords for your site.

November 27, 2006

Trailing spaces in MySQL

In the past, life was easy in MySQL. The CHAR and VARCHAR types meant the same thing, the only difference being whether fixed or dynamic row length was used. Trailing spaces were removed in both cases.

With MySQL 5.0, however, things changed, so now VARCHAR keeps trailing spaces while CHAR columns still do not. Well, in reality CHAR columns are padded to full length with spaces, but this is invisible as those trailing spaces are removed upon retrieval. This is something you need to watch both when upgrading to MySQL 5.0 and when designing your applications: when choosing VARCHAR vs CHAR, you should take into account whether you mind trailing spaces being stored, in addition to fixed-length vs dynamic-length rows and the space spent on the column size counter.

There is more fun stuff with trailing spaces. When a comparison is done, trailing spaces are always ignored, even if a VARCHAR column is used, which is pretty counterintuitive. So “a ” = “a” for all textual column types: CHAR, VARCHAR, TEXT. BLOB is the exception: it preserves trailing spaces and uses them in comparison.
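
The rules above are easy to mix up, so here is a tiny Python model of them. To be clear, this only imitates the behavior described in this post, it does not talk to MySQL, and all function names are made up. It also assumes the value fits in the column and ignores collation details such as case-insensitivity.

```python
def store_char(value: str, length: int) -> str:
    # CHAR(n): padded with spaces to the full column length in storage.
    return value.ljust(length)


def fetch_char(stored: str) -> str:
    # CHAR(n): trailing spaces are stripped on retrieval, so padding is invisible.
    return stored.rstrip(" ")


def store_varchar(value: str) -> str:
    # VARCHAR in 5.0: stored as given, trailing spaces included.
    return value


def fetch_varchar(stored: str) -> str:
    # VARCHAR in 5.0: returned exactly as stored.
    return stored


def text_equal(a: str, b: str) -> bool:
    # CHAR, VARCHAR, TEXT: trailing spaces are ignored when comparing.
    return a.rstrip(" ") == b.rstrip(" ")


def blob_equal(a: bytes, b: bytes) -> bool:
    # BLOB: compared byte for byte, trailing spaces count.
    return a == b


char_round_trip = fetch_char(store_char("a ", 4))        # trailing space is lost
varchar_round_trip = fetch_varchar(store_varchar("a "))  # trailing space survives
same_as_text = text_equal("a ", "a")                     # equal despite the space
same_as_blob = blob_equal(b"a ", b"a")                   # not equal
```

So a VARCHAR value can come back with its trailing space intact and still compare equal to the same value without it, which is exactly the counterintuitive part.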

November 26, 2006

PHP Conference Brazil

PHP Conference Brasil, the first Brazilian conference dedicated exclusively to the PHP language, takes place in Sao Paulo on December 1st and 2nd, 2006. It will be a great opportunity to establish a sustainable Brazilian PHP community and to exchange ideas among our professionals. More info is available at the Brazilian PHP Conference Web site.
