January 31, 2007

Getting MySQL to use full key length

There is one bug, or “missing feature” in MySQL Optimizer which may give you hard time causing performance problems which may be hard to track down, it is using only part of the index when full index can be used or using shorter index while there is longer index available. The last item is yet another good reason for removing redundant indexes

Here is example from NNSEEK database:

SQL:

  1. mysql> EXPLAIN SELECT thread_id FROM nn2_msg132.msg132   WHERE group_id=398157 AND parent_id=0 AND (published BETWEEN ‘2006-12-02 00:00:00′ AND ‘2006-12-02 23:59:59′)   ORDER BY published DESC LIMIT 0,10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: msg132
  6.          type: ref
  7. possible_keys: group_id,thread_id,groupid_published,grp_subj
  8.           KEY: group_id
  9.       key_len: 8
  10.           ref: const,const
  11.          rows: 1
  12.         Extra: USING WHERE
  13. 1 row IN SET (0.00 sec)

Where group_id key is defined as KEY `group_id` (`group_id`,`parent_id`,`published`)

As you can see MySQL selects to use “ref” access only using two first key parts from the index and it assumes there is only one row to be matched by index. In fact there are 2000 matching rows for these constants (parent_id=0 is special value which matches significant amount of rows).

I think optimizer should be fixed to always expand and use “range” lookup at least for simple ranges. Even if further restrictions by next key part will only barely increase selectivity it anyway does not cost much more as basically the same data is traversed anyway.

So how do you force MySQL optimizer to use full length in this case ? ANALYZE TABLE does not help, at least in this case. However looks like as one of its side effects FORCE INDEX actually forces index to be used to largest extent possible:

SQL:

  1. mysql> EXPLAIN SELECT thread_id FROM nn2_msg132.msg132 force INDEX(group_id)  WHERE group_id=398157 AND parent_id=0 AND (published BETWEEN ‘2006-12-02 00:00:00′ AND ‘2006-12-02 23:59:59′)   ORDER BY published DESC LIMIT 0,10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: msg132
  6.          type: range
  7. possible_keys: group_id
  8.           KEY: group_id
  9.       key_len: 12
  10.           ref: NULL
  11.          rows: 51
  12.         Extra: USING WHERE
  13. 1 row IN SET (0.00 sec)

As you can see in this case estimate becomes closer to the truth and this query runs about 50 times faster and easy on buffer pool (we do not have many un-needed rows accessed causing their pages to be loaded replacing valuable data from cache).

MySQL could look at the stats and adjust them appropriately - the plan for second query is accessing subset of rows from the first plan so estimation for number of rows for the first query should not be smaller. Stats and B-Tree dives can give information which is quite far away from the real number so it is worth at least make it consistent in this respect.

The other example I mentioned in the start of this post which I often have trouble with (just do not have example today) is using different index, for example, having indexes (A) and (A,B) for query A=Const and B>Const we can see MySQL selecting (A) index instead of (A,B) or even worse it can can select something like (A,D) using only first keypart of that index.

To fix this problem I guess another statistics alignment should take place - if you have several keys which start with prefix A and you use only this prefix for lookup - estimate number of rows should be the same for all indexes.

Related Posts

(Trailing spaces in MySQL) (Covering index and prefix indexes) (MySQL Full Text Search in Action) (Sphinx Developer joins our team) (Constructing Strings of Repeated Characters in T-SQL

January 30, 2007

A Faster Way to Search Through a DataTable

Find out how DataView can help you perform faster searches through a DataTable.

Related Posts

(A Faster Way to Search Through a DataTable) (Use DataTable.Rows.Find() Instead of DataTable.Select()) (Placing Multiple Headers in a DataTable) (Retrieve Totals from a Dataset Without Looping) (Overcoming the Barriers Around Using Custom Classes in ASP.NET

A Faster Way to Search Through a DataTable

Find out how DataView can help you perform faster searches through a DataTable.

Related Posts

(A Faster Way to Search Through a DataTable) (Use DataTable.Rows.Find() Instead of DataTable.Select()) (Placing Multiple Headers in a DataTable) (Retrieve Totals from a Dataset Without Looping) (Overcoming the Barriers Around Using Custom Classes in ASP.NET

The end of Google bombs?

Google bombs have become a problem for Google. The official Google webmaster blog recently had a statement about Google bombs and how Google deals with them. How does this new algorithm affect your Google rankings?

Related Posts

(What Google knows about you) (All about Google Suggest) (All about Google’s new PPA advertising product) (Official Google statement: How does Google collect and rank results?) (Google official: PageRank is not a big deal

Linux IO Schedulers and MySQL

Found a great article about Linux IO Schedulers today which is quite interesting. It goes in details about schedulers and explains in which of workloads which of schedulers is best.

The interesting thing this article points out is - there are multiple versions of each of the schedulers, while name remains the same. This means unless you really know mapping between kernel versions and scheduler versions it is very hard to evaluate benchmark results.

This could be noticed by benchmarks we’ve done over years. Long time ago “AS” scheduler could be several times slower than deadline for MySQL workloads such as SysBench or DBT2 when it went down to 30% difference and in the last runs we’ve done difference was not really significant.

This article also points out benchmarking IO schedulers you should look at more numbers than aggregate bandwidth - you also better to measure per client bandwidth as well as max latency as this is what can be the problem. Take a look at these old results for example. It also means you’d better to perform IO scheduler benchmarks on mixed load with different of task, for example mixing OLTP with some reporting queries if you really want to see the difference.

From the article it looks like CFQ should be good choice for databases and it is also found to work pretty well by some benchmarks we’ve done. The only question if it is doing as good as it could - In the docs it is mentioned it uses “per process” scheduling while MySQL is single process but single thread - does each thread gets its own queue in reality or is it shared ?

We should look into this when we’ll run more benchmarks for IO Schedulers.

Related Posts

(MySQL to Certify to the Linux Standard Base) (MySQL to Certify to the Linux Standard Base) (IT Manager’s Journal: Linux Goes whaBAM!) (IT Manager’s Journal: Linux Goes whaBAM!) (Dell to Offer MySQL Network

Making MySQL Replication Parallel

Kevin Burton writes about making MySQL Replication Parallel. Many of us have been beaten by the fact MySQL Replication is single threaded so in reality it is only able to use only single CPU and single disk effectively which is getting worse and worse as computers are getting “wider” these days with multi-core CPUs.

Kevin proposes to execute queries in parallel and it is generally good idea, the problem is however implementing it right without changing MySQL Replication semantics - which is - Slave database state corresponds to master database state at certain point in time. It is delayed but threads reading from the slave never will see state of the database which never existed on master.

As I commented in Kevins blog the problem is very simple to illustrate - assume you have 2 queries modifying 2 different tables, query A and query B. On the Master query A completed first and B followed it. On the slave we execute them in parallel so query B may complete before query A causing database stage which never existed on the master. Of course the idea could be to wait on final commit stage and commit queries A and B in order defined by Master but it brings to the plate other problems such as possible deadlocks between queries if they are complex transactions.

It should be however not as bad if we only look at single queries or transactions which do not have any overlap in terms of tables.

For some users commit order for independent queries may be unimportant so this restriction could be weakened to only make sure there is a “barrier” between queries which are possibly dependent on each other, such as reading or writing to the same tables.

There is other possible solution it is to allow multiple threads inside the server to share same transactional/lock context. In this case replication could accumulate number of queries execute them in parallel and then commit all at once.

None of these however are easy trick which I would expect to come quite soon.

On other hand if support for Multi-Master is implemented for many applications Parallel Replication could be implemented simply by filtering transactions and writing to number of binary logs.

If you’re “Scaling Out” you may just treat single server as it is few servers, so place several independent pieces on it, for example if different databases. Now if you could setup filtering so updates for each of them is written to its own binary log file and setup multi-master replication so slave can read all of them in parallel you can get replication parallel enough for many application without serious code complications.

If MySQL would not implement it it might be nice feature to hack into community tree.

Related Posts

(How To Set Up Database Replication In MySQL (16 Jan 2006)) (How To Set Up Database Replication In MySQL (16 Jan 2006)) (MySQL Replication and Slow Queries) (Managing Slave Lag with MySQL Replication) (MySQL Master-Master replication manager released

Founders at Work

Jessica Livingston, a partner at Y Combinator, has just published a book of interviews with 32 high tech company founders called Founders at Work.

She posted a couple of chapters online, including the interview with me, which is probably the most complete story of the early days of Fog Creek Software in print.

“There”s a bunch of people out there doing certain types of things and they seem to be pretty incompetent, but they”re getting huge valuations. Surely if I did those same things, knowing that I am less incompetent—merely semi-incompetent as opposed to extremely incompetent—I should be able to achieve at least their level of success.”

I haven”t read the book yet… still waiting for my copy to arrive from Amazon, which appears to be beyond semi-incompetent with regards to what “overnight shipping” might mean when you order something on Thursday. Check out the list of interviews, though: virtually everyone starting a high tech company will be all over this book like senators on cake.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Why Google is not happy with its results) (What should we say about Sun buying MySQL ?) (MySQL Announces Annual Community, Partner & Application Awards) (Do reciprocal links still work?) (Does advertising in PPC search engines really work?

January 29, 2007

MySQL Enterprise Unlimited Site Agreements Now Available for the Cost of a Single CPU of Oracle Enterprise

MySQL AB today unveiled a simpler way for large and growing organizations to acquire and adopt enterprise software. Designed with a customer’s perspective in mind, a one-year MySQL Enterprise Unlimited subscription offers a company-wide enterprise site agreement at the unprecedented low price of $40,000 (EUR 32,000, GBP 24,000).

Related Posts

(F-Secure Manages Security Services with MySQL Enterprise Unlimited) (Dada selects MySQL Enterprise Unlimited to handle over 7 million online subscribers) (The Twelve Days of Scale-Out: iStockphoto.com Grows to a Top-100 U.S. Web Site with MySQL Enterprise Unlimited) (MySQL Enterprise Simplifies Database Scale-Out with New Automated Replication Monitoring & Advisory Tools) (The Twelve Days of Scale-Out: The Phone House Consolidates its eCommerce Systems on MySQL Enterprise

MySQL AB Completes Record Year

MySQL AB today reported another milestone in its 11-year history with record results for the year ended December 31, 2006. The privately-held company continued to deliver rapid revenue growth — fuelled by the delivery of significant product, community and partner initiatives throughout the year. The recent launch of the new MySQL Enterprise subscription offering is especially drawing high praise among MySQL’s customers for its innovative monitoring tools and low total cost of ownership (TCO).

Related Posts

(MySQL AB Completes Record Quarter & Year) (Selecting One Record Before and After a Value) (MySQL Inks Five-Year GSA Pact) (MySQL AB Announces Record Cluster Database Benchmark Results) (Sage 50 HR 2007 Software Suite Ships with Embedded MySQL Database

O’Reilly OSCON call for papers

The O’Reilly Open Source Convention 2007 will once again take place from July 23rd - 27th 2007 in Portland, Oregon, and has a PHP track as usual. The Call for Papers closes Monday February 5 2007. You can find more information at http://conferences.oreillynet.com/os2007/.

Related Posts

(php|tek 2007 Chicago - Call for Papers) (Pump ‘n’ dump spam) (XML and LINQ papers by Erik) (International PHP Conference 2007 Spring Edition : Call for Paper) (Speaking on OSCON 2007
« Previous entries