March 21, 2008

Big Iron for tests anyone ?

MySQL Users Conference is coming and with it my presentation about Innodb Scalability limits. We did bunch of tests but we surely could get benefit of some extra hardware for testing, so if you could provide us with dedicated remote access for benchmarks it would be great.

Here is what we’re looking for in particular:

- More than Dual Core Opteron systems. Would be good to see how they scale
- More than 8 core systems
- Non x86 based systems (Niagara etc)
- Fancy IO subsystem - more than 8 hard drives
- SSD based storage.

If you have something of this sort available let me know.


Entry posted by peter |
3 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Quick Turnaround on Application Integration Projects: A Motorola Case Study) (Integration Appliances: The Fastest Track to Master Data Management) (Deliver Application Integration Projects in Days) (Box for some tests anyone ?) (Investigate navigation in XQuery

March 20, 2008

MySQL Query Cache WhiteSpace and comments

Commenting on my previous post on MySQL Query Cache Gerry pokes me as I’m all wrong and both comments and whitespace are fixed in MySQL 5.0. This was not what I remember seeing in production so I decided to do some tests on the matter:

I did the test two ways. First using command line client and second using little PHP script which just does the same query. I did this as command line client is known to optimize queries sometimes by skipping “unnecessary” comments.

So here is the command line run:

SQL:

  1. mysql> SELECT /* my little comment */ count(*) FROM fact WHERE val LIKE “%c%”;
  2. +———-+
  3. | count(*) |
  4. +———-+
  5. |        0 |
  6. +———-+
  7. 1 row IN SET (8.77 sec)
  8.  
  9. mysql> SELECT /* my little comment */ count(*) FROM fact WHERE val LIKE “%c%”;
  10. +———-+
  11. | count(*) |
  12. +———-+
  13. |        0 |
  14. +———-+
  15. 1 row IN SET (0.00 sec)
  16.  
  17. mysql> SELECT /* my little comment2 */ count(*) FROM fact WHERE val LIKE “%c%”;
  18. +———-+
  19. | count(*) |
  20. +———-+
  21. |        0 |
  22. +———-+
  23. 1 row IN SET (0.00 sec)
  24.  
  25. mysql> SELECT /* my little comment4 */ count(*) FROM fact WHERE val LIKE “%c%”;
  26. +———-+
  27. | count(*) |
  28. +———-+
  29. |        0 |
  30. +———-+
  31. 1 row IN SET (0.00 sec)
  32.  
  33. mysql> SELECT count(*) FROM fact WHERE val LIKE “%c%”;
  34. +———-+
  35. | count(*) |
  36. +———-+
  37. |        0 |
  38. +———-+
  39. 1 row IN SET (8.79 sec)
  40.  
  41. mysql> SELECT count(*) FROM fact WHERE val LIKE “%c%”;
  42. +———-+
  43. | count(*) |
  44. +———-+
  45. |        0 |
  46. +———-+
  47. 1 row IN SET (0.00 sec)
  48.  
  49. mysql>          SELECT count(*) FROM fact WHERE val LIKE “%c%”;
  50. +———-+
  51. | count(*) |
  52. +———-+
  53. |        0 |
  54. +———-+
  55. 1 row IN SET (0.00 sec)
  56.  
  57. mysql> /* comment*/ SELECT count(*) FROM fact WHERE val LIKE “%c%”;
  58. +———-+
  59. | count(*) |
  60. +———-+
  61. |        0 |
  62. +———-+
  63. 1 row IN SET (0.00 sec)
  64.  
  65. mysql> /* another comment */ SELECT count(*) FROM fact WHERE val LIKE “%c%”;
  66. +———-+
  67. | count(*) |
  68. +———-+
  69. |        0 |
  70. +———-+
  71. 1 row IN SET (0.00 sec)
  72.  
  73. mysql> SELECT /* inside comment */ count(*) FROM fact WHERE val LIKE “%c%”;
  74. +———-+
  75. | count(*) |
  76. +———-+
  77. |        0 |
  78. +———-+
  79. 1 row IN SET (8.81 sec)
  80.  
  81. mysql> SELECT /* inside comment */ count(*) FROM fact WHERE val LIKE “%c%”;
  82. +———-+
  83. | count(*) |
  84. +———-+
  85. |        0 |
  86. +———-+
  87. 1 row IN SET (0.00 sec)
  88.  
  89. mysql> SELECT /* inside comment2 */ count(*) FROM fact WHERE val LIKE “%c%”;
  90. +———-+
  91. | count(*) |
  92. +———-+
  93. |        0 |
  94. +———-+
  95. 1 row IN SET (0.00 sec)
  96.  
  97. mysql> SELECT /* inside comment4 */ count(*) FROM fact WHERE val LIKE “%c%”;
  98. +———-+
  99. | count(*) |
  100. +———-+
  101. |        0 |
  102. +———-+
  103. 1 row IN SET (0.00 sec)
  104.  
  105. mysql> SELECT /* inside comment4 */count(*) FROM fact WHERE val LIKE “%c%”;
  106. +———-+
  107. | count(*) |
  108. +———-+
  109. |        0 |
  110. +———-+
  111. 1 row IN SET (8.82 sec)

Looking at these results you could judge as all problems are indeed fixed. You can have whitespace in the start and you can have leading comment and it all works. However the comment which is inside the query works interesting way - the queries with different comments are both treated as same query if only comment is different. However if you change whitespace a bit (see the last query has space after comment deleted) it causes query cache miss.

If we look at process list output we can see the comments are actually skipped:

SQL:

  1. | 798009298 | root        | localhost            | test           | Query          |       4 | Sending DATA                                                          | SELECT  count(*)   FROM fact WHERE val LIKE “%c%”

This tells us we should not use MySQL Command Line Client for any tests involving comments

Repeating queries from PHP instead we can learn the following about MySQL 5.0 Query Cache:

      Whitespace at the start of query does not block query from being cached. Moreover query with 2 spaces in front is considered same as query with 3 spaces in front
      Comment at the start of the query does not block query from being cached. However queries with different comments are considered different queries (it is not stripped before hashing) - so you should not put things like current time in such a comment.
      Comments inside the query also matter. Meaning if you place comments inside the query or in the end. Though this was always the case

So in the nutshell you should be more free now in regards of some SELECT queries not cached because of whitespace or comments in front of them. Though you still need to have queries exactly the same including comments to make them cached by query cache.


Entry posted by peter |
No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(MySQL Query Cache and prepared statements - support comming ?) (Beware large Query_Cache sizes) (How MySQL Query Cache works with Transactions) (Query Profiling with MySQL: Bypassing caches) (Getting real life query speeds with MySQL

March 19, 2008

The tool I’ve been waiting for years

I’ve just been pointed to the nice tool which I was waiting for years to see. It is fincore - little perl script which allows you to see what pages of file are cached in OS memory. This is really cool.

When it comes to MySQL it is very useful with MyISAM tables which has their data file cached by OS cache only so you do not have any good information from MySQL side on what data is cached. You can also use it with Innodb to see how much memory are you wasting with double buffering by not using of O_DIRECT.

Besides general clues such as 50% of my file is cached you should watch for dynamics - for example check it during backup process and compare it due to normal load - this can give you a clue if slow down happens because of extra IO pressure or just because pages were washed out. You can also check how pages are cached. For example every second page cached may be helpful for point queries but does not save a lot of IO for doing table scans.

One thing I’m still missing is looking it from another side - so I have say 10GB of OS cache used on the server but how can I tell what is using it ? This look from another side would help me dramatically to find out what is causing cache pressure and what needs to be worked on. Scanning all files on filesystem and checking which are cached obviously does not work.

The fincore looks more like proof of concept tool - it is a bit simplistic, however being written in Perl it is easily hackable - if you want to make it to print percentage of file cached or “graph” showing how cached pages are distributed among file is very easy.

The great thing about this tool it is very fast and it does not disturbs OS file cache by using mincore function to get pages which are currently in cache.

This function is actually the real meat here - the tool is simplistic but it shows how to use the function so you can write real stuff. For example using this tool MySQL can easily add amount of cached data per table for MyISAM and Archive tables to INFORMATION_SCHEMA (or other system tables) which would be really cool Of course than one would need to implement cache content tracking for storage engines which cache everything in their own cache memory - Innodb, Falcon, Maria.

Having information about how large portion of table is cached would allow optimizer to take much smarter decisions in many cases.

Lets now see some examples:

SQL:

  1. [root@DB01 mysql]# du -h ib_log*
  2. 257M    ib_logfile0
  3. 257M    ib_logfile1
  4. [root@DB01 mysql]# perl /tmp/fincore –justsummarize ib_logfile0 ib_logfile1
  5. page size: 4096 bytes
  6. 24141 pages, 94.3 Mbytes IN core FOR 2 files; 12070.50 pages, 47.2 Mbytes per file.
  7. [root@DB01 mysql]# perl /tmp/fincore –justsummarize ib_logfile0
  8. page size: 4096 bytes
  9. 1 page, 4.0 kbytes IN core FOR 1 file; 1.00 page, 4.0 kbytes per file.
  10. [root@DB01 mysql]# perl /tmp/fincore –justsummarize ib_logfile1
  11. page size: 4096 bytes
  12. 24169 pages, 94.4 Mbytes IN core FOR 1 file; 24169.00 pages, 94.4 Mbytes per file.

So we can see one of Innodb log files is practically uncached while other has about 1/3rd cached - this makes sense, perhaps second log file is being written now and there is a “tail” of pages which just were not removed from the cache yet. As Innodb does not read logfile unless in recovery these are waste and Innodb could use fadvice to give instruction to kernel not to cache these as long as it can’t perform direct IO to log files on Linux because it is not aligned.

SQL:

  1. [root@DB01 mysql]# du -h ibdata*
  2. 246G    ibdata1
  3. [root@DB01 mysql]# perl /tmp/fincore –justsummarize ibdata1
  4. page size: 4096 bytes
  5. 0 pages, 0.0  bytes IN core FOR 1 file; 0.00 pages, 0.0  bytes per file.

Out of 250GB innodb data file none of pages are in cache - this is because this instance is using O_DIRECT flag to bypass data buffering and we can well see it works.

Lets now see stats for MyISAM tables:

SQL:

  1. [root@DB01 logs]# du -h performance_log_080318.MYD
  2. 1.1G    performance_log_080318.MYD
  3. [root@DB01 logs]# perl /tmp/fincore –justsummarize performance_log_080318.MYD
  4. page size: 4096 bytes
  5. 497 pages, 1.9 Mbytes IN core FOR 1 file; 497.00 pages, 1.9 Mbytes per file.
  6.  
  7. [root@DB01 logs]# du -h performance_log_080319.MYD
  8. 229M    performance_log_080319.MYD
  9. [root@DB01 logs]# perl /tmp/fincore –justsummarize performance_log_080319.MYD
  10. page size: 4096 bytes
  11. 28415 pages, 111.0 Mbytes IN core FOR 1 file; 28415.00 pages, 111.0 Mbytes per file.
  12. [root@DB01 logs]#

The performance log for yesterday is almost out of cache. It is about 0:50 by server clock this is why we still can see some pages remaining. Today log file is 50% in cache. Knowing access pattern to the file you can draw some conclusions about how much IO pressure we have on this server.

P.S If you would hack this tool or know any similar tools please let me know.


Entry posted by peter |
No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(How Log Parser 2.2 Works) (Someone AJAXified mytop!) (Windows Administration: Preserve Your Data With The Backup Tool You Already Have) (Windows Administration: Preserve Your Data With The Backup Tool You Already Have) (Windows Administration: Preserve Your Data With The Backup Tool You Already Have

March 18, 2008

Working with many files and file system fragmentation

Working on performance optimization project (not directly MySQL related) we did a test - creating 100 files writing 4K in the random file for some time and when checking the read speed on the files we end up with, compared to just writing the file sequentially and reading it back.

The performance difference was huge - we could read single file at 80MB/sec while fragmented files only deliver about 2MB/sec - this is a massive difference.

The test was done on EXT3 and it looks like it does not do very good job preventing file fragmentation for large amount of growing files.

It would be interesting to see how other filesystems deal with this problem, for example XFS with delayed allocation may be doing better job.

I also would like to repeat the test with MySQL MyISAM tables and see how bad the difference would be for MySQL but I would expect something along those lines.

Interesting enough it should not be that hard to fix this issue - one could optionally preallocate MyISAM tables in some chunks (say 1MB) so its gets less fragmentation. Though it would be interesting to benchmark how much such approach would generally help.

Until we have this feature - reduced fragmentation is one more benefit we get with batching. For example instead of inserting rows one by one in large number of tables once can be buffered in memory (application or MyISAM memory table) and flushed to the actual tables in bulks.


Entry posted by peter |
One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Getting MarkDown and SmartyPants working with EditPad Pro) (Oracle 10g Automatic Storage Management With HP StorageWorks Arrays on HP-UX: Providing Best Practices and Customer Guidance for HP and Oracle Environments) (How IT Works: Encrypting File System) (How IT Works: Encrypting File System) (How IT Works: Encrypting File System

March 17, 2008

Researching your MySQL table sizes

I posted a simple INFORMATION_SCHEMA query to find largest tables last month and it got a good response. Today I needed little modifications to that query to look into few more aspects of data sizes so here it goes:

Find total number of tables, rows, total data in index size for given MySQL Instance

SQL:

  1. mysql> SELECT count(*) TABLES,
  2.     ->        concat(round(sum(table_rows)/1000000,2),‘M’) rows,
  3.     ->        concat(round(sum(data_length)/(1024*1024*1024),2),‘G’) DATA,
  4.     ->    concat(round(sum(index_length)/(1024*1024*1024),2),‘G’) idx,
  5.     ->    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),‘G’) total_size,
  6.     ->    round(sum(index_length)/sum(data_length),2) idxfrac
  7.     ->    FROM information_schema.TABLES;
  8. +——–+———-+———+——–+————+———+
  9. | TABLES | rows     | DATA    | idx    | total_size | idxfrac |
  10. +——–+———-+———+——–+————+———+
  11. |   1538 | 1623.91M | 314.00G | 36.86G | 350.85G    |    0.12 |
  12. +——–+———-+———+——–+————+———+
  13. 1 row IN SET (52.56 sec)

Find the same data using some filter
I often use similar queries to find space used by particular table “type” in sharded environment when multiple tables with same structure and similar name exists:

SQL:

  1. mysql> SELECT count(*) TABLES,
  2.     ->        concat(round(sum(table_rows)/1000000,2),‘M’) rows,
  3.     ->        concat(round(sum(data_length)/(1024*1024*1024),2),‘G’) DATA,
  4.     ->    concat(round(sum(index_length)/(1024*1024*1024),2),‘G’) idx,
  5.     ->    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),‘G’) total_size,
  6.     ->    round(sum(index_length)/sum(data_length),2) idxfrac
  7.     ->    FROM information_schema.TABLES
  8.     ->    WHERE  table_name LIKE “%performance_log%”;
  9. +——–+———+———+——-+————+———+
  10. | TABLES | rows    | DATA    | idx   | total_size | idxfrac |
  11. +——–+———+———+——-+————+———+
  12. |    120 | 370.29M | 163.97G | 0.00G | 163.97G    |    0.00 |
  13. +——–+———+———+——-+————+———+
  14. 1 row IN SET (0.03 sec)

Find biggest databases

SQL:

  1. mysql> SELECT
  2.     ->        count(*) TABLES,
  3.     ->        table_schema,concat(round(sum(table_rows)/1000000,2),‘M’) rows,
  4.     ->        concat(round(sum(data_length)/(1024*1024*1024),2),‘G’) DATA,
  5.     ->    concat(round(sum(index_length)/(1024*1024*1024),2),‘G’) idx,
  6.     ->    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),‘G’) total_size,
  7.     ->    round(sum(index_length)/sum(data_length),2) idxfrac
  8.     ->    FROM information_schema.TABLES
  9.     ->    GROUP BY table_schema
  10.     ->    ORDER BY sum(data_length+index_length) DESC LIMIT 10;
  11. +——–+——————–+——-+——-+——-+————+———+
  12. | TABLES | table_schema       | rows  | DATA  | idx   | total_size | idxfrac |
  13. +——–+——————–+——-+——-+——-+————+———+
  14. |     48 | cacti              | 0.01M | 0.00G | 0.00G | 0.00G      |    0.72 |
  15. |     17 | mysql              | 0.00M | 0.00G | 0.00G | 0.00G      |    0.18 |
  16. |      4 | pdns               | 0.00M | 0.00G | 0.00G | 0.00G      |    1.00 |
  17. |      2 | test               | 0.00M | 0.00G | 0.00G | 0.00G      |    0.12 |
  18. |     16 | information_schema | NULL  | 0.00G | 0.00G | 0.00G      |    NULL |
  19. +——–+——————–+——-+——-+——-+————+———+
  20. 5 rows IN SET (0.32 sec)

Data Distribution by Storage Engines
You can change this query a bit and get most popular storage engines by number of tables or number of rows instead of data stored.

SQL:

  1. mysql> SELECT engine,
  2.     ->        count(*) TABLES,
  3.     ->        concat(round(sum(table_rows)/1000000,2),‘M’) rows,
  4.     ->        concat(round(sum(data_length)/(1024*1024*1024),2),‘G’) DATA,
  5.     ->    concat(round(sum(index_length)/(1024*1024*1024),2),‘G’) idx,
  6.     ->    concat(round(sum(data_length+index_length)/(1024*1024*1024),2),‘G’) total_size,
  7.     ->    round(sum(index_length)/sum(data_length),2) idxfrac
  8.     ->    FROM information_schema.TABLES
  9.     ->    GROUP BY engine
  10.     ->    ORDER BY sum(data_length+index_length) DESC LIMIT 10;
  11. +————+——–+———+———+——–+————+———+
  12. | engine     | TABLES | rows    | DATA    | idx    | total_size | idxfrac |
  13. +————+——–+———+———+——–+————+———+
  14. | MyISAM     |   1243 | 941.06M | 244.09G | 4.37G  | 248.47G    |    0.02 |
  15. | InnoDB     |    280 | 682.82M | 63.91G  | 32.49G | 96.40G     |    0.51 |
  16. | MRG_MyISAM |      1 | 13.66M  | 6.01G   | 0.00G  | 6.01G      |    0.00 |
  17. | MEMORY     |     14 | 0.00M   | 0.00G   | 0.00G  | 0.00G      |    NULL |
  18. +————+——–+———+———+——–+————+———+
  19. 4 rows IN SET (14.02 sec)

Trivially but handy.


Entry posted by peter |
One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Opening Tables scalability) (Mail clients and Databases) (MySQL: what read_buffer_size value is optimal ?) (TMP_TABLE_SIZE and MAX_HEAP_TABLE_SIZE) (Merge Tables Gotcha

March 16, 2008

MySQL Error Message Nonsenses

What MySQL honestly was never good at is giving good helpful error messages. Start with basics for example - The error message in case of syntax error gives you information about tokens near by but little details:

SQL:

  1. mysql> SELECT * FROM  user oder BY pwd;
  2. ERROR 1064 (42000): You have an error IN your SQL syntax; CHECK the manual that corresponds TO your MySQL server version FOR the RIGHT syntax TO USE near ‘by pwd’ at line 1

It would be much better if MySQL would give error give exact position of error (with complex auto generated queries line number is often not good enough) as well as give some better explanation on what is wrong.

The new parser for MySQL was spoke since 5.0 times but it never took off and I’m not seeing it on public road map either.

Though this is just tip of the iceberg of not so helpful or misleading error messages.

Incorrect information if file table.frm is perhaps my favorite one. How do you expect to decode from this message this means storage engine in question is unavailable ? This is especially strange because it should not be so hard to discover this issue - storage engine which is specified in .frm file is not available and give user friendly error message.

Another great set of errors is Got Error XYZ from storage engine which could mean all sort of things. For example this may be Operating system specific error code to tell you why operation could not be performed (read error, out of file descriptors etc). In fact you can run “perror X” to get the explanation but for some reason it is not done automatically.

Even more - the operating system error are often mixed with MySQL errors and in some cases error code may have two meanings. Though typically these do not overlap:

SQL:

  1. [pz@sl1 ~]$ perror 124
  2. OS error code 124:  Wrong medium type
  3. MySQL error code 124: Wrong INDEX given TO FUNCTION

The same code instead of explanation is often seen in other combinations:

ERROR 1005 at line 20: Can’t create table ‘./test/test.frm’ (errno: 150)

May make you to think you have some kind of strange file system/OS error until you run perror:

SQL:

  1. [pz@sl1 ~]$ perror 150
  2. MySQL error code 150: FOREIGN KEY constraint IS incorrectly formed

What is also interesting is - this error 150 is in fact internal Innodb error code. Other storage engines may or may not use same error codes for same kind of error complicating automatic handling.

When we tested different transactional storage engines we surely saw lock related tables being far from consistent.

Let me give you another example. So Innodb has limit of 1023 open transactions which perform writes. I’m not excited to see there is a limit as such but what is especially annoying is absolutely misleading error messages produced when this limit is reached:

SQL:

  1. mysql> INSERT INTO yt VALUES (66666);
  2. ERROR 1114 (HY000): The TABLE ‘yt’ IS full

For me table is full means something like I’m out of disk space (or quota set for the table size (think MEMORY)) while in this case it is nothing like that and only going to error logs allows you to figure out what is the problem

SQL:

  1. 070223 13:26:01InnoDB: Warning: cannot find a free slot FOR an undo log. Do you have too
  2. InnoDB: many active transactions running concurrently?

I hope with pluggable storage engine interface Sun/MySQL spends some time to make things consistent. There should be general high level error codes which can be portable across storage engines - for example something with meaning of “Storage engine is out of resources” together with ability for storage engine to pass as much information back as needed to investigate and solve the error.

Going back to our foreign keys example it is not only you need to run perror to find out this error is foreign key related but you also need to run SHOW INNODB STATUS to see what exactly Innodb did not like about foreign key constraint definition.


Entry posted by peter |
No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(ASP Error Handling - On Error Resume Next) (ASP Error Handling - On Error Resume Next) (Add an option to Fail on Innodb Initialize failure, Please ?) (Linux failing to boot screen on the plane) (Microsoft Still Doesn’t Get Security

March 15, 2008

Sharding and Time Base Partitioning

For large number of online applications once you implemented proper sharding you can consider your scaling problems solved - by getting more and more hardware you can grow. As I recently wrote it however does not mean it is the most optimal way by itself to do things.

The “classical” sharding involves partitioning by user_id,site_id or somethat similar. This allows to spread data more or less evenly across the boxes and use any number of boxes. However this may be not the most optimal approach by itself because not all data belonging to same user is equal.

Consider Blog or Forum as example - most likely few last posts will get majority of hits while things written year ago are accessed with much less frequency. You can often level off this significantly for reads by using caching (if things are accessed frequently they are served from cache) but you still have to deal with writes which can be significant depending on your design.

It does not only have to be active portions of data same way you can have active users and ones which are almost dead.

Another interesting type of data which I find often kept on the same “cluster” without good reason is some sort of logs or history data. Think about Wikipedia page version history for example which accumulates to huge volume which very few users need to read, various change logs etc are other type of this data.

Besides separating “cold” data from “hot” it often makes sense to separate data based on its importance for system operation - for example if page versions data is currently unavailable for Wikipedia it is still possible to serve 99% of reads and even possibly handle writes by queuing new version creation.

It may make sense to separate data on table (or partition) level to get better “clustering” - because data is usually cached by pages rather than by rows and index entries having all hot data in separate table from cold data is much more efficient for caching than having table with the mix, even though the total size remains same.

It also often makes sense to separate data on the server level. Keeping Hot and Production Critical dataset small you can both make system to perform faster as well as well as get plenty of operating benefits - small database takes less time to backup and restore it is easier to do ALTER TABLE and replication would not fall behind as easily.

You can also use different hardware for different parts of data - you can hold “Hot” data on fast RAID volume or even SSD while place archive data on slow but large SATA volumes (unless response time will not become show stopper)

Of course not all applications need to use this technique but there is significant class of application which can benefit from it dramatically.


Entry posted by peter |
4 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Oracle Partitioning - A Must for Data Warehouse and OLTP Environments) (Concurrent PCB Design Through Design Partitioning) (MySQL to Certify to the Linux Standard Base) (Linux/BSD Gangsters: OpenOffice Base And MySQL) (Improving Database Performance with Partitioning

March 13, 2008

Economics of Performance Optimization

I think every person responsible for Development or Operations of growing application sooner or later have to decide on couple few questions on how to tackle application performance. These questions are:

  • Should we Optimize Application or get more Hardware ?
  • Should we do things ourselves or hire an experts to help us ?

The answer on these questions actually depend on a lot of things, some of which we’ll try to cover here.

The things which usually define best solution are economics , resources and risks .

First lets talk about optimizing application vs getting better hardware.

With application optimization typically you have rather interesting relationship between “effort” and “result” typical application would have number of low hanging fruits which are easy to fix - adding couple of proper indexes or tuning couple of MySQL settings.

When low hanging fruits are fixed you end up with either fine tuning which gives little improvement or more application changes such as building summary tables, significant schema changes, use of replication, caching, using memcached, sphinx, implementing proper web caching etc. These changes are not only require more time to do but they also require proper experience to implement well and add has higher risks - adding index you rarely would add bugs to your application while redesigning application schema or implementing caching are much more bugs prone. Another risk is implementing such solution in time before your application collapsing.

When it comes even more serious application architecture changes such as sharding which unlike of things like caching often has to be done at once rather than incrementally which becomes even more expensive, requires even higher experience to do well and poses higher risks.

On the Hardware side things are not so trivial as well. First there is Scale-Up of using more powerful server and there is Scale-Out when you can get use of more servers efficiently.

If you Scale-Up you by getting more powerful MySQL Server hardware the cost is going to be increasing dramatically after certain hardware size. For example right now you can get 8 core x86-64 box with 32G with good price/performance value while if you would like to go significantly higher like 32 cores and 128G of RAM it is going to be dis proportionally more expensive.

There are also however resources and risks issues - fewer people have experience dealing with quite high end hardware plus it is often more risky if it breaks because immediate replacement may not be available and keeping unused spare box can be expensive. Because fewer users use these boxes you also likely to have more bugs on all levels starting from Hardware and going down to MySQL.

Then now even if it all works perfectly you should think if your assessment of MySQL scaling abilities is the right one. It well may be you will not be able to scale efficiently past certain hardware configuration - some of these issues may be MySQL related and others (like excessive row level locks) may be caused by application design.

Another thing to consider - even if you have unlimited money to spend there are still limits of how powerful hardware you can get on the market, especially if you consider some workloads may have problems with latency rather than throughput. If single user
CPU bound query takes 30 seconds you unlikely will be able to solve the problem by CPU upgrade. For single thread workload the most expensive x86-64 CPU you can buy would be unlikely more than 2-3 times faster than the one in laptop I’m using right now.

So what it all means ?

This means it often good idea to pick up low hanging fruit straight away - before you look at upgrade the first time, because it even can take less time than moving the application. On reasonably tuned applications when further tunings are expensive you may well upgrade hardware instead but as you grow further you’ve got to think about application again.

Of course this assumes your application starts from the ground and grows gradually. There are cases when application starts from quite serious load to start with - like if you work for Yahoo and develop the new “Applet” which will be featured on the front page you quite likely start with scaled out and reasonably optimized phase.

Let us think about some Social Network applications - these tend to grow fast if they are lucky.

Initially it runs on some dedicated (or even virtual) rented server which would usually have 1-2GB of memory may be couple of SATA hard drives - something one would rent for $100-200 a month, personally funding projects people rarely like to waste more money until projects takes off.

As it gets slow one can spend say $1000 or personal time optimizing it and get 10x capacity compared to naive implementation.

At the same time one would likely get couple of boxes to spread Web and Database instances and establish may be establish replication mainly to be able to fail over if there are problems.

Over time application still will be limited by MySQL server so the alternative would be either spend $5000 on implementing caching to upload it or to get some higher end box with few cores 16-32G of RAMs and good IO subsystem which would cost $5-10K or some $500 a month if you want to lease it. Of course because of replication you would want at least two.

Implementing memcache may look like best idea from economic standpoint until you think about resources and risks - hardware upgrade is straightforward and can be done by operations and data center stuff so developers can keep developing new features which users are screaming about instead of this boring backend stuff and when hunt for bugs.

After this step you however would need to look at application - as you can get server much higher and you need to scale 10x within next 6 months by your conservative growth forecast. So you go ahead and implement memcache, sharding, replication or other techniques which allow you to scale out to the several boxes.

If you do this step properly you may end up in interesting situation - you can put 10, 20, 40 shards and scale almost linearly. Does it mean you should not look at application performance and just grow by getting more hardware ?

Not really. Even though you may have solved your survival problem it does not mean you’re running things cost effective way and getting maximum out of hardware you have. Plus there could be nice surprises such as data center power or space limits coming your way.

Assuming server cost of $10K per server and operating cost of $200 per server (space/power/admin overhead) if you have just one server optimizing your application very modest 10% does not give you much benefit - you would just need another server a bit later. However if you have 100 MySQL servers to deal with even modest 10% can save you $100K in hardware cost and $2000 a month in operating costs.

The point is the more large scale application you have the more time and effort you should spend on optimization. Same often applies to HA as higher scale applications tend to have higher cost of downtime.

So now what is about using your Own Resources vs Hiring Consultants ?

Again we should think about economics resources and risks.

In the early stage it often makes sense to do a lot of things yourself. Learning to understand the basics allow you to design the application more efficient way and you probably do not want to need consultants to review every simple query you design. Getting low hanging fruit is also not a rocket science - adding couple of extra indexes is not that hard to do.

At the same time it may make sense to attract experts to review what you’re doing to make sure you actually picked up all these low hanging fruits you could and what you’re not doing some silly mistakes with application architecture or table structure which may cost you much more to fix at later stage.

As time goes many project find them in the situation when they have hard time allocating resources to get application optimized properly - they need to move fast rolling out new stuff especially in high pace markets. Besides lack of resources it attracting experts to help you well may be less risky - hopefully they have already dealt with similar situations and know what works well which reduces trial and error and general risk of going wrong way. Plus because they know the stuff they often need to spend significantly less time on the same task which makes it cheaper than looking internal resources in the end.

As project moves to the higher end another variable comes into the play - even if your local resources have created the system which performing well getting experts which may have ideas how to optimize it another 10-15% may be a great idea. On higher end I’d even look to speak to few of them as optimization is as art as it is science and each expert may his own ideas which may or may not work well.

Reading this you may think I’m promoting our Services. Indeed I am. I truly think our services is a good match for a lot of companies and is cost efficient and low risk way to tackle these problems, and even though I just spoke about scaling and performance optimization it goes far beyond this.

Hiring us is of course not the only option. There are variety of companies on he market offering services around MySQL and LAMP and MySQL is one of them. Plus for large companies it may make sense to hire great experts and have them to do inhouse consulting helping various projects to get things right. Both Google and Yahoo for example have great experts in MySQL and Scaling in general.


Entry posted by peter |
No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(IBM DB2 Performance Optimization eKit) (MySQL Master Class after HighLoad.RU) (The risk of over-optimization) (The final goal of search engine optimization) (Looking for High Performance MySQL Interns

March 7, 2008

Speeding up GROUP BY if you want aproximate results

Doing performance analyzes today I wanted to count how many hits come to the pages which get more than couple of visits per day. We had SQL logs in the database so It was pretty simple query:

SQL:

  1. SELECT sum(cnt) FROM (SELECT count(*) cnt FROM performance_log_080306 GROUP BY page HAVING cnt>2) pv;

Unfortunately this query ran for over half an hour badly overloaded server and I had to kill it in the end.

The reason for slowness was of course huge temporary table was required (there were about 5 million of distinct pages visited during that day) which resulted in on disk temporary table which as we know quite slow.

Of course it would be possible to allocate more memory to the temporary table or switch to filesort method and get result faster.

I however picked another road which is quite helpful in similar cases - I did not need exact result but approximate figure so I could trick MySQL to do group by a hash of the page instead of page itself:

SQL:

  1. mysql> SELECT sum(cnt) FROM (SELECT count(*) cnt FROM performance_log_080306 GROUP BY crc32(page) HAVING cnt>3) pv
  2.     -> ;
  3. +———-+
  4. | sum(cnt) |
  5. +———-+
  6. 1127031 |
  7. +———-+
  8. 1 row IN SET (31.22 sec)

As you can see now it completes in about 30 seconds - quite handy.

Another trick I want to share which I use a lot when I want to analyze data distribution but table is to large is to just limit it to first number of rows:

SQL:

  1. mysql> SELECT avg(length(page)) FROM (SELECT page FROM performance_log_080306 LIMIT 10000) tmp;
  2. +——————-+
  3. | avg(length(page)) |
  4. +——————-+
  5. |           70.0444 |
  6. +——————-+
  7. 1 row IN SET (0.08 sec)

Again this is not exact value but normally close enough to make a decision.


Entry posted by peter |
2 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Speeding Up Table Creation Activity) (How to get out of Google’s supplemental results) (Group Policy: Optimizing Group Policy Performance) (Windows Administration: Your Guide to Group Policy Troubleshooting) (Windows Administration: Your Guide to Group Policy Troubleshooting

March 6, 2008

Sphinx 0.9.8 reaches RC stage, Docs updated

Andrew Aksenoff is pretty slow with release numbers for Sphinx. By MySQL Users Conference 2007 Sphinx version 0.9.7 was released and today we had just 0.9.8-rc1 announced This minor change in version number corresponds to about double source size (looking at download size) and major rewrite for many portions. Sphinx 0.9.8 snapshots were more stable than 0.9.7 for a long time but I guess Andrew did not call them releases because documentation was incomplete and this just was fixed now.

Hopefully Andrew gets 0.9.8 release out for MySQL Users Conference.

BTW Andrew will be speaking at MySQL Users Conference so if you wanted to learn more about Sphinx it is great place to come.
Also he will stay in Bay Area for a few more days so if you’re implementing Sphinx and would like some first hand help or advice this would be great opportunity.


Entry posted by peter |
No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Sphinx Developer joins our team) (SQL Server 2005: Set the Stage for a Smooth Upgrade) (SQL Server 2005: Set the Stage for a Smooth Upgrade) (Looking for someone with Chinese knowledge) (Inside Microsoft.com: Analyzing Denial of Service Attacks
« Previous entries