October 31, 2007

Sage 50 HR 2007 Software Suite Ships with Embedded MySQL Database

MySQL AB today announced that the new Sage 50 HR 2007 software suite is now shipping to small businesses in the UK with embedded MySQL database technology. The announcement was made this morning at the Sage Visions 2007 conference in Telford, UK, and follows a global agreement between the Sage Group plc and MySQL AB signed last year. Sage also plans to embed MySQL into two other products, Sage 50 Accounts and Sage 50 Payroll, which plan to ship this time next year.

Related Posts

(Sage Announces Global Partnership with MySQL AB) (The MySQL Embedded Database Server Powers ScienceLogic Network Management Meta-Appliances) (MySQL AB to Optimize its Open Source Database for SAP NetWeaver) (TRUMPF Laser to Embed MySQL in its Manufacturing Products) (WYSdom Dental Technologies Embeds MySQL Database

The new cool MySQL patch has landed! Check your queries performance!

Microslow patch is used by many DBAs and developers to accurately time their queries and to catch those which run less than a second as they can also be a performance killer for a busy application.

Recently I have started the development of an updated version of the patch. The basic idea is the same as for its predecessor - to get more information about query execution logged into slow log, however the new version is loaded with a set of cool new features.


CONNECTION IDENTIFIER

Each slow log entry now contains a connection identifier, so you can trace all the queries coming from a single connection.

CODE:
  1. # Thread_id: 4

MICROTIME RESOLUTION QUERY TIMING

This is the original functionality offered by Microslow patch. The new edition is free of a tiny bug which was to treat long_query_time value as seconds. In effect one could only choose to log all queries to see those which take less than 1s. Now long_query_time is in microseconds and you can set it for example to 300000 which is 0.3s!

CODE:
  1. # Query_time: 0.503016  Lock_time: 0.000048  Rows_sent: 56  Rows_examined: 1113

LOGGING OF THE REPLICATED STATEMENTS

Normally MySQL will not write into slow log any queries executed by the slave’s SQL thread. This patch allows you to change that behavior with the new parameter --log-slow-slave-statements.

THE DETAILED INFORMATION ABOUT QUERY EXECUTION

Each query can be executed in various ways. For example it may use indexes or do a full table scan, or a temporary table may be needed. These are the things that you can usually see by running EXPLAIN on the query. The patch will now allow you to see the most important facts about the execution in the log file.

CODE:
  1. # QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Disk_tmp_table: No
  2. # Filesort: Yes  Disk_filesort: No  Merge_passes: 0

QC_Hit tells whether query cache hit occurred or not. If it says ‘Yes’, then all other values will be ‘No’ or 0, because the query was not actually executed by the engine.

If Full_scan is set then most likely the query is a bad one, because it read all the rows from a table.

Full_join means any of the joins didn’t use indexes.

If a temporary table is required for query execution Tmp_table will be set. Sometimes that table must be created on disk instead of in memory, in such case Disk_tmp_table will state that.

Filesort is for indicating that filesort algorithm was used, while Disk_filesort means that sorting was done through temporary files.

INNODB USAGE

The final part are the InnoDB usage statistics. MySQL currently allows you to see many per-session statistics for operations with SHOW SESSION STATUS, but that does not include those of InnoDB, which are always global and shared by all the threads. The patch gives you the opportunity to see those values for a given query.

CODE:
  1. #   InnoDB_IO_r_ops: 19  InnoDB_IO_r_bytes: 311296  InnoDB_IO_r_wait: 0.382176
  2. #   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.067538
  3. #   InnoDB_pages_distinct: 20

InnoDB_IO_r_ops is the counter for the number of page read operations scheduled. The acutal number of read operations may be different, but since this can be done asynchronously, unfortunatelly there is no good way to measure that. InnoDB_IO_r_bytes is the corresponding value in bytes.

With InnoDB_IO_r_wait you can see how much time (in seconds) it took InnoDB to actually read the data from storage.

InnoDB_rec_lock_wait is the time (in seconds) query had to wait on row locks.

InnoDB_queue_wait is the time (in seconds) query spent either waiting to enter the InnoDB queue or inside that queue waiting for execution.

InnoDB_pages_distinct tells the number of unique pages accessed by the query. This is actually an approximation based on a small hash array representing the entire buffer pool, because it could take a lot of memory to map all the pages. The inaccuracy grows with the number of pages accessed by a query as there is more chance for hash collisions.

In case InnoDB is not used for the query execution that information is written into the log instead the above statistics.


The full slow log entry for a query using InnoDB can look like this:

CODE:
  1. # User@Host: macko[macko] @ localhost []
  2. # Thread_id: 4
  3. # Query_time: 0.503016  Lock_time: 0.000048  Rows_sent: 56  Rows_examined: 1113
  4. # QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Disk_tmp_table: No
  5. # Filesort: Yes  Disk_filesort: No  Merge_passes: 0
  6. #   InnoDB_IO_r_ops: 19  InnoDB_IO_r_bytes: 311296  InnoDB_IO_r_wait: 0.382176
  7. #   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.067538
  8. #   InnoDB_pages_distinct: 20
  9. SET timestamp=1193841780;
  10. SELECT DISTINCT c from sbtest where id between 501895 and 502895 order by c;

or if InnoDB was not used:

CODE:
  1. # Time: 071031 20:03:16 # User@Host: macko[macko] @ localhost []
  2. # Thread_id: 12
  3. # Query_time: 0.026385  Lock_time: 0.000158  Rows_sent: 1  Rows_examined: 0
  4. # QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: No  Disk_tmp_table: No
  5. # Filesort: No  Disk_filesort: No  Merge_passes: 0
  6. # No InnoDB statistics available for this query
  7. SELECT COUNT(1) FROM t2;

Important note: Although the patch was tested and run successfully under stress with both synthetic benchmarks and real database traffic, it”s considered experimental and as such I don”t recommend using it in production or on any important system!

The patch is available for MySQL Community 5.0.45.

Maciek Dobrzanski


Entry posted by Maciej Dobrzanski | One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Microslow patch for 5.1.20) (Enabling/Disabling query log without server restart in MySQL 5.0) (Getting real life query speeds with MySQL) (Surprise, surprise!) (Yahoo Search Suggestions for MySQL

Earthquake!

That was a weird experience … my first earthquake. 5.6 magnitude, centered 9 miles away, the whole hotel shook for about 15 seconds. At the Adobe building across the road, the lights must not be wired in very tightly, because lights were flashing on and off the whole time. OK, I guess it”s sort of like being on an airplane during turbulance.

 

 

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(This year’s Business of Software conference

October 30, 2007

Yet another presentation submission

Looking at the Feedback I got for other Presentations Proposals I thought I indeed should submit general presentation focusing on Web Application performance tuning and explaining how you analyze performance and why do you do it this way, so here it is:

Performance Analysis of MySQL powered Web Applications In this session we’ll go beyond MySQL Performance Optimization and look at full stack performance tuning for Web Applications. It is practical session about Performance Tuning Methodology.

Several years ago I have heard the talks about Oracle Performance Optimization methodology by Cary Millsap and that was a great talk with a lot of techniques and concepts could (and should) be applied for any kind of applications, including Powered by MySQL. Now we’re using similar principles in our MySQL Consulting for performance analyzes of MySQL and Web Applications and it works very well.

I also think it is good idea to “spice up” the conference and get some foundational stuff rather than just tips tricks and benchmark results.

Lets see what Conference Board thinks about it.

Also I see submissions are coming at increased rate. Few hours before official submission deadline this talk gets number 815. The one I submitted about week ago got number 372. So expect tough competition this year !


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(7 professional website promotion tools in one) (An Overview of Windows Presentation Foundation) (Lang .NET 2008, Scripting, and Visual Basic) (Sorry for the broken comments…) (Speaking on MySQL Users Conference 2008

Reciprocal links are alive and kicking

A while back there was quite a bit of scare mongering going around the SEO industry about how reciprocal links were dead. No, reciprocal links are not dead and now I have the proof.

Related Posts

(Do reciprocal links still work?) (One-way links, reciprocal links, three-way-links. What’s best?) (Beware the reciprocal link short sellers) (The top 5 myths about link popularity building tools) (Can too many links hurt your website?

New websites

Those of you who develop custom software for clients have probably been burned more than once by clients who are disappointed with what you thought was going to be the “final” version, and start demanding countless changes. You”ve probably built pretty good defenses against this: I”ve seen contrators ask their clients to sign every single page of the spec so that there can be no question that the client got exactly what they were promised.

It doesn”t help. The client is still unhappy. They knew what you were going to build, intellectually, but as soon as they saw it in real life, they started finding lots of problems:

Custom development is that murky world where a customer tells you what to build, and you say, “are you sure?” and they say yes, and you make an absolutely beautiful spec, and say, “is this what you want?” and they say yes, and you make them sign the spec in indelible ink, nay, blood, and they do, and then you build that thing they signed off on, promptly, precisely and exactly, and they see it and they are horrified and shocked, and you spend the rest of the week reading up on whether your E&O insurance is going to cover the legal fees for the lawsuit you”ve gotten yourself into or merely the settlement cost. Or, if you”re really lucky, the customer will smile wanly and put your code in a drawer and never use it again and never call you back.

(from Set Your Priorities)

That”s just how the world works. A spec can”t reveal everything. Clients can”t use the spec for their real work, and they”ll never notice just how many keystrokes it takes to do basic operations until you get, at least, a UI prototype working.

A good way to defend yourself against this is to deliver lots and lots of interim versions to the client: real, working interim versions, and get them using it so you can build feedback into future iterations. Rather than going off in a cave and building something for a year, only to find out that 9 months of that work is wasted, you show the client something every month, say, and get instant feedback and then you adjust directions if needed. That”s one thing that many people waving the “agile” flag are talking about. The first delivery should be the minimum thing that could possibly be useful.

This should not be seen as an excuse to write code without designing it, first. All code is going to be designed eventually. But if you try to design it first in a programming language, the designing process is vastly slower than if you were designing it with pencil and paper and descriptive paragraphs in the English language, so you”re wasting time.

Anyway. Where was I going with this? Oh yes. For the first time, I had the honor of being the “client” in that client/developer relationship, when we hired Happy Cog Studios to build our new website.

We gave them a relatively precise, unambiguous spec. Basically, we wanted the same website as we already had, only we wanted it to be prettier.

Here”s what our original website, designed by Dave Shea, looked like (click to enlarge:)

It was a great design, but several years old, and it felt dated.

Happy Cog gave us a couple of design options to use as starting points:

Or:

Both were quite good, I think, but had showstoppers. The first design put the most important paragraph in a spot where people were very likely to miss it. The second design required 1024 pixels. Even when people have 1024×768 (or larger) monitors, they don”t keep their browsers that wide. And about 30% of the people we asked hated the orange.

Which led us into several rounds of iteration by “a committee of tasteless slobs,” that committee being mostly me, and somehow we ended up with a design that just got worse and worse and worse the more we tweaked it. This is what we started to build:

That”s when I knew we had to start over, and when I suddenly knew what it felt like when you told the barber to give you an “en brosse” haircut because you thought you would look like Tom Cruise, and that”s not what happened.

Anyway, at that point I realized the design was suffering because it was trying to stuff a FogBugz identity (with the kiwi and the FogBugz logotype) into a Fog Creek website (with the Fog Creek wave logo). This both ate a lot of space at the top of the page, where real estate was scarce, and made the page a confusing bundle of links.

Fortunately, Happy Cog was very patient with us. We asked them to start fresh, with a new design concept and a new designer. Second, we decided that the FogBugz website doesn”t have to look anything like the Fog Creek website. It should be a showcase for FogBugz, with a tiny link to Fog Creek at the bottom. Happy Cog would design the FogBugz website. The Fog Creek website, which far fewer people ever visit, is about the company itself. It links to our products but doesn”t have to have the same graphic design. I designed that myself, with a very minimalist design that captured the essense of the company:

Larger picture | Go to fogcreek.com

My favorite part is the slideshow, with a set of pictures that captures life at Fog Creek perfectly. It”s plain, it”s minimalist, it reflects the company personality, and it harks back to the very first website design from when we started the company seven years ago:

Happy Cog assigned Dan Mall to the redesign of the FogBugz page. His first design was really good, and I knew that it would be better to just shut up and give him artistic license to do whatever he thought was best. I thought that we could try and meddle in his design, tweaking things left and right, and get another bad design-by-committee, or we could just tell him we trusted him and whatever he delivered would be exactly what went live.

Which it was.

Larger picture | Go to fogbugz.com

So that”s where we stand. Babak and I spent a week creating a video FogBugz demo for the site using my favorite screen-recording program, TechSmith Camtasia. The sound we could get in the office was just not good enough, so we rented studio time to record the soundtrack.

It took a lot of editing, but we got the video down to about 13 minutes. Probably too long for a web video, but what can ya do. And I already talk to fast.

Anyway, that”s the saga of the new websites. How do you like them?

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Should you buy text links?) (How to overcome Google’s filter for new websites) (How to fake PageRank and how to find websites with faked PR) (What is better: PPC or SEO?) (Google’s new position 6 penalty

October 29, 2007

MySQL AB Expands Presence in Italy

MySQL AB is building on its European organisation with a new sales office in Milan, Italy to focus on supporting corporate customers.

On November 8 in Milan, the database vendor’s first public event in Italy will take place. The Italian MySQL team invites users and customers to learn about the company, its products and services.

More information about the ‘MySQL in Italia’ event is available here.

Related Posts

(UN’s FAO Selects MySQL as its Open Source DB Standard) (MySQL K.K. to Strengthen its Presence in the Japanese Database Market) (MySQL Expands Telecom Services to Meet Growing Demand) (MySQL Expands Telecom Services to Meet Growing Demand) (mysql_real_escape_string() versus Prepared Statements

Some feedback on EBS

Jeff Atwood: “It”s a tremendous credit to Joel Spolsky that he made this crucial feature the centerpiece of the new FogBugz. I”m not aware of any other software lifecycle tools that go to such great lengths to help you produce good estimates.”

Rafe Colburn: “We’re rolling out FogBugz 6.0 at work, and I’m finding that I actually like the time tracking. For one thing, it’s a tool for focus. When you kick off the timer on a task, you don’t want to jump around and multitask because it will just throw off the timer. The timer feature itself is pretty easy to use.”

Scott Rosenberg: “What’s most interesting about the new FogBugz is what Spolsky and his team are calling ‘Evidence Based Scheduling…’”

Reg Braithwaite: “I built a prototype that did the exact thing that FogBugz is doing quite some time ago. However, prototypes are not shipping products. FogBugz is a shipping product. My prototype was not. And that makes all the difference.”

Also: I wrote an earlier article describing Evidence-Based Scheduling, which was in Better Software Magazine last March.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Another article preview up… this one is a content rotator for ASP.NET) (Another article preview up… this one is a content rotator for ASP.NET) (Data Layer Diversity: It’s Not Just Relational Anymore) (Yes, it’s been quiet around here…) (VB 2005 Performance QFE released…

Hacking to make ALTER TABLE online for certain changes

Suppose you want to remove auto_increment from 100G table. No matter if it”s InnoDB or MyISAM, you”d usually ALTER TABLE `huge_table` CHANGE `id` `id` int(6) NOT NULL and then wait hours for table rebuild to complete. If you”re unlucky i.e. you have a lot of indexes and not too much RAM - you could end up waiting days. If you want to make this happen quick - there”s another way. Not documented, but works well with both - InnoDB and MyISAM.

Now that more and more folks hit the InnoDB auto-inc scalability issue with MySQL 5.0 and older versions, employing other techniques to maintain the PK auto incremental becomes more of an issue. One of the steps here is to remove current PK auto_incremental from the table. As a rule of thumb, this usually involves altering huge InnoDB tables and huge tables take time to rebuild.

Disclaimer: try this at your own risk. It worked for me, it may work for you too, but always have a backup before doing that kind of stuff, as it is not the way MySQL would advice you to do it and we can”t guarantee it will work well for you either.

So, in a nutshell, all you have to do is create another table with desired table structure and switch .frm table definition files. For safety, I”d recommend to flush tables with read lock while switching .frm files. When and how it works:

auto_increment (removing). Let”s have a simple table with auto_increment we want to get rid of:

SQL:
  1. CREATE TABLE `huge_table` (
  2.   `id` int(6) NOT NULL AUTO_INCREMENT,
  3.   `text` text NOT NULL,
  4.   PRIMARY KEY  (`id`)
  5. ) ENGINE=InnoDB

To remove auto_increment, we (1) create table with the same layout but without auto_increment, (2) flush tables with read lock, (3) swap .frm files while keeping mysql suspended and (4) unlock the tables afterwards:

SQL:
  1. – 1.
  2. mysql> CREATE TABLE `huge_table_new` (
  3. ->   `id` int(6) NOT NULL,
  4. ->   `text` text NOT NULL,
  5. ->   PRIMARY KEY  (`id`)
  6. -> ) ENGINE=INNODB;
  7. Query OK, 0 rows affected (0.01 sec)
  8.  
  9. – 2.
  10. mysql> FLUSH TABLES WITH READ LOCK;
  11. Query OK, 0 rows affected (0.00 sec)
  12.  
  13. – 3.
  14. mysql> ^Z
  15. [1]+  Stopped                 mysql test
  16. # mv huge_table.frm huge_table_old.frm && mv huge_table_new.frm huge_table.frm && mv huge_table_old.frm huge_table_new.frm
  17. # fg
  18. mysql test
  19. – 4.
  20. mysql> UNLOCK TABLES;
  21. Query OK, 0 rows affected (0.00 sec)
  22.  
  23. mysql> SHOW CREATE TABLE `huge_table`G
  24. *************************** 1. row ***************************
  25. TABLE: huge_table
  26. CREATE TABLE: CREATE TABLE `huge_table` (
  27.   `id` int(6) NOT NULL,
  28.   `text` text NOT NULL,
  29.   PRIMARY KEY  (`id`)
  30. ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  31. 1 row IN SET (0.00 sec)
  32.  
  33. mysql> INSERT INTO `huge_table` (text) VALUES (“test”);
  34. ERROR 1062 (23000): Duplicate entry “0″ FOR KEY 1

Unfortunately, adding auto_increment does not work that way.

Enum values (add and remove). Enumerated values are added and removed the same way that auto_increment is removed. I”ve been a bit surprised, that removing value from enum() works as good as adding it - rows that have incorrect values are just returned as empty. But I suppose this does violate mysql data file structure, so be really careful with that one.

Default values. MySQL rebuilds table even if we only want to change the default value for new records so this may save one from a lot of trouble.

Table comment. I”m pretty sure that would work for changing table comment as well, however - changing a comment with a help of ALTER TABLE does not rebuild the table, so we better use the documented method for that.

What I”m surprised about is that changing a comment does not to require table to be rebuilt, while things like removing auto_increment or changing a default value still do even though this information is stored in table definition file.

If you”ll ever try this, please leave a comment if it did work for you. Maybe you have discovered some new things to alter that way?


Entry posted by Aurimas | 4 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Metalink Hacking V1.01) (Working with large data sets in MySQL) (MSDN Webcast: A Hacker’s View of Your Web Applications Part 2: Web Hacking - Attack Scenarios and Examples - Level 300) (Online Gambling and Kids: A Bad Bet) (Magic Innodb Recovery self healing

October 26, 2007

Heikki Tuuri Innodb answers - Part I

Its almost a month since I promised Heikki Tuuri to answer Innodb Questions. Heikki is a busy man so I got answers to only some of the questions but as people still poking me about this I decided to publish the answers I have so far. Plus we may get some interesting follow up questions out of this.

I had added my comments to some of the questions. HT will stand for Heikki Tuuri in the answers and PZ for myself.

Q1: Why did you decide to go with gzip compression instead of implementing “classical” approach of index prefix compression ?

HT: The reason is that index prefix compression prevents one from using a binary search on a B-tree node. We could not compare a search key K to an index record R if we only knew the suffix of R. There are, of course, hybrid approaches where some prefix information is stored, and a search almost as efficient as a binary search can be performed.

But InnoDB’s adaptive hash indexes require that the prefix is stored in every record R. An adaptive hash index tries to guess where our search would end in the B-tree. To verify that we arrived at the right position, we need to know the whole record R.

Also, a gzip compression potentially compresses the database more than just prefix compression. The downside in gzip is increased usage of CPU time.

PZ: We will see how it plays when we can see more benchmarks. The big problems I see so far with this feature are ease of use and requirement to “guess” how your data will compress for the whole table. Among other things compression for data pages and uncompressed index pages can be significantly different.

Q2: Does Innodb has pointers to the next/previous page in index tree leaf pages ?

HT: Yes

PZ: It is great as it can help to make full table scans and index scans more efficient. I can’t wait for ability to create physically sorted indexes with Innodb (with index built by sort) this is when this should be helpful the most.

Q3: Does Innodb secondary Indexes trees are allocated in two extents same as primary key tree or stored in the same extent ?

HT: An InnoDB extent is a contiguous block of 64 pages, 16 kB each. Each B-tree index in InnoDB occupies two SEGMENTs. One segment is for the non-leaf nodes, and the other for leaf nodes. The first 32 pages of a segment are allocated from ‘fragment extents’ where any individual page can be allocated to whatever use in whatever segment. After that, a segment always reserves whole 64 page extents.

Thus, the answer to the question is: for small tables, everything can reside within a single extent. For big tables, a secondary index reserves different extents from the clustered index.

Q4: Does Innodb ever merges sequential pages together if they become almost empty

HT: Yes. If a page becomes less than half full, a merge is attempted.

btr0cur.h:

 /* In the pessimistic delete, if the page data size drops below this limit, merging it to a neighbor is tried */  #define BTR_CUR_PAGE_COMPRESS_LIMIT     (UNIV_PAGE_SIZE / 2) 

Q5: When Innodb free space becomes available for use within same object? When does Innodb make it available for other objects.

HT: Free space can always be recycled within a segment. A segment can release free space to other segments, if a 64 page extent becomes totally empty, and also it can release individual pages from its 32 ‘fragment extent’ pages.

PZ: This is the reason why when you delete a lot of scattered rows you may not see a “free space” in Innodb tablespace to grow significantly.

Q6: Does Innodb policy replacement algorithm takes into account page position in btree index ?

HT: No. It is a pure LRU algorithm.

PZ: This is one of the things which we would like to play with - I would expect serious improvements are possible especially on high IO pressure workloads.

Q7: Does Innodb has any protection from pages being overwritten in buffer pool by large full table scan

HT: No

PZ: Another possible area of optimization. I frequently see batch jobs killing server performance overtaking buffer pool. Though full table scan is only one of replacement policy optimizations possible.

Q8: How do you assess current state of Innodb scalability with multiple threads and multiple concurrent transactions ?

HT: The scalability in my opinion is ‘mediocre’, and we are working on improving it.

PZ: Thank you for honest acknowledgment and I can’t wait to get hold of further improved versions.

Q9: Do you favour database “self tuning” or prefer to expose tuning options to be available to the user.

HT: I favor self tuning.

PZ: I also like self tuning in theory. But I really like to have tools to steer the database behavior in the cases when self tuning gets it wrong.

Q10: When Innodb decides to schedule sequential read-ahead, random read ahead ?

HT: InnoDB schedules a sequential read-ahead when it notices an ascending access pattern that passes an extent boundary. Roughly, if most of the pages in the extent have been accessed, and in an ascending order of the file address, then InnoDB schedules the read of all the 64 pages of the extent where the next index page in the ascending order is. InnoDB has a similar heuristic for descending index scans.

The InnoDB sequential read-ahead is not optimal at all. It should schedule the read of the next extent when we are in the middle of the processing of the previous extent.

Random read-ahead means that when InnoDB notices that if at least 13 pages in an extent have been accessed very recently, then InnoDB reads in all the rest of the pages in the extent.

PZ: Interesting enough, there was some work done by DIKU students improving algorithms used for read ahead and results were significant. They however were not merged in Innodb mainline yet.

Q11: If read-ahead is happening and Innodb needs to access one of the pages being fetched does it has to wait for whole read ahead request to complete

HT: Unfortunately, it often has to wait. A read-ahead of a 64 page extent is often done as a single 1 MB file read.

PZ: “often” a bit stikes me here. I thought it would be always single read, typically 1MB in size. I should also mention a post on this topic once.

Q12: If read-ahead request was placed to the queue but did not start executing yet will innodb has to wait for it to complete or can just go ahead and read page it needs

HT: InnoDB has to wait.

PZ: This one is a bummer. I thought Innodb will make sure to schedule synchronous wait in front if read-ahead IO has not been started already. The interesting question it brings - how many read-ahead requests can be queued at once.

Q13: Is there any IO concurrency control - how many random and sequential read aheads can be happening at the same time

HT: All read-aheads are performed in the Unix version with a single thread. Only one read-ahead can be happening on Unix at a time. On Windows, InnoDB uses native async I/O, and can perform many read-aheads at the same time.

PZ: Another thing I’d like to experiment with. I would expect you can gain significantly with multiple IO threads on systems with large amount of hard drives. Google has Patches which let you to try it out.

Q14: Are there any plans to fix Innodb so it unlocks (or never locks) the rows if they were not matched by query where clause when it was not resolved by index

HT: That is already implemented in 5.1: if you set the transaction isolation level READ COMMITTED, then InnoDB normally does not lock the ‘gaps’, and it does not lock rows whose latest committed version does not match the WHERE condition in a search. Please use this with care: you MUST use row-based binlogging and replication, and remember that the execution is not serializable.

PZ: Cool. This is one of little known new features in MySQL 5.1

Q15: How frequently does Innodb fuzzy checkpointing is activated

HT: InnoDB flushes about 128 dirty pages per flush. That means that under a heavy write load, a new flush and a checkpoint happens more than once per second.

PZ: Not what this answers question exactly. So is it activated as soon as 128 pages must be flushed ? If so how this is discovered when. I think fuzzy checkpointing one of little known aspects of Innodb operations while I see it causing problems every so often.

Q16: How Innodb decided how many pages to flush at each checkpoint interval

HT: Usually it is 128, or less.

PZ: Looks like another magic number to experiment with.

Q17: How InnoBD handles blobs/text fields (needs more info) According to the documentation, InnoDB put first 768 chars of each text/blob in the page and will allocate some space outside of the page.

However: 1. if the total size of the row is less than 8100 bytes InnoDB will not allocate additional space for blobs, even if each blob is larger than 768. 2. InnoDB will allocate additional space outside of the page for _each_ blob (if we have 8 blobs 8xN bytes will be allocated)

Question: How much space InnoDB allocates for each blob outside of the page?

HT: For each column that InnoDB needs to store ‘externally’, it allocates whole 16 kB pages. That will cause a lot of waste of space if the fields are less that 16 kB.

The ‘zip’ source code tree by Marko has removed most of the 768 byte local storage in the record. In that source code tree, InnoDB only needs to store locally a prefix of an indexed column.

PZ: I think it is also very interesting question what happens for blobs larger than 16K - is exact size allocated or also segment based allocation is used.

Q18: Innodb Q: Is Group commit still busted in 5.0/5.1?

HT: Yes. However, work has been done on it by Inaam.

Q19: INNODB uses it’s own internal buffer pool but it’s in conflict with the Linux kernel’s buffer pool. In high load situations where INNODB is using most of the system’s memory the kernel can decide (incorrectly) to swap out MySQL. There are two solutions here. Memlock and O_DIRECT.

Memlock is apparently unstable on Linux. O_DIRECT is an alternative but Linus hates it. What are your suggestions here?

HT: According to tests by Peter Zaitsev, O_DIRECT works well to remove double buffering. In distant future, I would like to have the memlock also.

PZ: memlock already works - it is global MySQL Server option, though it may have problems on Linux if you lock very large portion of physical memory. There is also third alternative - “huge pages” which can be used for buffer pool and which are not swappable. They also allow to reduce number of TLB cache misses. huge pages along however do not stop IO pressure and OS may swap out not buffer pool but some other important parts of MySQL process.

Q20: We’ve migrated to using INNODB so that it operates 100% out of memory. This way every write is serial so that we can see the full write speed of the disk for throughput.

INNODB uses fuzzy checkpointing to commit data to disk. The problem that we’ve seen in high IO scenearios is that it commits too often and we’re only seeing 33% of the raw disk write speed. Since the commit isn’t perfectly serial it’s having to seek on disk which slows down throughput.

If we could control the checkpointing rate we could use an 8G write ahead log and tell innodb to do a full write of the database (basically dump memory to disk) once every minute or two.

This way we’d see a 3x performance boost and it would write at 100MBps vs 33MBps.

Any plans to enable tuning of the checkpointing rate? Postgres exposes this data and allows the user to tune the checkpointing values.

HT: Hmm… we could tune the way InnoDB does the buffer pool flush. I think Yasufumi Kinoshita talked at Users’ Conference 2007 about his patch that makes InnoDB’s flushes smoother and increase performance substantially.

I assume there is lots of room to tune the flushes, since I never optimized the algorithm under a realistic workload.

Making the doublewrite buffer bigger than 128 pages would require a bit more work. Now it is allocated permanently in the system tablespace when an InnoDB instance is created.

PZ: Sequential “dump” of buffer pool would need more optimizations to work - Innodb would also need also to write not-modified pages if this will get sequential write. Say if we have 1MB segment with all pages in buffer pool it well may be faster to do 1 write than flush 10 non sequential pages from it which were modified.

Q21: What’s the status of INNODB in 5.1.x? Specifically: What’s the current status of innodb_thread_concurrency ? Is it suggested to set this value to a LARGE value (somewhat like 200) for additional throughput?

HT: ha_innodb.cc in 5.1:

 static MYSQL_SYSVAR_ULONG(thread_concurrency, srv_thread_concurrency,   PLUGIN_VAR_RQCMDARG,     "Helps in performance tuning in heavily concurrent environments. Sets the max     imum number of threads allowed inside InnoDB. Value 0 will disable the thread t     hrottling.", NULL, NULL, 8, 0, 1000, 0); 

The default is 8, which will work well in most cases. If you are not seeing ‘thread thrashing’ (lots of threads waiting for semaphores in SHOW INNODB STATUS), you can try to disable thread throttling completely by setting the value 0.

But if you see thread thrashing, then a value 1 or 2 often solves your thrashing problem.

PZ: Values 1,2 may solve your thrashing problem but also will limit innodb to be able to use only couple of CPUs efficiently. Thought even that is often better than bad thrashing.

Q22: Also, What’s the deal with group commit being broken in 5.1.x? Is it possible to get the same benefit if you’re performing INSERT with multiple values? We build up INSERTS on our clients and insert values 50 or so at a time.

HT: For most users, group commit being broken in 5.1 makes no difference. If you have a battery-backed disk controller cache, then the commit returns in less than 100 microseconds. And if you do not have, then the only way to get good performance is to set innodb_flush_log_at_trx_commit=2, that is, to flush the log to the disk only once per second.

PZ: We’ve helped number of users who had serve regressions in 5.0 because of group commit being broken. Most do not have RAID with BBU. With good RAID with BBU you can get 2000 fsync/sec or more which is typically enough.

Q23: Rumor has it that if you disable the new XA support (which we don’t use) then you’ll get group commit again.

HT: I do not think so. Sergei Golubchik serialized the MySQL binlog write and the InnoDB log flush with a mutex in 5.1, to implement XA, and switching off the XA will not remove that serialization. MySQL’s binlog and InnoDB’s log must have the transactions in the same order, for a recovery based on MySQL’s binlog to work.

Of course, if you remove that mutex from 5.1, then InnoDB’s group commit works again.

PZ: You can disable binary logging to get Group Commit back. Though this is often not the option. Here are some benchmarks we did.

Q24: INNODB has typically had problems scaling on multicore boxes. The new quad core CPUs from AMD and Intel means that 8-core boxes are going to be common place. We’re considering buying 40 8-core boxes with 32G of memory. Have any specific thoughts here? INNODB was originally written on single core CPUs.

HT: Users have reported lots of InnoDB scalability problems from multicore CPU’s. The most recent 5.0.xx behave better. We will continue the scaling improvements. Latest patches of Yasufumi Kinoshita attained very nice scaling up to 8 or 16 cores. These patches are not in the official InnoDB, though.

PZ: If your load is CPU bound and you’re scaling out you can get better performance by using couple of MySQL Servers on single node. You can make them to use different hard drives and bind to different CPUs.

Q25: In InnoDB, the referenced columns of a foreign key constraint need not form a primary key or a unique constraint: it is sufficient if the referenced columns form a consecutive set of leftmost columns of any index. Can you describe a real-world use case where it would be useful to have a foreign key not reference an entire primary key or unique constraint (something that is mandatory in every other RDBMS that suppors foreign key constraints)? Any concrete examples are appreciated.

HT: Yes, it is sufficient that a foreign key and the referenced key appear as the FIRST (leftmost) columns in an index, in the same order in the foreign key and the referenced key (this same order condition could be relaxed, though). The index is required for fast lookups: it can be used if the columns are the first columns in an index.

I think it is sound design practice to make the foreign key to reference an entire primary key, so that you can easily port your applications to other database brands. Maybe in some denormalized database (for example, if ORDER and ORDERLINE tables are joined to form a single table), it might make sense for a foreign key to reference only a prefix of the primary key.

PZ: Thanks Heikki, Remember we’re waiting you to answer second portion of the questions.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Heikki Tuuri to answer your in depth Innodb questions) (Heikki Tuuri answers to Innodb questions, Part II) (Innodb Recovery - Is large buffer pool always better ?) (Opening Tables scalability) (UC2007 Presentation and Notes
« Previous entries