September 28, 2007

Heikki Tuuri to answer your in depth Innodb questions

Have you ever had a question about Innodb internal design or behavior which is not well covered in MySQL manual ? I surely had. Now you have a great chance to have them answered !

Heikki Tuuri, Creator of Innodb will answer your Questions about Innodb at MySQL Performance Blog.

Please leave your questions as comments to this post by 5th of October and I will pass them to Heikki to reply merging with questions I have myself.

Note: due to Oracle policies Heikki will likely be unable to answer your questions about Innodb new features or timetables.


Entry posted by peter | 4 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(ASP Fractal) (ASP Fractal) (Heikki Tuuri answers to Innodb questions, Part II) (Heikki Tuuri Innodb answers - Part I) (Ask me something in Japanese, please…

Using VIEW to reduce number of tables used

Many Open Source software solutions use database per user (or set of tables per user) which starts to cause problems if it is used on massive scale (blog hosting, forum hosting etc), resulting of hundreds of thousands if not millions of tables per server which can become really inefficient.

It is especially inefficient with Innodb tables both in terms of space (some tables would keep only couple of small rows, but require at least 16K page in Innodb), keeping all tables open in Innodb dictionary and number of other challenges in IO management and recovery. For MyISAM it works better but still overhead can get significant because table_cache can”t be made large enough and so a lot of table reopens needs to happen which requires table header modification, which is costly.

Of course if you can simply rewrite software to store multiple users per table it is best way to go, however quite typically this is way too much work and also requires constant patches as new software versions come out.

It is very tempting to use VIEWs to reduce number of tables dramatically - merging say 1000 of users to the same table as VIEWs are significantly less expensive and cheap to “open”.

How this could work ?

Say you have “post” tables which keeps blog posts and contains id, title, body columns (to keep it simple). There are one table per user so we have post123 table which keeps posts for user number 123 etc.

Converting it to the views we can have “post” table which has user_id,id, title,body columns and create post123 as view:

SQL:
  1. CREATE VIEW post123 AS SELECT id,title,body FROM post WHERE user_id=123 WITH CHECK OPTION;

This would give us post123 containing only posts for users for user=123 which is exactly what we”re looking for. WITH CHECK OPTION is used to ensure we do not insert data in the view which will be invisible.

The VIEW approach works for SELECT, UPDATE, DELETE queries but not for INSERT:

SQL:
  1. mysql> INSERT INTO post123 VALUES(1,“aaa”,“bbb”);
  2. ERROR 1369 (HY000): CHECK OPTION failed “test.post123″

This happens because as we do not pass user_id value to underlying table MySQL tries to set it to default, which does not work as it has to be different for each of the views. You could think MySQL would look at the WHERE clause to figure out which value should be used but unfortunately it does not work this way.

What would be good to have is to specify different default fields for different views, though I would not expect it to happen as I do not think standard defines anything like it.

The other alternative would be to allow triggers on views, specially INSTEAD OF triggers so you could write triggers to perform insert inserts to base tables instead of views. BEFORE INSERT triggers would not help because the column user_id is not part of the view so would not be available even if MySQL would support triggers on VIEWs (which it does not)

But OK. That is all ideas - what can you do now ?

The obvious option is to go over the code and change inserts so they go in the base table - generally there should be only few places when insert happens so it should not be big problem.

Technically if you can”t make inserts to the base table but can have them going to the different table (typically would be very easy change) say “insertpost123″ instead of “post123″ you can create these insert tables as blackhole and define a triggers on them to update base tables, which will make rows available in the views. As the storage engine is BlackHole BEFORE triggers will act the same as INSTEAD OF triggers.

A bit ugly but can be helpful.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(Count the Number of Rows in Two Tables Using a Single Query) (Find the Total Number of Tables in an MS Access Database) (Implementing efficient counters with MySQL) (COUNT(*) for Innodb Tables) (SchemaSpy 2.0.0 released

Innodb usability and ease of use.

It always surprised me how little Innodb team seems to think about product usability/ease of use, when it comes to settings, performance management etc.

I could understand many things 5 years ago, like a lot of information being available only in hard to parse SHOW INNODB STATUS output or even uglier hacks with creating tables such as innodb_lock_monitor to get more detailed information free space specified in table comments (which need to be parsed) etc. 5 years ago Heikki was along and he had a lot to do to make things work well so a lot of these things were just done quick and dirty way.

It is however hard for me to understand why so many years later with significantly increased team not only many of these things remain unfixed but things are still done similar way ?

Other the years variables like innodb_thread_concurrency were added with rather complicated history of changes for meaning of the values, which seems to now settled to more or less understandable with value 0 meaning disabling Innodb internal thread queuing.

Another one is innodb_flush_logs_at_trx_commit - initially it had only values 0 and 1 which was more or less obvious as 0 typically means “No” and 1 “Yes” in many MySQL options. When Value 2 was added which actually has a meaning between what 0 and 1 mean which is very hard to understand.

Proper way would be of course to use some string values which are more self explanatory so at least you can”t mix what do you currently have set in your my.cnf file - For example using values “none”, “disk”, “os” instead of 0,1,2 will be more explanatory.

Furthermore MySQL even has infrastructure to support both string and integer values, which would allow to preserve compatibility. Look for example on query_cache_type variable which can be set to ON/OFF/DEMAND as well as to 0 and 1.

Interesting enough in some cases Innodb team does get things right. innodb_flush_method variable does not use value 0,1,2,3,4,6 which you have to lookup in the manual each time, but it uses more understandable values such as O_DIRECT, O_DSYNC, fdatasync etc.

However Look at new variable innodb_autoinc_lock_mode freshly added in 5.1.22 (which is what motivated me to write this post) - looking at the manual we again get values 0,1,2 which have to be translated to “traditional”, “consecutive” and “interleaved” ?

Using string values would make things much more friendly with my.cnf and “show variables” output being more obvious and self documenting.


Entry posted by peter | 5 comments

Add to: delicious | digg | reddit | netscape | Google Bookmarks

Related Posts

(How Innodb flushes data to the disk ?) (COUNT(*) for Innodb Tables) (More about elevators) (Innodb Second Start prevention bug ?) (InnoDB auto-inc scalability fixed

September 27, 2007

Increase Your Reach and Your Market with the Microsoft® Stack

When choosing a development platform for your product, it’s time to look beyond the typical desktop deployment. Because of tight stack integration, the Microsoft® platform offers market opportunities—and the ability to deliver new functional capabilities and the participation in a strong partner ecosystem—you wouldn’t have previously considered. Here’s how you can extend the reach of your code into new markets and what you, the software developer, need to know to go there.

Related Posts

(Oracle Users Indicate Increase in Use of Open Source) (Search engine market shares in Europe) (How to increase your sales with the right web page titles) (MySQL K.K. to Strengthen its Presence in the Japanese Database Market) (TechRepublic Real World Guide: Keeping Pace with Enterprise Portal Trends

Ten Security Steps You Need to Implement Today

Picking the right security software solution may seem a daunting task, but it isn’t the only thing you should be doing to keep your data and infrastructure safe. These ten tips to increase your IT security should be on the top of your to-do list every day—or it won’t matter what security software you deploy in your enterprise.

Related Posts

(Multiple Languages Surfaced Via a Cube Using SAS 9.1 OLAP Server) (The Four HTML Optimization Steps) (Serving The Web: Nine Tips to Enhance IIS Security) (TechNet Webcast: SQL Server 2005 Security Enhancements (Level 200)) (SQL Server: Reduce Downtime: Implement SQL Server 2000 On A Cluster

September 26, 2007

Create Scalable Semantic Applications with Database-Backed RDF Stores

Store your RDF triples in a database for faster performance and greater scalability.

Related Posts

(The influence of semantic indexing on your search engine rankings) (DB2 System Catalog Views: Everything You Need to Know About Your DB Objects) (WordPress 2.0 is better than ever) (WordPress 2.0 is better than ever) (MySQL AB & NitroSecurity to Jointly Develop Database Storage Engine

Explaining the Excel Bug

By now you”ve probably seen a lot of the brouhaha over a bug in the newest version of Excel, 2007. Basically, multiplying 77.1*850, which should give you 65,535, was actually displaying 100,000.

Before I try to explain this, I should disclose that I did work on the Excel team, but that was thirteen years ago. I haven”t been there for a long time. I don”t even think I know anyone on that team any more. I”m just trying to explain the bug a little bit as a public service.

The first thing you have to understand is that Excel keeps numbers, internally, in a binary format, but displays them as strings. For example, when you type 77.1, Excel stores this internally using 64 bits:

0100 0000 0101 0011 0100 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110

The display is showing you four characters: “7″, “7″, “.”, and “1″.

Somewhere inside Excel is a function that converts binary numbers to strings for displaying. This is the code that has the bug that causes a few numbers which are extremely close to 65,535 to be formatted incorrectly as 100,000.

If you use the number further along in calculations, for example, if you add 2 to the results, you”ll get the right thing.

=77.1*850 -> displays 100000

=77.1*850+2 -> displays 65537, correctly.

Just to throw people off, this bug also exists for a few numbers which are extremely close to 65,536. They display incorrectly as 100,001.

=77.1*850+1 -> displays 100,001, incorrectly.

This is still only a bug in the number formatting code; if you try to make a chart with that number in it, you”ll get a correct chart.

Now… you may have noticed that I said that this bug exists for numbers which are extremely close to 65,535, but not for 65,535 itself. Indeed if you enter 65,535 you see 65,535. But, you notice, 77.1 * 850 should be exactly 65,535, not extremely close to 65,535!

Look closely at the binary representation for 77.1:

0100 0000 0101 0011 0100 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110

See how there”s a lot of 0110 0110 0110 there at the end? That”s because 0.1 has no exact representation in binary… it”s a repeating binary number. It”s sort of like how 1/3 has no representation in decimal. 1/3 is 0.33333333 and you have to keep writing 3″s forever. If you lose patience, you get something inexact.

So you can imagine how, in decimal, if you tried to do 3*1/3, and you didn”t have time to write 3″s forever, the result you would get would be 0.99999999, not 1, and people would get angry with you for being wrong.

The same thing happens in binary with  numbers ending in 0.1: they are repeating decimals, so when you do mathematical operations on them, very small insignificant errors creep in somewhere way to the right of the decimal point. (PS: same for .2, .3, .4, .6, .7, .8, and .9, but not .5).

The IEEE has a standard, IEEE 754, for how to represent floating point numbers in binary, and this is what almost everybody uses, including Excel, and they have for a really long time, and it means sometimes you get imprecise results when you add a lot of 0.1″s together, but if you”re rounding the numbers to a reasonable number of decimal points, you won”t really care.

Back to the Excel bug, which is a genuine bug, not just an artifact of this IEEE 754 stuff. Since 77.1 has no exact representation, Excel stores it as

0100 0000 0101 0011 0100 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110 0110

and then when you try to multiply it by 850, you get something very close to 65,535, but not exactly 65,535, because of the fact that 77.1 wasn”t stored exactly because that would take infinite memory. And this number, which is very close to 65,535, happens to be one of only 12 possible floating point numbers which trigger this bug in Excel.

OK, Q&A.

Q: Isn”t this really, really bad?

A: IMHO, no, the chance that you would see this in real life calculations is microscopic. Better worry about getting hit by a meterorite. Microsoft, of course, will be forced to tell everyone “accuracy is extremely important to us” and I”m sure they”ll have a fix in a matter of days, and they”ll be subjected to all kinds of well-deserved ridicule, but since I don”t work there I”m free to tell you that the chance of this bug actually mattering to you as an individual is breathtakingly small.

Q: Shouldn”t they be testing for these kinds of things?

A: I”ll bet that most of the numeric testing done on the Excel team is done automatically with VBA code. Cells containing this value display as 100,000, but from VBA, they”re going to look like 65,535 (since the number would be passed into the Basic runtime in binary, before the display formatting.) I”m sure there”s plenty of code to test display formatting, but with a bug like this that only happens on 12 out of 18446744073709551616 possible floating point binary numbers, it”s unlikely that any set of black-box tests would cover this case.

Q: What caused the bug?

A: I”m not sure exactly, since I don”t have the code. Off the top of my head, I can”t think of anything that would cause this behavior. Play around with Quanfei Wen”s IEEE-754 calculator, maybe you”ll find something.

Q: Why not use “exact” (decimal) arithmetic?

A: It”s much slower than floating point arithmetic, since there”s no hardware on your CPU chip to do it for you natively.

Over the years, Microsoft got so much heat for floating point rounding artifacts in the Windows Calculator that they rewrote it to use an arbitrary-precision arithmetic library. Since you have to poke at Windows Calculator with a stick, it doesn”t have to be as fast as Excel. That said, CPUs have gotten pretty fast. I”ll bet an arbitrary-precision version of Excel would perform pretty well these days. Still, the Microsoft Excel support team has spent the last 20 years defending IEEE 754, and it”s not surprising that they”ve started to believe in it.

And let”s face it — do you really want the bright sparks who work there now, and manage to break lots of perfectly good working code — rewriting the core calculating engine in Excel? Better keep them busy adding and removing dancing paper clips all day long.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Enterprise Reporting with Excel) (Microsoft Security Advisory (947563): Vulnerability in Microsoft Excel Could Allow Remote Code Execution) (How to Display ASP.NET DataGrid Data in Excel) (Blog Tales: Introduction to Excel XML) (Blog Tales: Introduction to Excel XML

Unfocused and Unabashed

“I didn”t relish the carpal tunnel syndrome that would result from signing all these forms. We tried to “sign” them by running the forms through the laser printer again to print the signature in the right place. But, as I learned, something traumatic happens to paper on its way through a printer. Once a sheet of paper has been through a laser printer, the next time you try to print on it, it”s going to fight back, jamming the printer and resulting in the deaths of three other pages, and you”re going to spend five minutes with a putty knife cleaning up the bloodshed.”

How Hard Could It Be?: Unfocused and Unabashed

PS: This article will appear in the October issue of Inc. Magazine; it”s the first installment of what will be a monthly column, mostly about business of software startups. The column is called “How Hard Could It Be?” As an entrepreneur I”ve really enjoyed this magazine over the years and I”ve learned a lot, so I”m honored to be a columnist. They will be publishing each column on their website, and I”ll link to it as soon as they do so, but you may also want to subscribe to the print magazine; it”s less than $10 a year and well worth it.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(No related posts) 

InnoDB auto-inc scalability fixed

There was long played scalability issue with InnoDB auto-increment field. For details check Bug 16979. In short words the problem is in case of insert into table with auto-increment column the special AUTO_INC table level lock is obtained, instead of usual row-level locks. With many concurrent inserted threads this causes serious scalability problems, and in our consulting practice we had a lot of customers who was affected by InnoDB auto-inc. For several of them we even advised to replace auto-inc column by that or another solution. Good news is the bug is fixed. Bad news is it is fixed only 5.1.22, which is not released yet. I wonder if the fix is going to be ported to 5.0, as I mentioned it affected many production systems and not all of them are ready to upgrade to 5.1. The interesting also is the fix introduces new system variable innodb_autoinc_lock_mode which determines behavior of InnoDB for tables with autoinc. I do not want to copy-paste MySQL documentation, the very good and informative description of the problem and solution is available here.


Entry posted by Vadim | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks |

Related Posts

(Opening Tables scalability) (Bug fix of InnoDB scalability problem) (MySQL/Innodb scalability tests after fix) (Hacking to make ALTER TABLE online for certain changes) (MySQL Users Conference - Innodb

September 25, 2007

MySQL Conference & Expo 2008 ‘Call for Participation’ Opens!

The sixth annual MySQL Conference & Expo, co-presented by MySQL AB and O’Reilly Media, will take place April 15-18, 2008 in Santa Clara, California. The event is expected to bring over 1,500 open source and database users together to harness the power of MySQL and celebrate the large and active MySQL ecosystem. The call for participation is now open for prospective speakers — submissions will be accepted until October 30, 2007.

To submit a proposal or learn more about the conference, please visit www.mysqlconf.com.

Related Posts

(MySQL Users Conference 2006 - Call for Participation) (Program Unveiled for 2008 MySQL Conference & Expo — Registration is Now Open!) (YouTube, Flickr, and Wikipedia to Share their Secrets of Success at the 2007 MySQL Conference & Expo) (Visionary Keynote Speakers Announced for MySQL Conference & Expo 2008) (MySQL K.K. Opens Registration for the MySQL Users Conference, Japan 2007
« Previous entries