March 30, 2007

Scoble Show

It”s Friday, and you weren”t going to get anything done anyway, so go watch Robert Scoble interview me for The Scoble Show. The full version is just over an hour. There”s also a 6 minute edited version.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Testing separating my work life from my private life) (The MicroISV Show) (More Gotchas with MySQL 5.0) (Feature Idea: Finding columns which query needs to access) (A little belated crow to eat on ‘Battlestar’…

Yes, it’s been quiet around here…

It’s amazing how many viruses babies manage to bring into the house. I think we’ve been down with something pretty much continuously since we got back. Fortunately, we haven’t been all sick at once, but this is starting to get a bit old…

Things have also been extremely busy around work. I’ve been working on something that I’m hoping we’ll have at least a mention of at MIX07 and will probably have more to say about later in the year. I realize there are some threads that have been left hanging, so I’m going to try and tie them up in the coming weeks.

And, finally, the mail on my webserver is now working again, so if you’ve wanted to send me feedback on the Contact form and haven’t been able, you’re back in business. Sorry about that!

Related Posts

(VB blogs you should be reading…) (Things will be slower (than usual) around here…) (Exciting times) (Learning from Dave Winer) (An update on VBx…

March 29, 2007

Speaking on RIT-2007, Moscow, Russia

I’ve been invited to speak at RIT-2007 which stands for “Russian Internet Technologies” conference. It takes place 16-17 April in Moscow, Russia. It looks like this is going to be biggest Russian conference on Internet technologies so far with many large Russian Internet projects sending their delegates. You can check out conference sessions schedule here .

Andrew Aksenoff, the author of Sphinx Search Engine will also be giving a talk where.

The day following the conference April 18 I’m giving full day “Master Class”/Training on MySQL Performance.

Even though this post is in English the conference and training will be in Russian language :)

Related Posts

(Speaking on HighLoad Conference, Moscow, Russia) (Back from RIT2007) (HighLoad 2007 Review) (Speaking on MySQL Users Conference 2008) (Nginx Powers Hulu.com ?

March 28, 2007

MySQL Replication and Slow Queries

I just filed a bug regarding slow queries executed by replication thread are not being logged to the slow query log.

This is not a big deal but it is ugly little gotcha which I think few people know about. It is especially bad if you’re using tools to analyze slow query log to find what queries take up the most resources on your server - in certain configurations replication load itself may be very significant contributor.

But even if you do not, as I wrote in the bug report it is quite handy to have this information our where as query times on master and slave can be different in a lot of circumstances.

One more thing to consider - slow replication queries also identify replication lag, ie if you have query which ran 10 seconds on the slave and your replication is well loaded, this means at some point in time your replication lag was at least 10 seconds.

Related Posts

(Figuring out what limits MySQL Replication) (Managing Slave Lag with MySQL Replication) (Microslow patch for 5.1.20) (Making MySQL Replication Parallel) (Enabling/Disabling query log without server restart in MySQL 5.0

March 27, 2007

Microsoft Queuing Frameworks: SQL Service Broker vs. MSMQ

Not quite a knockout, SQL Service Broker beats MSMQ at its own game.

Related Posts

(MS07-065 – Important: Vulnerability in Message Queuing Could Allow Remote Code Execution (937894) - Version:1.3) (SQL Server: Async Lifestyle: Manage Your Tasks With Service Broker) (SQL Server: Async Lifestyle: Manage Your Tasks With Service Broker) (Investigate Notification Services and the Service Broker in SQL Server 2005) (Frameworks, Implementation and Open Problems for the Collaborative Building of a Multilingual Lexical Database

Job listings for India

OK, it”s live! I”ve put up a new version of the job board specifically for jobs in India, jobs.joelonsoftware.co.in.

To help fill it up, posting a job is only $50—about 2200 rupees.

Unlike the huge boards, Monsterindia, Naukri.com, etc., a job posted to Joel on Software will get far fewer resumes, but they will be of much higher quality, and as usual, we”ll return your money if you”re not completely satisfied.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(Localizing the job board) (JOB: PHP / MYSQL - 5 PROGRAMERS NEEDED RIGHT NOW - India) (How to use the NOODP tag to influence your search engine listings) (The importance of page 1 listings and how to get them) (Mysql Goes Eastward, Starts Japanese Operations, Comes To …

Integers in PHP, running with scissors, and portability

Until recently I thought that currently popular scripting languages, which mostly evolved over last 10 years or something, must allow for easier portability across different platforms compared to ye good olde C/C++.

After all, their development started a few decades after C, so its notorious caveats are all well-known and should be easy to avoid when designing a new language, right?

However, PHP just brought me a new definition of “portable” - and that was when working with… integers.

PHP is not able to handle unsigned integers, and converts values over 2^31 to signed. So if your IDs go slightly over 2 billion, and PHP decides to treat them as integers, you’re in trouble.

Oh wait, no - that’s on 32-bit platforms only! PHP int size is platform-dependent, and it seems to be 8 bytes on our 64-bit boxes. Yes, the very same ones where C/C++ int is 4 bytes, you know.

That was the easy part. It was mostly documented.

Now, there’s a function called unpack() which essentially allows to convert different types of data from binary strings to PHP variables. What if you try to unpack unsigned 32-bit big endian integer (format code “N”)? Let’s check the doc:

If you specify a number beyond the bounds of the integer type, it will be interpreted as a float instead.

Having read the doc I personally blatantly relied upon it and expected that large unsigned 32bit numbers would be converted to float, or string, or something, but handled properly. However, a couple or so weeks ago the following notice suddenly appeared:

“Note that PHP internally stores integral values as signed. If you unpack a large unsigned long and it is of the same size as PHP internally stored values the result will be a negative number even though unsigned unpacking was specified.”

How sweet. No, it just could not behave like documented and convert 32-bit unsigned value to float on x32 or keep it integer on x64 - you now suddenly have to care about value size yourself. Ah, and by the way, there’s no official way to know what’s int size.

To make things even better, 5.2.1 introduced a nice bug in unpack(), which f..ed unpacking less-than-16-bit values on x64. (I assume you understand that “f..ed” means “fixed”). It took some time and several tries to convince PHP team that x64 has enough bits to hold 16-bit unpacked value, but thankfully its now acknowledged and assigned.

To summarize, if you need to unpack an unsigned 32bit int from binary stream, you have to:

  • convert it to float or string manually,
  • do that depending on int size on current platform,
  • which can not be done using anything documented,
  • and specifically avoid PHP 5.2.1 on x64.

Most people could probably learn all that, and then use sprintf(”%u”,$id), work with string IDs everywhere, avoid 5.2.1 and be happy.

Unfortunately, my final goal was to have support for 64-bit document IDs…

Let’s do a small time travel. Integer types in C/C++ have always been a pain, but back in 1999 ISO commitee ratified ISO/IEC 9899:1999 standard, also known as ISO C99, which guarantees that “long long int” integer type must be at least 64 bits in size. By now, most compilers support that part perfectly.

However, designers of PHP 5 (released in 2004) type system were either not aware of this change, or decided to not rely on the standard which has been out for “only” 5 years by then, or just thought that 31 (no typo) bits and 640K should be enough for everybody.

Long story short, it’s 2007 now but there’s no native 64-bit integer type in PHP. Let me remind that built-in “int” might be 64-bit, but then again it might be not, and there’s no official way to tell.

This time, there’s a number of routes one could take - either use ints (and pray that the app is never run on x32, and that “platform dependent” size does not change to 4 next version); or use GMP or bcmath extensions if they are available.

Fine, so 99.999% of the world would hit that, compile in bcmath, and be happy again.

Unfortunately, I needed to develop a library which could be deployed in any environment - and still work, and produce reasonable results. The worst case is x32, and neither GMP nor bcmath available.

And this is how the following code was born.

PHP:

  1. /// portably build 64bit id from 32bit hi and lo parts
  2. function _Make64 ( $hi, $lo )
  3. {
  4.     // on x64, we can just use int
  5.     if ( ((int)4294967296)!=0 )
  6.         return (((int)$hi)<<32) + ((int)$lo);
  7.  
  8.     // workaround signed/unsigned braindamage on x32
  9.     $hi = sprintf ( “%u”, $hi );
  10.     $lo = sprintf ( “%u”, $lo );
  11.  
  12.     // use GMP or bcmath if possible
  13.     if ( function_exists(“gmp_mul”) )
  14.         return gmp_strval ( gmp_add ( gmp_mul ( $hi, “4294967296″ ), $lo ) );
  15.  
  16.     if ( function_exists(“bcmul”) )
  17.         return bcadd ( bcmul ( $hi, “4294967296″ ), $lo );
  18.  
  19.     // compute everything manually
  20.     $a = substr ( $hi, 0, -5 );
  21.     $b = substr ( $hi, -5 );
  22.     $ac = $a*42949; // hope that float precision is enough
  23.     $bd = $b*67296;
  24.     $adbc = $a*67296+$b*42949;
  25.     $r4 = substr ( $bd, -5 ) +  + substr ( $lo, -5 );
  26.     $r3 = substr ( $bd, 0, -5 ) + substr ( $adbc, -5 ) + substr ( $lo, 0, -5 );
  27.     $r2 = substr ( $adbc, 0, -5 ) + substr ( $ac, -5 );
  28.     $r1 = substr ( $ac, 0, -5 );
  29.     while ( $r4>100000 ) { $r4-=100000; $r3++; }
  30.     while ( $r3>100000 ) { $r3-=100000; $r2++; }
  31.     while ( $r2>100000 ) { $r2-=100000; $r1++; }
  32.  
  33.     $r = sprintf ( “%d%05d%05d%05d”, $r1, $r2, $r3, $r4 );
  34.     $l = strlen($r);
  35.     $i = 0;
  36.     while ( $r[$i]==“0″ && $i<$l-1 )
  37.         $i++;
  38.     return substr ( $r, $i );         
  39. }
  40.  
  41. list(,$a) = unpack ( “N”, \xff\xff\xff\xff” );
  42. list(,$b) = unpack ( “N”, \xff\xff\xff\xff” );
  43. $q = _Make64($a,$b);
  44. var_dump($q);

For reference, this is what would the equivalent C/C++ snippet look like:

CODE:

  1. typedef unsigned long long myuint64; // just for brevity
  2. unsigned int a = 0xffffffffULL;
  3. unsigned int b = 0xffffffffULL;
  4. myuint64 c = myuint64(a) * myuint64(b);
  5. printf ( “%llu”, c );

Portability in year 2007.

Related Posts

(PHP vs. BIGINT vs. float conversion caveat) (running Apache on Win32?) (Migrating Oracle Application Server Applications to BEA WebLogic Server 9.0: Business Drivers, Important Issues to Consider, and How-To Instructions) (Microslow patch for 5.1.20) (MaxDB Database Backup and Recovery White Paper

PHP Sessions - Files vs Database Based

One may think changing PHP session handler from file based to database driven is fully transparent. In many cases it is, sometimes however it may cause some unexpected problems as happened to one of our customers.

If you use file based sessions PHP will lock session file for whole script execution duration, which means all requests from the same sessions will be serialized on PHP level, which means they also will be serialized for single user on database level. If you change to store PHP sessions in MySQL instead this effect may be no more true and you may have number of requests executing for the same session at the same time. First of course means you may have your session data damaged because you will have lost session variables update from one of the script, in addition however you may run into database related issues of modifying user profile or other user/session related data in parallel, if you do not use transactions or lock tables.

So how you can get back your old file based session behavior with MySQL Sessions ?

If you have dedicated connection to session database and use Innodb tables for your session storage you can start transaction on the session start and use SELECT … FOR UPDATE to lock the session row in the session table for whole request length. On the end of the session the same row is updated and transaction is committed.

If you share session connection with other modules or do not use transactional tables for session you can use GET_LOCK to get same behavior. In the start of the session you can do SELECT GET_LOCK(’‘,10) and in in the end of the request
SELECT RELEASE_LOCK(’‘) where session_id is current session identifier. Note - setting this external lock on session name should be done before session data is read from database for things to work properly.

This approach assumes you do not use GET_LOCK in other places in your application as as soon as it is called second time previous lock is automatically released. The good thing about it however - you can use it as an extra to your current MySQL Sessions system without need to change how it works internally. If you do not use persistent connections you even do not have to release lock - as soon as connection is closed the lock is automatically released.

The value 10 in GET_LOCK is timeout in seconds - if lock can’t be granted for this amount of time it will return “0″ indicating lock was not granted in this case you can select to continue without session or may do something else, like logging error as this generally should not happen in well tuned applications.

Related Posts

(Running A MySQL-Based DNS Server: MyDNS (24 Jan 2006)) (Running A MySQL-Based DNS Server: MyDNS (24 Jan 2006)) (UC2007 Presentation and Notes) (2007 MySQL Conference & Expo Adds New Speakers and Sessions) (Utility Spotlight: ADMX Migrator

All about Google’s new PPA advertising product

Until now, Google has primarily sold pay-per-click (PPC) ads, so-called AdWords ads: advertisers pay when someone on Google or a Google partner site clicks on the ad. Google is testing a new advertising system that allows businesses to advertise on a cost per action basis.

Related Posts

(How to react to Google’s latest AdWords changes) (Are your paying too much for pay per click advertising?) (Google uses a new spider to index web pages) (WhenU: (Software Review)) (What is better: PPC or SEO?

March 26, 2007

Localizing the job board

The job board, jobs.joelonsoftware.com, has been rather successful. For the next upgrade, I was thinking about setting up a separate job board specifically focused on jobs in India, since there are so many Joel on Software readers there and I”ve only seen three jobs (in Bangalore) posted.

What should I do to localize the job board for India? Should I use jobs.joelonsoftware.in or .co.in? Or neither? How much should I charge? Right now the site only accepts credit cards… is there another payment method that would work better for India? Should I allow US employers willing to sponsor H1B visas? Or just jobs actually in India? What other changes should I consider to make a more suitable product for the Indian market?

Post your suggestions here.

Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.

Related Posts

(MySQL AB Adds Dana Evan to its Board of Directors) (Tim O’Reilly to Join MySQL AB’s Board of Directors) (MySQL AB Adds Dana Evan to its Board of Directors) (UseBB 0.7 released) (Job listings for India
« Previous entries