There's nothing faster than RAM
Saturday, July 19. 2008
This lengthy posting will be a rather technical one about a pretty interesting backend performance improvement I've just implemented. So if you're not a relational database freak of some sort, you might as well skip this one and wait for the "new MobMap version is out" posting that'll be here soon (hopefully ;-) but I'm making good progress with the next major release which will include some bugfixes and a truly new feature). But for those who are interested - I found a nice way to increase the performance of the MobMap SQL backend database considerably (like 300-400%) using one of the oldest tricks in the computing world: RAMdisks!
But let me start from the beginning. With MobMap growing in popularity, the amount of submitted data grew considerably, too. That is especially true when there's a WoW patch, but even on normal days there's now around 5 gigabytes of raw XML data being uploaded in multiple tens of thousands of uploads.
All that data has to be parsed, consolidated and finally inserted into a "real" database in order to work with it efficiently. To do this, I've got a specialized Java-based program which is running in some kind of batch processing mode. Uploaded data is being collected in a buffer file which is then being taken by the parser program and processed afterwards. The advantage of this is that I can cope with varying server loads (that is "uploads per minute") very easily since there's no actual processing at the moment of the upload besides some simple data validity checks. But if there's more data being uploaded than the parser can process for a long time, the new data starts to accumulate. That situation was almost reached now, so I had to find some way to improve data processing speeds to keep the whole system running.
I pretty much knew where the main bottlenecks were to be found: Disk I/O was the problem (a classic database bottleneck). I just have two standard hard disk drives in a RAID-1 setup for redundancy, so there's not one of those ultra-fast server disk storage system in place - I just can't afford such a thing. Of course a faster disk subsystem would help considerably, but it just wasn't an option because of the costs.
So I devised a different solution: I know that there are just a few tables which are being updated really often, that's the position database, the loot database and some smaller tables I use for statistical data analysis. By optimizing access to these tables, the system as a whole should be considerably faster. And what would be a better optimization than completely eliminating the Disk I/O dependency of updates/inserts into these tables? Fortunately the size of the tables allowed me to completely store them in RAM, so I set up a RAMdisk (which is extremely simple on Linux!) and moved the MySQL table files there, linking to them from their usual path so MySQL wouldn't notice the change.
The first tests were awesome: more than 300% more queries per second (from ~700 to ~2.500 qps, with peaks at 15.000 qps)! But there was a problem: content in RAMdisks is as volatile as anything else in RAM, and I wouldn't want to loose all that data if the machine crashes or loses power or has to be rebooted for some reason. Besides that, only one of the two databases (german and english databases are split) could be placed in RAM at once because of their size. So I needed some way to move the necessary tables into the RAMdisk while the parser was working and automatically move them back to disk afterwards. Fortunately, god created shell scripts :D A script that would automatically create a RAMdisk, backup the tables, move the tables from disk to RAM (of course after flushing them in MySQL, we don't want to accidentially damage table integrity), put the symlinks in place and undo everything after the parser has finished working was written and put in place. It seems to work fine now (the stability of this solution still has to be proven during the next days and weeks, but I've got a good feeling :) ), and the performance improvement definitely speaks for itself:

That should make the system fast enough to cope with the surge of uploads I expect with the final release of WotLK.
But let me start from the beginning. With MobMap growing in popularity, the amount of submitted data grew considerably, too. That is especially true when there's a WoW patch, but even on normal days there's now around 5 gigabytes of raw XML data being uploaded in multiple tens of thousands of uploads.
All that data has to be parsed, consolidated and finally inserted into a "real" database in order to work with it efficiently. To do this, I've got a specialized Java-based program which is running in some kind of batch processing mode. Uploaded data is being collected in a buffer file which is then being taken by the parser program and processed afterwards. The advantage of this is that I can cope with varying server loads (that is "uploads per minute") very easily since there's no actual processing at the moment of the upload besides some simple data validity checks. But if there's more data being uploaded than the parser can process for a long time, the new data starts to accumulate. That situation was almost reached now, so I had to find some way to improve data processing speeds to keep the whole system running.
I pretty much knew where the main bottlenecks were to be found: Disk I/O was the problem (a classic database bottleneck). I just have two standard hard disk drives in a RAID-1 setup for redundancy, so there's not one of those ultra-fast server disk storage system in place - I just can't afford such a thing. Of course a faster disk subsystem would help considerably, but it just wasn't an option because of the costs.
So I devised a different solution: I know that there are just a few tables which are being updated really often, that's the position database, the loot database and some smaller tables I use for statistical data analysis. By optimizing access to these tables, the system as a whole should be considerably faster. And what would be a better optimization than completely eliminating the Disk I/O dependency of updates/inserts into these tables? Fortunately the size of the tables allowed me to completely store them in RAM, so I set up a RAMdisk (which is extremely simple on Linux!) and moved the MySQL table files there, linking to them from their usual path so MySQL wouldn't notice the change.
The first tests were awesome: more than 300% more queries per second (from ~700 to ~2.500 qps, with peaks at 15.000 qps)! But there was a problem: content in RAMdisks is as volatile as anything else in RAM, and I wouldn't want to loose all that data if the machine crashes or loses power or has to be rebooted for some reason. Besides that, only one of the two databases (german and english databases are split) could be placed in RAM at once because of their size. So I needed some way to move the necessary tables into the RAMdisk while the parser was working and automatically move them back to disk afterwards. Fortunately, god created shell scripts :D A script that would automatically create a RAMdisk, backup the tables, move the tables from disk to RAM (of course after flushing them in MySQL, we don't want to accidentially damage table integrity), put the symlinks in place and undo everything after the parser has finished working was written and put in place. It seems to work fine now (the stability of this solution still has to be proven during the next days and weeks, but I've got a good feeling :) ), and the performance improvement definitely speaks for itself:

That should make the system fast enough to cope with the surge of uploads I expect with the final release of WotLK.

