Wednesday, March 24th, 2010

OK, I’ve been sitting on these for a few days and wanted to get them out there. I’ve got an old server that I configured with 4x 750GB Western Digital Black SATA drives behind an LSI2008 controller in RAID 10 with the default 64k stripe size. It’s a four-core Xeon 5200-series box, I believe, with 24GB of RAM. The OS for every test was CentOS 5.4 with default virtual memory/sysctl settings and minimal packages.

The VMs were built with 300GB virtual disks and 4GB of memory. The KVM guests used the disk drivers and cache settings indicated in the results, and each sat on ext3 as a qcow2 image (created with a 1MB cluster size and preallocated metadata). The ESX guest had the pvscsi driver enabled, and a 2MB cluster size was used on the VMFS filesystem because the smaller cluster sizes can’t hold a 300GB file.
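For reference, creating a qcow2 image with those options looks roughly like this (the path and file name are just examples, not the actual ones I used):

# 300GB qcow2 image, 1MB clusters, metadata preallocated
qemu-img create -f qcow2 -o cluster_size=1M,preallocation=metadata /vm/guest.qcow2 300G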

First off, postmark. For those who don’t know, postmark is a simple but decent utility designed to give an idea of small-file I/O workloads. It’s tunable, but geared toward web/mail server type loads. It creates a ton of small files, then does random operations on them, and spits out the results.

Here’s the config:

set buffering false
set number 100000
set transactions 50000
set size 512 65536
set read 4096
set write 4096
set subdirectories 5
show
run
quit
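If you want to reproduce this, the block above can be saved to a file and fed straight to postmark (the file name here is just an example):

postmark pmark.cfg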

The primary things I’m interested in here are the KVM virtio performance (specifically writethrough and nocache) compared to ESX 4.1 and the native host disks. The IDE driver and writeback tests were done just to see what would happen; they’re not what I’d prefer to use in production. The KVM virtio driver is nearly on par with native speed when it comes to reads and writes. It falls behind a bit on the actual operations per second, but it must have made up the time somewhere, since the benchmark overall only took a hair longer. (If this confuses you: the benchmark goes through several stages, creating dirs, creating files, performing transactions, deleting files, and deleting dirs, and only the transactions stage counts toward the ops/sec number.) The ESX guest didn’t do as well with the pvscsi driver. In fact, this benchmark alone would be enough to put to rest any concerns about virtio performance when weighing KVM against the tried-and-true ESX.
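For anyone wanting to replicate the cache settings, they ultimately boil down to qemu’s -drive options, whether you set them by hand or through libvirt. Roughly (the image path and memory size are placeholders, not my actual launch command):

# virtio disk with writethrough caching; swap cache= for none or writeback
qemu-kvm -m 4096 -drive file=/vm/guest.qcow2,if=virtio,format=qcow2,cache=writethrough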

Next up, iozone. This benchmark tries to create a sort of ‘map’ of the disk I/O by testing various file sizes at various record sizes, building a matrix. I regret to say that the read numbers are a bit skewed: my setup didn’t include an unmountable volume on every system, and unmounting is really the only way to clean out the read caches between tests with this benchmark. Still, we’ve got some good write numbers and some interesting cache comparisons.
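That matrix comes from iozone’s automatic mode; an invocation along these lines produces a similar map (the sizes, test selection, and path here are illustrative, not necessarily exactly what I ran):

# auto mode: write/rewrite, read/reread, random read/write, files up to 8GB
iozone -a -i 0 -i 1 -i 2 -g 8G -f /data/iozone.tmp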

As you can see, the host’s read cache is much faster than the guests’. Still, some of those guests are posting upwards of 1GB/s, which isn’t bad, but it does give us some insight into the overhead of the VM: we’re likely seeing the added latency of fetching the data from the host’s cache and passing it through to the guest, which can be a big relative penalty when memory speed is measured in nanoseconds.

Also of note, yet again, is the good performance of virtio, and that writethrough and writeback score roughly the same. The KVM IDE driver didn’t fare so well; in writeback mode it repeatedly caused the mount to go read-only, so I gave up on it. ESX, again, not so good, beating out only the VMs that aren’t using any cache.

Here we see the huge performance boost a VM using writeback cache can attain; the virtio driver has no problem with cheating. Now, we’re not talking about battery-backed writeback on a storage controller, we’re talking about writes going into the host’s dirty memory and being considered complete. As such, you had better trust that your host won’t crash or suddenly reboot, or at the very least make sure you’ve got snapshots you can roll back to in an emergency. You can be fairly certain that your writes will be committed within a minute or so at the worst (check the host’s dirty_expire_centisecs), and most likely much sooner unless the host is spending a lot of time in iowait. My point is that if you choose to go this route, you can at least be confident in your most recent snapshot, as long as catastrophe doesn’t strike within the first few minutes after taking it.
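If you’re curious how long dirty data can sit in the host’s page cache before the flusher threads push it out, the value is in hundredths of a second (3000, i.e. roughly 30 seconds, is the usual default) and can be checked either way:

cat /proc/sys/vm/dirty_expire_centisecs
sysctl vm.dirty_expire_centisecs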

Here is the same data, with the writeback taken out, so we can get a better look at the rest of the pack.

There isn’t much of note in the sequential graph, except for the nocache and writethrough VMs being faster than the host. As a guess I’d attribute this to the 1MB cluster size on the qcow2 file: even though we’re writing 4k at a time in the VM, the writes are probably being issued in much larger chunks by the time they hit the host. I also did some ‘dd’ tests on each of these systems, but the results were very similar, so I’m not going to rehash them.
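Those dd runs were nothing fancy; something along these lines (block size, count, and path are examples rather than the exact numbers I used):

# sequential write of ~4GB in 4k blocks, then flush to disk
dd if=/dev/zero of=/data/ddtest bs=4k count=1048576
sync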

Random writes… here we actually see ESX perk up a bit and hold its own at 4-8k record sizes. The host is even faster on the low end, and in some respects this random write graph mirrors the IOPS results from our postmark test if you average together the left half.

All in all, I must say that I’m fairly pleased with the progress of KVM and its I/O performance.

3 Responses

  1. mikerj

    Hey Marcus – good post.

    Is the hardware you used on VMware’s HCL? Curious results, given how much VMware touted the improvement of I/O throughput in version 4. Would be curious to see the same comparison running over FC and a SAN.

  2. admin

    Hey! How’s it going?

    The chipset, CPU, and LSI2008 controller are all on the HCL.

    I think the 2MB cluster size in particular weakened it a bit for my small I/O tests, and its iozone results weren’t too out of whack versus the competition. I’m not versed enough in VMFS to talk in detail about its journaling, barriers, inode tables, or anything else that might impact performance, but I wouldn’t be surprised at all if they’ve tuned their system to work well with SANs and systems with large amounts of cache. After all, they probably don’t have a lot of large customers using local storage.

    I’ll have to dig up some other individuals’ tests. Really, I’m more interested in comparing the virtualization technologies to native and seeing the overhead, and in that sense this post isn’t fair, because I didn’t test the VMFS volume against the native ESX host (is that even possible?). /proc/sys/vm settings can make a big difference, as can driver versions and a host of other things, so it’s probably not right to compare the ESX guest to the Linux host and call that overhead.

    Anyway, good to hear from you.

  3. prometheanfire

    For testing nocache I’d recommend using LVM as the backing store to get the best performance, if you can.
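
    Something like this, as a rough sketch (the volume group, LV name, and sizes are just examples):

    # carve out a logical volume to use as a raw backing device
    lvcreate -L 300G -n vmdisk vg0
    # hand the block device straight to the guest with caching disabled
    qemu-kvm -m 4096 -drive file=/dev/vg0/vmdisk,if=virtio,cache=none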
