Fist In Hand Networks
26Feb/11

NexentaStor ZFS Performance Testing

Seagate ST9500620SS hard drives

During the last couple of weeks I have been testing a SAN appliance software package called NexentaStor. The software uses parts of the Sun Solaris operating system to build a feature-rich, high-performance storage system. The underlying ZFS file system can do some pretty neat things and looks to be an excellent contender for replacing costly hardware RAID-driven setups.

My goal was a SAN capable of serving iSCSI LUNs to my VMware vSphere virtual infrastructure with respectable performance for under $6000. Commercial SAN products with a comparable feature set and a minimal amount of drive space seem to start at around $10k and only go up sharply from there. The NexentaStor platform, in theory, should let me build my own system with specifications best suited to my environment at a lower upfront cost than the commercial products.

I had the opportunity to test 2 different drive setups in the machine before production deployment. The NexentaStor machine has 4 gigE ports to the switch, configured as 2 aggregated groups (2 ports each). There are 2 gigE ports at the VMware host level, set up to do round-robin multipathing between the available paths. To ensure a good distribution of bandwidth over the links, I set the I/O Operation Limit for the LUNs being tested to 3. There is an excellent article explaining this, along with lots of other great information about VMware iSCSI multipathing, at http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html.
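For reference, on ESXi 4.x the round-robin policy and I/O Operation Limit can be set per device from the console (or the vMA). The NAA identifier below is just a placeholder for your own LUN, so treat this as a sketch rather than the exact commands from my setup:

esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device naa.xxxxxxxxxxxxxxxx --type iops --iops 3

You can list device IDs and verify the change afterwards with esxcli nmp device list. On with the benchmarks!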

The NexentaStor Hardware:

In NexentaStor I am using 2 different pools (volumes) of disks for comparison. The first pool (pool1) tests 8 x Fujitsu MHZ2160BK-G2 160GB 7200RPM SATA drives in various RAIDZ configurations. The same is done for the second pool (pool2) using 8 x Seagate ST9500620SS 500GB 7200RPM SAS drives (which are intended to be the production drives). Each pool has autoexpand and deduplication turned off and sync set to standard. All tests were done using the iozone command:

iozone -ec -r 32 -s 32768m -l 2 -i 0 -i 1 -i 8

The first tests are run at the NexentaStor host level from the NMC to get an idea of how the overall ZFS pool performs. For the VM tests, we configure a zvol as an iSCSI LUN and set up multipathing aggregated over a total of 2 gigE ports to VMware ESXi. Note that a single thread will still only be able to utilize one link (which the tests expose).
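NexentaStor handles the zvol and iSCSI target creation through its management interface, but for anyone wiring this up by hand on a plain OpenSolaris/COMSTAR box, the underlying steps look roughly like the following sketch (the zvol name and size are made up for illustration, and the GUID comes from the sbdadm output):

svcadm enable -r svc:/network/iscsi/target:default
zfs create -V 200g pool2/vmware-lun0
sbdadm create-lu /dev/zvol/rdsk/pool2/vmware-lun0
stmfadm add-view <GUID reported by sbdadm create-lu>
itadm create-target

Note that add-view with no host or target groups exposes the LUN to every initiator, which is fine for a lab but something you would want to lock down in production.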

Test numbers:
  1. 7 x Fujitsu MHZ2160BK-G2 in RAIDZ2.
  2. 7 x Seagate ST9500620SS in RAIDZ2.
  3. 8 x Fujitsu MHZ2160BK-G2 (2 groups x 4 drive RAIDZ1).
  4. 8 x Seagate ST9500620SS (2 groups x 4 drive RAIDZ1).
  5. 4 x Western Digital WD1500HLFS (4 drive RAID10).
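For anyone wanting to reproduce these layouts outside of the NexentaStor interface, the equivalent raw ZFS commands would look something like the sketch below (the device names are placeholders, and you would pick one layout per pool rather than both), along with the property settings mentioned earlier:

# 7-drive RAIDZ2 (tests 1 and 2)
zpool create pool2 raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0
# 2 groups x 4-drive RAIDZ1 (tests 3 and 4)
zpool create pool2 raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0
# pool/filesystem properties used for all tests
zpool set autoexpand=off pool2
zfs set dedup=off pool2
zfs set sync=standard pool2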

Interesting to note here are the low Mixed Workload performance and the fact that the initial write outperforms the rewrite. Rewriting the same data should perform slightly better since the filesystem metadata is already in place. Next, the same tests were run from a VM over iSCSI. Here, the theoretical maximum performance for the VM should be around 100 MB/s (a single gigE link).

Again we notice a higher initial write than rewrite for test 1. I reran tests 1 and 3 multiple times and the results were the same. The results are closer to what we would expect over a gigE link, and the newer Seagate drives show only a slight lead in this case. The Mixed Workload numbers are still very low for what I would consider real-world performance. I included a VM test on the local RAID drives in the host (4 x Western Digital VelociRaptors in RAID10) for comparison. The local VM read and write tests can push more single-thread bandwidth, but the Mixed Workload performance suffers even more than in the iSCSI tests.

While the Mixed Workload numbers are all low, they still show the relative performance of each configuration. Here is a graph of just the Mixed Workload tests.

The local VM test still jumps ahead here, but VM performance for test 4 still seems acceptable compared to the other configurations. I am still learning how to interpret these results and may do additional testing with jumbo frames, different types of aggregation, multiple iozone processes, and maybe even a different performance testing tool.
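If anyone wants to try the multiple-process idea before I do, iozone's throughput mode (-t) runs several processes in parallel, which should be able to push traffic down more than one gigE path at a time. Something along these lines, untested on my setup, with the per-process file size reduced so the combined total still exceeds the host's RAM:

iozone -ec -r 32 -s 8192m -t 4 -i 0 -i 1 -i 8

In the meantime, these tests should give a good performance comparison for others running the iozone benchmarks in NexentaStor.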

21Feb/11

Scheduled Downtime – 9:00 PM Thursday, March 3rd

FIHN will be doing a service-impacting network reconfiguration at 9:00 PM on Thursday, March 3rd, 2011 to prepare for a BGP setup and an additional uplink. This maintenance will take the entire network offline for up to 10 minutes while changes are made at my switch and at my provider's end to accommodate the change. These changes will make it possible for me to fail the network over to a second provider in case of an issue with the primary. A BGP failover in this scenario can usually be detected within a matter of minutes, at which point the routes from the failing provider are taken out of the routing table to force traffic out through a different provider. The second provider has yet to be determined. If you have any questions about this maintenance please contact me by email or phone. Thanks!

Update: Maintenance completed at 9:40 PM. Thanks to Di @ EGN for setting up a second /30 to transfer the BGP setup without downtime!
3Feb/11

WD VelociRaptor Firmware Upgrade / RAID Drive Fallout

I recently experienced a bug with the Western Digital 150GB VelociRaptor drives (WD1500HLFS) and the LSI 9260-4i RAID card. After about 49 days the drives would begin to drop out of the RAID and show as 'failed'. However, after manually removing and re-inserting a drive into the logical RAID volume, it would become active again and the degraded array would rebuild. Was the drive actually bad? S.M.A.R.T. data and Western Digital's own tools showed no signs of drive problems. Stranger still, the same failure happened to 3 different WD1500HLFS drives in the array within a couple of days of each other. Coincidence? Unlikely...

After some more investigation I found a couple of forum threads mentioning similar problems with the WD1500HLFS and other VelociRaptors dropping out of RAID arrays (http://forums.anandtech.com/showthread.php?t=327367 and http://forums.pcper.com/showthread.php?t=469557). The latter mentions that the latest firmware, 0404v02, didn't resolve the issue. I built a custom ISO image of UBCD with the LSI MegaCLI DOS utility added to it, then remotely logged into the console of the host and booted the custom CD. In UBCD you choose to run the FreeDOS utility, and from there you can exit the UBCD menu to a FreeDOS command line. I was able to run the following command to see the drive information:

MegaCli -pdInfo -a0 -aALL

And what I got back was this information about the drives:

Slot: 0 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 1 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 2 Data: WD-WX61C104XXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 3 Data: WD-WXC1C100XXXXWDC WD1500HLFS-01G6U3 Firmware: 04.04V05
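With only four drives the pattern above is easy to spot by eye, but on a larger array it helps to filter the output down to just the slot and firmware fields first. Under Linux with the MegaCli64 build, something like this should do it (the field names are from memory and may vary slightly between MegaCli versions):

MegaCli64 -PDList -aALL | grep -Ei "slot number|inquiry data|device firmware"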

I was now able to make a helpful correlation: the drives in the slots that were failing were all running firmware 0404v02! I contacted Western Digital about the issue and they offered a new firmware (WD1500HLFS 0404v06 Firmware) that was supposed to resolve it.

I was having trouble getting the SATA ports recognized by the FreeDOS UBCD, so I had to fall back to a more "traditional" method (yes, a floppy drive). Since the firmware files wouldn't all fit on one floppy, I opted for a boot disk with a RAM drive (Windows 98 SE OEM) so I could copy the data over from multiple floppies. Once the wd_dnld.exe utility and 0404v06.bin are on the RAM drive, you can flash the drives using:

wd_dnld.exe 0404v06.bin

The utility claims it will flash all connected drives; however, it only actually flashed the first 2. I had to physically move the drives in slots 2 and 3 into slots 0 and 1 to complete the flash.

Success! VMware ESXi Health Status now reports drive firmware 0404v06 on all 4 drives. In another 49 days I should know if the problem is resolved :) For anyone else having the same issue, I would encourage you to contact Western Digital before flashing the firmware linked above. They explained to me that there can be multiple revisions of a drive model which sometimes use different firmware. Flasher beware!

Filed under: Tutorials
1Feb/11

Working For FIHN Full Time In 2011

Dear FIHN Friends,

Starting February 1st I will be working full time on my Fist In Hand Networks endeavor! This major change allows me to dedicate my full time and attention to your projects and the FIHN infrastructure. I have several FIHN projects that I will be focusing on in 2011. Host your application on FIHN virtual infrastructure and get stability and performance comparable to other cloud solutions, along with the exceptional business-to-business support you have come to know and love. Hourly IT work rates in 2011 will start at $75/hour, and I will be announcing base Virtual Machine pricing at a later date. Expect to be contacted in the near future if you have outstanding projects with FIHN that have not yet been completed.

Some infrastructure projects that will be a focus in 2011:

- New SAN deployment based on NexentaStor allowing shared storage VM deployments
- Additional VM host nodes for real-time VM host cluster failover
- Additional Internet provider for BGP setup
- New web site and blog to outline FIHN services and current events
- Your business IT project and hosting needs!

Thank you for your continued business throughout the years. FIHN wants to continue providing you excellent IT support and application hosting. Let's make 2011 a great year for both of us!
Filed under: Business