Fist In Hand Networks
31Jan/11Off

Welcome To Fist In Hand Networks


Fist In Hand Networks is a business to business Hosting, Colocation, and Managed Services provider housed at the SVTIX data center in San Jose, California.

Email us at helpdesk@fihn.net or give us a call at 408-256-3446 for more information!

Filed under: Business Comments Off
26Feb/110

NexentaStor ZFS Performance Testing

Seagate ST9500620SS hard drives

During the last couple of weeks I have been testing a SAN appliance software called NexentaStor. The software uses parts of the Sun Solaris operating system to build a feature rich and high performance storage system. The underlying ZFS file system can do some pretty neat things and looks to be an excellent contender for replacing costly hardware RAID driven setups.

My goal was a SAN capable of serving iSCSI LUNs to my VMware vSphere virtual infrastructure with respectable performance for under $6000. Commercial SAN products with a comparable feature set and a minimal amount of drive space seem to start at around $10k and go up sharply from there. The NexentaStor platform, in theory, should allow me to build my own system using specifications best suited for my environment at less upfront cost than commercial products.

I had the opportunity to test 2 different drive setups in the machine before production deployment. The NexentaStor machine has 4 gigE ports to the switch, arranged as 2 sets of aggregated ports (2 ports each). There are 2 gigE ports at the VMware host level, set up to do round robin multipathing between the available paths. To ensure a good distribution of bandwidth over the links I set the I/O Operation Limit for the LUNs being tested to 3. There is an excellent article explaining this and lots of other great information about VMware iSCSI multipathing at http://virtualgeek.typepad.com/virtual_geek/2009/09/a-multivendor-post-on-using-iscsi-with-vmware-vsphere.html. On with the benchmarks!
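To see why a low I/O Operation Limit spreads traffic more evenly, here is a small simulation of round robin path selection. It is only a sketch of the idea, not VMware's implementation: the path names are made up, and the comparison value of 1000 is what I understand the default limit to be.

```python
from collections import Counter
from itertools import cycle


def assign_paths(num_ios, paths, io_limit):
    """Return the path used for each I/O when the active path
    rotates after every `io_limit` I/Os (round robin)."""
    path_cycle = cycle(paths)
    current = next(path_cycle)
    sent_on_current = 0
    assignment = []
    for _ in range(num_ios):
        if sent_on_current == io_limit:
            # Limit reached: move on to the next path.
            current = next(path_cycle)
            sent_on_current = 0
        assignment.append(current)
        sent_on_current += 1
    return assignment


# Hypothetical path names, one per gigE port on the host.
paths = ["path-a", "path-b"]

# A short burst of 600 I/Os with a high limit versus a limit of 3.
for limit in (1000, 3):
    print(limit, dict(Counter(assign_paths(600, paths, limit))))
```

With the high limit the whole burst stays on one path, while a limit of 3 splits it evenly across both. Note that this models I/O counts only and says nothing about how large each I/O is.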

The NexentaStor Hardware:

In NexentaStor I am using 2 different pools (volumes) of disks for comparisons. The first pool (pool1) tests 8 x Fujitsu MHZ2160BK-G2 160GB 7200RPM SATA drives in various RAIDZ2 configurations. The same is done for the second pool (pool2) using 8 x Seagate ST9500620SS 500GB 7200RPM SAS drives (which are intended to be the production drives). Each pool has autoexpand and deduplication turned off and sync is set to standard. All tests were done using the iozone command:

iozone -ec -r 32 -s 32768m -l 2 -i 0 -i 1 -i 8
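For anyone wanting to repeat the run, here is the same command broken down flag by flag. The descriptions are my reading of the iozone help text, and the helper name is just for illustration. The original combines -e and -c as -ec; the sketch writes them separately, which should be equivalent.

```python
# Each entry: (flag, value or None, what I understand the flag to do).
IOZONE_FLAGS = [
    ("-e", None, "include flush (fsync/fflush) in the timing"),
    ("-c", None, "include close() in the timing"),
    ("-r", "32", "record size; a bare number is in KB, so 32 KB"),
    ("-s", "32768m", "file size per process: 32768 MB = 32 GB"),
    ("-l", "2", "lower limit on the number of processes (throughput mode)"),
    ("-i", "0", "test 0: write / rewrite"),
    ("-i", "1", "test 1: read / re-read"),
    ("-i", "8", "test 8: mixed workload"),
]


def build_command(flags):
    """Build the argv list for iozone from the flag table."""
    argv = ["iozone"]
    for flag, value, _description in flags:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


print(" ".join(build_command(IOZONE_FLAGS)))
for flag, value, description in IOZONE_FLAGS:
    print(f"{flag} {value or '':8} {description}")
```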

The first tests are done at the NexentaStor host level from the NMC to get an idea of how the overall ZFS pool performs. For the VM tests, we configure a zvol as an iSCSI LUN and set up multipathing over a total of 2 gigE ports to VMware ESXi. However, you will still only be able to utilize a single link per thread (which the tests expose).

Test numbers:
  1. 7 x Fujitsu MHZ2160BK-G2 in RAIDZ2.
  2. 7 x Seagate ST9500620SS in RAIDZ2.
  3. 8 x Fujitsu MHZ2160BK-G2 (2 groups x 4 drive RAIDZ1).
  4. 8 x Seagate ST9500620SS (2 groups x 4 drive RAIDZ1).
  5. 4 x Western Digital WD1500HLFS (4 drive RAID10).
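As a rough guide to what each layout gives up to parity, the sketch below counts data drives per configuration. It assumes the usual rule of thumb (RAIDZ1 loses one drive per group, RAIDZ2 loses two) and ignores ZFS metadata and padding overhead, so real usable space will be lower. The function and dictionary names are my own.

```python
def data_drives(groups, drives_per_group, parity_per_group):
    """Number of drives' worth of data capacity in a pool made of
    identical RAIDZ groups."""
    return groups * (drives_per_group - parity_per_group)


# (groups, drives per group, parity drives per group)
layouts = {
    "tests 1-2: 7-drive RAIDZ2": (1, 7, 2),
    "tests 3-4: 2 x 4-drive RAIDZ1": (2, 4, 1),
}

# Drive sizes from the post, in GB.
drive_sizes_gb = {"Fujitsu MHZ2160BK-G2": 160, "Seagate ST9500620SS": 500}

for name, (groups, per_group, parity) in layouts.items():
    n = data_drives(groups, per_group, parity)
    for drive, size in drive_sizes_gb.items():
        print(f"{name}, {drive}: {n} data drives, about {n * size} GB raw")
```

The two RAIDZ1 groups also stripe across two vdevs, which is one possible reason to expect them to behave differently from the single RAIDZ2 group.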

Interesting to note here is the low Mixed Workload performance and the higher initial write than rewrite. Rewrite over the same data should be slightly higher performing since filesystem metadata is already in place. Now the same tests were run on a VM using iSCSI. Here, the theoretical max performance for the VM should be 100 MB/s (single gigE link).
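The 100 MB/s figure is a round practical number. A quick back-of-the-envelope calculation shows where the ceiling sits. The sketch assumes standard Ethernet framing and plain IPv4 + TCP headers with no options, and it does not model iSCSI PDU headers, so treat it as an upper bound rather than a measurement.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # one gigE link
IP_TCP_HEADER_BYTES = 40              # IPv4 (20) + TCP (20), no options
ETHERNET_OVERHEAD_BYTES = 38          # preamble 8 + header 14 + FCS 4 + gap 12


def raw_mb_per_s(link_bits_per_second=LINK_BITS_PER_SECOND):
    """Raw line rate in MB/s (1 MB = 1,000,000 bytes)."""
    return link_bits_per_second / 8 / 1_000_000


def tcp_payload_mb_per_s(mtu=1500):
    """Best-case TCP payload rate for a given MTU."""
    payload = mtu - IP_TCP_HEADER_BYTES
    on_wire = mtu + ETHERNET_OVERHEAD_BYTES
    return raw_mb_per_s() * payload / on_wire


print(f"raw line rate:      {raw_mb_per_s():.1f} MB/s")
print(f"TCP payload @ 1500: {tcp_payload_mb_per_s(1500):.1f} MB/s")
print(f"TCP payload @ 9000: {tcp_payload_mb_per_s(9000):.1f} MB/s")
```

The 9000-byte case is included only because jumbo frames are a common next thing to try; the gain in this model is a few percent.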

Again we notice a higher initial write than rewrite for test 1. I reran tests 1 and 3 multiple times and the results were the same. The results are closer to what we would expect over a gigE link and the newer Seagate drives show only a small lead in this case. Mixed Workload tests are still very low for what I would consider real world performance. I included a VM test on the local RAID drives in the host (4 x Western Digital VelociRaptors in RAID10) for comparison. The local VM read and write tests can push more single thread bandwidth, but the local Mixed Workload result is still worse than the iSCSI tests.

While the Mixed tests are all low, they still do show the relative performance per test. Here is a graph of just the Mixed Workload tests.

The local VM test still jumps ahead here. But VM performance for test 4 still seems acceptable compared to the other configurations. I am still learning how to interpret these results and may do additional testing with jumbo frames, different types of aggregation, multiple iozone processes, and maybe even a different performance testing software. These tests should give a good performance comparison for others running the iozone benchmarks in NexentaStor.

21Feb/110

Scheduled Downtime – 9:00 PM Thursday, March 3rd

FIHN will be doing a service impacting network reconfiguration at 9:00 PM on Thursday, March 3rd, 2011 to prepare for BGP setup and an additional uplink. This maintenance will take the entire network offline for up to 10 minutes while a change is made at my switch and at my provider's end. These changes will make it possible for me to fail the network over to a second provider in case of an issue with the primary. In this scenario a failure can usually be detected within a matter of minutes, and the routes through the failing provider are then taken out of the routing table to force traffic out through a different provider. The second provider has yet to be determined. If you have any questions about this maintenance please contact me by email or phone. Thanks!

Update: Maintenance completed at 9:40 PM. Thanks to Di @ EGN for setting up a second /30 to transfer BGP setup without downtime!
3Feb/1113

WD VelociRaptor Firmware Upgrade / RAID Drive Fallout

I recently experienced a bug with the Western Digital 150GB VelociRaptor drives (WD1500HLFS) and the LSI 9260-4i RAID card. After about 49 days the drives would begin to drop out of the RAID and become 'failed'. However, after manually removing a drive and inserting it back into the logical RAID volume, the drive would become active again and the degraded array would rebuild. Was the drive actually bad? S.M.A.R.T. data and Western Digital's own tools showed no signs of drive problems. Stranger still, the drive failure happened to 3 different WD1500HLFS drives in the array within a couple of days of each other. Coincidence? Unlikely...
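A failure that shows up after roughly 49 days is often associated with a 32-bit counter of milliseconds wrapping around. I have no confirmation that this is what happens inside these drives, so treat the following purely as an assumption that happens to fit the timing. The function name is my own.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_wrap(bits, ticks_per_second=1000):
    """Days until an unsigned counter of the given width overflows,
    assuming it starts at zero and ticks at a fixed rate."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY


# A 32-bit counter ticking once per millisecond.
print(f"{days_until_wrap(32):.2f} days")
```

If that assumption holds, drives powered on together would all hit the limit at about the same time, which would match several drives dropping within a couple of days of each other.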

After some more investigation I found a couple of forum threads mentioning similar problems with the WD1500HLFS and other VelociRaptors dropping out of RAID arrays (http://forums.anandtech.com/showthread.php?t=327367 and http://forums.pcper.com/showthread.php?t=469557). The latter mentions that the latest firmware, 0404v02, didn't resolve the issue. I built a custom ISO image of UBCD with the LSI MegaCLI DOS utility added to it, then remotely logged into the console of the host and booted this custom CD. In UBCD you choose to run the FreeDOS utility and you can then exit the UBCD menu to a FreeDOS command line. I was able to run the following command to see the drive information:

MegaCli -pdInfo -a0 -aALL

And what I got back was this information about the drives:

Slot: 0 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 1 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 2 Data: WD-WX61C104XXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02

Slot: 3 Data: WD-WXC1C100XXXXWDC WD1500HLFS-01G6U3 Firmware: 04.04V05
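With only 4 drives this is easy to eyeball, but on a bigger array a few lines of script can do the grouping. This is a minimal sketch that parses the summarized listing exactly as shown above (not raw MegaCli output, which is more verbose); the function names are my own.

```python
import re

# The listing as shown in the post (serials already masked).
LISTING = """
Slot: 0 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02
Slot: 1 Data: WD-WX61C10CXXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02
Slot: 2 Data: WD-WX61C104XXXXWDC WD1500HLFS-01G6U1 Firmware: 04.04V02
Slot: 3 Data: WD-WXC1C100XXXXWDC WD1500HLFS-01G6U3 Firmware: 04.04V05
"""

LINE_RE = re.compile(r"Slot:\s*(\d+)\s+Data:\s*(.+?)\s+Firmware:\s*(\S+)")


def parse_listing(text):
    """Turn each 'Slot: ... Firmware: ...' line into a dict."""
    return [
        {"slot": int(m.group(1)), "data": m.group(2), "firmware": m.group(3)}
        for m in LINE_RE.finditer(text)
    ]


def slots_with_firmware(drives, version):
    """Slots whose drive reports exactly the given firmware string."""
    return [d["slot"] for d in drives if d["firmware"] == version]


drives = parse_listing(LISTING)
print(slots_with_firmware(drives, "04.04V02"))
```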

I was now able to make a helpful correlation. The slots that were failing had version 0404v02! I contacted Western Digital about the issue and they offered a new firmware (WD1500HLFS 0404v06 Firmware) that was supposed to solve the issue.

I was having trouble getting the SATA ports recognized by the FreeDOS UBCD so I had to use another more "traditional" method (yes, Floppy drive). Since I couldn't fit the firmware all on one floppy, I opted for a bootdisk with a RAM drive (Windows 98 SE OEM) so I could use multiple floppies to copy the data over. Once the wd_dnld.exe utility and 0404v06.bin are on the RAM drive you can flash the drives using:

wd_dnld.exe 0404v06.bin

The utility claims it will flash all the drives connected. However, it only actually flashed the first 2 drives. I had to physically move the drives in slots 2 and 3 to slots 0 and 1 to complete the flash.

Success! VMware ESXi Health Status now reports drive firmware 0404v06 on all 4 drives. In another 49 days I should know if the problem is resolved :) For anyone else having the same issue I would encourage you to contact Western Digital before flashing the firmware linked above. They explained to me that there can be multiple revisions of drive models which sometimes utilize different firmware. Flasher beware!

Filed under: Tutorials 13 Comments
1Feb/110

Working For FIHN Full Time In 2011

Dear FIHN Friends,

Starting February 1st I will be working full time for my Fist In Hand Networks endeavor! This major change allows me to dedicate my full time and attention to your projects and the FIHN infrastructure. I have several FIHN projects that I will be focusing on in 2011. Host your application on FIHN virtual infrastructure and get stability and performance comparable to other cloud solutions, with the exceptional business to business support you have come to know and love. Hourly IT work rates in 2011 will start at $75/hour and I will also be announcing base Virtual Machine pricing at a later date. Expect to be contacted in the near future if you have outstanding projects with FIHN that have not yet been completed.

Some infrastructure projects that will be a focus in 2011:
  - New SAN deployment based on NexentaStor allowing shared storage VM deployments
  - Additional VM host nodes for real time VM host cluster fail over
  - Additional Internet provider for BGP setup
  - New web site and blog to outline FIHN services and current events
  - Your business IT project and hosting needs!

Thank you for your continued business throughout the years. FIHN wants to continue providing you excellent IT support and application hosting. Let's make 2011 a great year for both of us!
Filed under: Business No Comments
22Dec/100

Upstream Router Maintenance – Thursday, Dec. 23rd @ 7:00PM

I am passing along this (rather short notice) router maintenance that I received. This maintenance is likely to affect everyone in the 72.13.92 and 68.68.98 subnets.

------------------------------------
Start time: 7:00PM PST 12/23/10
End time: 7:30PM PST 12/23/10
Work order number: SJC2-MLX01-12232010
Expected Outage/Downtime: 5 or less minutes

Energy Group Networks customers receiving service in our San Jose data centers will be affected by this maintenance. Customers should expect up to a five minute service interruption during the maintenance window. Customers may also see some BGP reconvergence during the maintenance window.

The reason for this outage is to perform a necessary firmware update to address a bug discovered with current firmware that occurs after 624 days of uptime. This update will also add a few more IPv6 enhancements previously not available.
------------------------------------

Sorry for the late notice. One of my goals in 2011 is to set up BGP sessions with multiple providers to better control maintenance windows like this by shifting traffic on the fly. As always, please let me know of any questions. Thank you!
Filed under: Maintenance No Comments
11Nov/090

Switch Upgrade Completed 11/13/09 – 8:37 PM PST

FIHN has completed the switch upgrade maintenance. Everything has been upgraded as planned and the network appears to be running smoothly. We have done our best to test that all VLANs are up and responding correctly. If you are having connection issues please respond to this email or give us a call at 408-256-3446. Thank you!
11Nov/090

11/21/09 Planned Network Maintenance + 11/13/09 Switch Upgrade

FIHN has learned of a planned network maintenance by our upstream provider. Please note that this maintenance is separate from the FIHN switch upgrade happening this Friday, 11/13/09, at 8:00 PM PST. Please let me know if you have any questions about either of these maintenance periods!

Maintenance email:

Dear Cogent Customer,

As a valued customer, Cogent is committed to keeping you informed about any changes in the status of your service with us. This email is to alert you regarding maintenance we will be performing on our network:

Start time: 2:00am pacific 11/21/09
End time: 6:00am pacific 11/21/09
Work order number: NA945-16
Expected Outage/Downtime: 20 minutes

The purpose of this work is to perform software and hardware upgrades on core routers at key locations. During this maintenance window, you will experience one or more brief interruptions in service while we complete the maintenance activities; the interruptions are expected to last less than 20 minutes total. However, due to the complexity of the work, your downtime may be longer. Customers may also see some reconvergence during the maintenance window.

Our network operations engineers closely monitor the work and will do everything possible to minimize any inconvenience to you. If you have any problems with your connection after this time, or if you have any questions regarding the maintenance at any point, please call Customer Support at 1-877-7-COGENT and refer to this Maintenance Ticket: NA945-16.

We appreciate your patience during this work and welcome any feedback.
29Oct/090

Core Switch Upgrade On 11/13/09

FIHN is planning a switch upgrade on 11/13/09 at 8:00 PM PST. The maintenance window for this upgrade will be 1 hour, though the downtime may be less than the entire maintenance window. During this time the network will go down at least once and may be slow for a couple of minutes after the initial turn up. For more information about the upgrade please reply to this email. If your services are down for longer than 1 hour please give me a call at 408-256-3446. Thank you for your continued business!
16Oct/090

Network Upgrades Coming Soon

FIHN is planning some network upgrades that will affect you!

* Switch Upgrade: In the coming months FIHN is planning to replace the current core switch with a Foundry FESX448. This new switch allows for gigabit network connections, better handling of broadcast and excessive bandwidth storms, and BGP peering. This switch is a step forward in speed and features and prepares FIHN for the next step in the upgrade - a second bandwidth provider!

* Additional Bandwidth Redundancy: After the FIHN switch upgrade there will be a second upstream bandwidth connection added to make FIHN a multi-homed network. This allows FIHN to control outgoing and incoming paths in the event a provider goes down or has a routing issue, create specific outbound and inbound policies (for /24 or larger networks), and provide additional on-demand capacity for users exceeding 100 megabit capacities at times.

* Upgrade Timeline: Exact dates are not yet known. FIHN is still waiting to finalize the second bandwidth connection and configure the new hardware. At the same time additional cable management and minor re-cabling will occur.

Step 1: Switch upgrade. Announcement will come soon. Service will be impacted for up to 30 minutes.

Step 2: Second provider addition and BGP configuration. Service will be impacted for brief moments (likely multiple sub-minute outages in a 30 minute span).

I will send a second email outlining the exact dates and times of each of these upgrades at least 2 weeks before each upgrade. Stay tuned!
Filed under: Uncategorized No Comments