For When You Can't Have The Real Thing

Disk Arrays

Created by dave. Last edited by dave, 8 years and 255 days ago. Viewed 3,009 times. #1

Lunch And Learn

(6 October 2006)

Had the Network Appliance guys out for lunch today. (Well, to be more accurate, they had us out for lunch… but anyway.) They like to do a lunch-and-learn with their customers periodically, and this time we were it. They picked us because we face a lot of customers and have driven a lot of business their way with our recommendations (we like seeing NetApps installed at our customers).

This wasn't a marketing-type meeting; it was more an informal discussion about whatever we wanted to talk about. The guys ended up giving us a detailed description of how disk writes work on their system, so that you can get excellent performance out of synchronous writes while still leaving the disk heads free to seek for random reads. They also talked about how Fibre Channel disks differ from SATA disks when you are using them as storage devices.

Things I learned today:

  • In a RAID4 array, parity is written along with the data, but it is not read back on reads. (Practically nobody in the industry reads the parity on a RAID read.) The vendors all rely on the disks themselves to decide if they have a failure (since they all have CRC codes built in at the firmware level), at which point the RAID parity is used to rebuild the failed disk. This means that (most of the time) your reads can be random, since you read only the blocks you actually want; you don't have to read the whole stripe and check the parity.
  • SATA disks can read about as fast as FC disks can, but their writes can be effectively three to eight times slower. As a result, mixing FC disks and SATA disks on the same filer head, while possible, is not advised as the SATA disks will probably lower the performance of the FC disks. This is because of the way that the NetApp works at the kernel level.
  • SATA disks make more "heroic" attempts to recover data on bad reads; while that's good (you stand a better chance of getting your data back), it can also be bad (the disk "vanishes" for up to two minutes at a time -- without actually failing!).
  • SATA disks are also more prone to phantom-write syndrome, where the disk claims the data was written but it never was. Since the stale data already in the target block is still internally consistent, the disk sees nothing wrong when you read it back: you get a block that doesn't belong to your file, yet as far as the disk can tell no corruption occurred, so no problem is ever registered. The NetApp guys claim that this is a much bigger problem than the industry would have you believe.
  • SATA disks are more prone to catastrophic read failures than FC disks, mostly (but not completely) due to their higher capacities. So if a disk fails in a RAID plex, you have something like a one-in-ten chance of hitting such an error during the resulting rebuild onto a new disk -- a process during which you _must_ read _all_ data in the remaining RAID group correctly, or the plex cannot be rebuilt. A filer-panic, stop-everything, you-have-lost-data error. BAD.
  • SATA disks in NetApp filers have a reliability expectation that is 95% of that of the FC disks. Reliability expectations for the same disks in other vendors' devices are not as good. NetApp speculates that this is due to the way their writes happen: because writes are done as a block, and the writes happen in areas near the most recent reads, there is less head stepping than on other vendors' devices.
  • Virtual Tape Libraries can have a return-on-investment period as short as six months -- and this includes keeping two copies of the data, one at a remote site. This depends on what kind of problem you are solving; for sites with large data churn, the ROI period can be longer, up to two or three years.
  • While their write strategy works well in 95% of the situations you will encounter, it does mean their systems are not the best fit for some applications. Because the writes happen in one synchronous block, the disks cannot service reads while that write is in progress. For applications that demand a high-speed, sustained rate of data input (such as real-time non-linear video editing), there is a risk that the application will be starved of data while the synchronous write happens.
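The single-parity scheme in the first bullet can be sketched in a few lines. This is an illustrative toy (the block contents and 3+1 group size are made up, and it is not NetApp's actual code): parity is the XOR of the data blocks, normal reads never touch it, and any one lost block is the XOR of the parity with the surviving blocks.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data blocks in a hypothetical 3+1 RAID4 group.
data = [b"alpha---", b"bravo---", b"charlie-"]
parity = xor_blocks(data)   # written once, to the dedicated parity disk

# Normal read: just fetch the block you want; parity is never read.
assert data[1] == b"bravo---"

# Disk 1 fails: rebuild its block from parity plus the surviving disks.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == b"bravo---"
```

This also shows why the rebuild is so fragile: reconstructing the lost block requires reading every surviving block in the stripe correctly.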
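The one-in-ten rebuild-failure figure is easy to sanity-check with a back-of-envelope calculation. The numbers below are my assumptions for illustration, not NetApp's: an unrecoverable read error (URE) rate of 1 in 10^14 bits (typical for SATA disks of that era) and six surviving 500 GB disks that must all be read in full to rebuild the plex.

```python
# Assumed, illustrative figures -- not vendor data.
ure_per_bit = 1e-14            # one unrecoverable read error per 1e14 bits
bits_to_read = 6 * 500e9 * 8   # six surviving 500 GB disks, in bits

# Probability that at least one bit is unreadable during the rebuild.
p_fail = 1 - (1 - ure_per_bit) ** bits_to_read
print(f"chance of hitting a URE during rebuild: {p_fail:.0%}")
```

Under these assumed numbers it comes out around one in five -- the same order of magnitude as the one-in-ten figure above. Smaller groups or smaller disks shrink the exposure quickly, which is part of why the problem bites SATA (high capacity) harder than FC.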
The rest of the group was sufficiently interested to humor my interest in these details, and I learned a lot that I didn't know. Since I was trained as a computer programmer, I am an algorithm junkie, always interested in understanding problems and the details of the solutions people use to solve them. The guys from NetApp are great, and seem to know the internals of the product very well. The main problem we have with new customers is that they look at the price of a NetApp and freak out, comparing it to some home-rolled file server with lots of disk attached. Many go that way, and of those, some always come to NetApp in the end because of the problems and performance limitations that almost always seem to ensue.