A savvy web hosting customer knows the importance of a storage solution that allows users to access and interact with their data quickly. In Part 1 of Evaluating Disk Storage, we talked about the benefits and drawbacks of different types of drives. But to assess the reliability and performance of an enterprise storage solution, it is important to know both the type of drive and the RAID configuration.
RAID arrays and controllers allow for much higher capacity, performance, and reliability than individual drives. And some of the surprising benefits and limits of each type of drive in a RAID array dictate the optimal choice in any given application.
What is RAID?
RAID started out as “Redundant Array of Inexpensive Disks” and was later reworked to “Redundant Array of Independent Disks.” RAID arrays were originally described by the RAID class of the array (RAID 0 through 6), and then hybrid classes were added (RAID 0+1, RAID 10 (1+0), RAID 50 (5+0), RAID 60 (6+0)). Later, the RAID Advisory Board specified three classifications of RAID: Failure-resistant, Failure-tolerant, and Disaster-tolerant. But almost every manufacturer has stuck with the original RAID classes, because they describe the benefits and fault tolerance of the array in a single phrase.
Countless articles cover the definitions of RAID configurations (Wikipedia has a great article on the subject), so we are going to focus on the ‘why’ behind storage decisions. The strengths and weaknesses of each configuration let you manage and mitigate risk. RAID can let you get better performance from an array than you can get from any one drive alone. RAID can let you survive the loss of one or more drives in an array. RAID can let you survive power and controller failures. But the trade-offs and the details of implementation have to be carefully controlled to make sure that new types of failures aren’t introduced.
Here are the six main goals for server RAID design:
- to provide the best performance possible for the workload.
- to be able to resist and prevent data loss from a drive failure.
- to provide continuous access to data during and while recovering from a drive failure.
- to prevent data loss and provide access to data even during a hardware failure.
- to provide as much available drive space as possible while meeting the earlier constraints.
- to provide sufficient monitoring to know when a drive is approaching failure, to be able to respond before it completely fails and know definitively when it fails.
You can use features of the RAID controller to trade off disk space, array performance, data redundancy, and the ability to rebuild the array.
In our last article we covered the performance of a single drive, whether it is SAS, SATA, or SSD. Now we’ll look at what happens when you have multiple drives. Performance breaks into three figures: best case, worst case, and average case. And it involves a small amount of probability and statistics.
The number of allowed drives in a RAID array is limited by the characteristics of the controller. But for our purposes, we’ll look at the common configurations ServInt uses.
Two drives: Appended vs. Mirrored
When you have two drives, there are a couple of ways to put them together. You can append the two drives together and get twice the space, but your performance numbers are a bit odd. Best case, if you have two reads and one hits each drive, you get twice the read performance.
Same thing applies to the write performance. But if the drives are appended, you have to have one write on the first drive and one on the second to gain this benefit. This is a bit unlikely.
You can spread the data more evenly between the drives to increase your chances of two sequential writes hitting different drives by breaking the data into smaller chunks and putting the odd numbered chunks on the first drive and the even chunks on the second drive. Practical testing with real operating systems and real applications will tell you what chunk size gives you the best read and write performance. (This gives you a RAID parameter known as the stripe size.)
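To make the difference between appending and striping concrete, here is a minimal sketch of how a logical offset might map onto two drives under each layout. The function names, the drive size, and the 64 KiB chunk size are all hypothetical values chosen for illustration; a real controller's layout will differ in its details.

```python
def locate_appended(offset, drive_size):
    """Appended layout: fill the first drive completely, then the second.

    Returns (drive_index, offset_on_that_drive).
    """
    return offset // drive_size, offset % drive_size


def locate_striped(offset, chunk_size, num_drives=2):
    """Striped layout: consecutive chunks alternate between the drives.

    Chunk index 0 (the "first" chunk) lands on drive 0, chunk index 1 on
    drive 1, and so on. Returns (drive_index, offset_on_that_drive).
    """
    chunk = offset // chunk_size
    drive = chunk % num_drives
    # Full chunks already stored on this drive, plus the position inside the chunk.
    drive_offset = (chunk // num_drives) * chunk_size + offset % chunk_size
    return drive, drive_offset


DRIVE_SIZE = 1_000_000   # hypothetical drive size in bytes
CHUNK_SIZE = 64 * 1024   # hypothetical chunk (stripe) size in bytes

# Two back-to-back writes, one chunk apart.
offsets = (0, CHUNK_SIZE)
appended_drives = [locate_appended(o, DRIVE_SIZE)[0] for o in offsets]
striped_drives = [locate_striped(o, CHUNK_SIZE)[0] for o in offsets]

print("appended:", appended_drives)  # both writes land on the same drive
print("striped: ", striped_drives)   # the writes land on different drives
```

With the appended layout, both sequential writes queue up on the first drive. With striping, they land on different drives and can proceed in parallel.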
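But with the drives appended or striped this way, you have no redundancy. If you lose one drive, you lose at least half of your data, and with striped chunks you effectively lose all of it, because any file larger than one chunk has pieces on both drives. In most applications, that’s a fatal problem.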
The other option with two drives is to mirror the data between the two drives. You get no space benefit versus a single drive, but your performance numbers are better than with one drive. You are guaranteed that if you have two reads, you can read one from each drive and get twice the read performance. But if you have two writes, you have to write to both drives at once, which gives you the same write performance as one drive.
With the drives mirrored this way, you can lose one drive and still retrieve all of your data.
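The two-drive trade-offs described so far can be tallied in a few lines. This is only a sketch of the best-case figures from the text, ignoring controller cache; the function name and the sample drive numbers are assumptions for illustration, not measurements.

```python
def two_drive_summary(drive_tb, read_iops, write_iops):
    """Best-case figures for two identical drives, per the reasoning above.

    drive_tb, read_iops and write_iops describe ONE drive.
    """
    return {
        "appended": {
            "capacity_tb": 2 * drive_tb,        # twice the space
            "best_read_iops": 2 * read_iops,    # only if reads hit both drives
            "best_write_iops": 2 * write_iops,  # only if writes hit both drives
            "drive_failures_survived": 0,       # no redundancy
        },
        "mirrored": {
            "capacity_tb": drive_tb,            # no space benefit
            "best_read_iops": 2 * read_iops,    # either copy can serve a read
            "best_write_iops": write_iops,      # every write goes to both drives
            "drive_failures_survived": 1,
        },
    }


# Hypothetical single-drive figures, for illustration only.
summary = two_drive_summary(drive_tb=1.0, read_iops=100, write_iops=100)
for layout, figures in summary.items():
    print(layout, figures)
```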
Now, with a RAID controller, you can add a memory cache on the controller and re-order reads and writes to boost mirrored write performance to a level commensurate with the amount of memory cache on the controller.
In this setup, two SAS drives will beat two SATA drives, assuming that your performance needs exceed those of two SATA drives.
The four-drive configuration is a bit more complicated, but still fairly similar to the two-drive configuration. The best case will give you twice the performance of the two-drive setup. SAS will still beat out SATA at the high end of the performance curve.
Here’s where things get interesting. For a lot of controllers, the price-point allows controlling four SAS drives or six SATA drives. Six SATA drives give you better performance than four SAS drives if your load is ‘statistically varied’ enough. (Translation: if your load has enough independent operations to keep the drives busy.)
Six drive spindles provide six independent heads for reading/writing, compared to four. When the majority of the reads/writes are independent from one another, six gives you more than 50% better performance than four drives.
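The "statistically varied" condition can be illustrated with a little probability. The sketch below assumes each outstanding request lands on a uniformly random drive, independently of the others, and simply counts how many drives are expected to be busy. It is a deliberately simplified model: it ignores per-drive speed differences between SAS and SATA, controller cache, and request reordering, and the function name is hypothetical.

```python
def expected_busy_drives(num_drives, outstanding_requests):
    """Expected number of distinct drives hit by independent, uniform requests.

    Each drive is missed by one request with probability (1 - 1/n), so it is
    missed by all k requests with probability (1 - 1/n) ** k.
    """
    miss_all = (1 - 1 / num_drives) ** outstanding_requests
    return num_drives * (1 - miss_all)


for k in (1, 2, 4, 8, 16, 64):
    four = expected_busy_drives(4, k)
    six = expected_busy_drives(6, k)
    print(f"{k:3d} requests: 4 drives -> {four:.2f} busy, "
          f"6 drives -> {six:.2f} busy, ratio {six / four:.2f}")
```

Under these assumptions, a single outstanding request keeps one drive busy no matter how many drives you have, so the extra spindles buy nothing. As the number of independent requests grows, the ratio of busy drives approaches 6/4. That is why the advantage of six drives only shows up when the load has enough independent operations.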
Furthermore, with six drives the RAID controller has some tricks it can do behind the scenes. For example, it can do disk checks while the drives are idle, actively finding disk errors before they become performance issues.
For ServInt, our disk IO data is independent enough that we are able to exploit these sorts of performance gains to maximize our server performance.
For some purposes, we run twenty-four drives as a RAID 6 array, giving massive performance boosts to the server IO, while maintaining high levels of data integrity and being able to survive and rebuild from 2 (and up to 4) drive failures while keeping the data online and the server usable.
For some customers, we run forty-eight drives as a massive RAID 60 array with hot-spare drives, giving huge performance boosts to the server IO and wonderful redundancy. But we have to tune the server load to match the performance of the array so that money is not being spent on performance which will never be leveraged in production.
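For a rough sense of the space trade-off in these large arrays, here is a sketch of the standard capacity arithmetic: RAID 6 gives up two drives' worth of space to parity, and RAID 60 stripes across several RAID 6 spans. The span count and hot-spare count used in the example are hypothetical; the article does not state how the forty-eight-drive array is actually divided.

```python
def raid6_usable_drives(num_drives):
    """RAID 6 stores two drives' worth of parity, leaving n - 2 for data."""
    if num_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return num_drives - 2


def raid60_usable_drives(total_drives, spans, hot_spares=0):
    """RAID 60 stripes across `spans` equal RAID 6 groups.

    Hot spares sit idle until a failure, so they add no usable space.
    """
    active = total_drives - hot_spares
    if spans < 2 or active % spans != 0:
        raise ValueError("active drives must divide evenly into 2+ spans")
    return spans * raid6_usable_drives(active // spans)


def raid60_failures_tolerated(spans):
    """Return (guaranteed, best_case) drive failures survivable.

    Any 2 failures are survivable; more are survivable only if no single
    span loses more than 2 drives.
    """
    return 2, 2 * spans


print(raid6_usable_drives(24))                            # 24-drive RAID 6
print(raid60_usable_drives(48, spans=4, hot_spares=4))    # hypothetical split
print(raid60_failures_tolerated(4))
```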
Join us next time for Part 3 as we conclude our discussion of disk storage by looking specifically at the RAID controller.
Photo by Julia Folsom