In previous articles, we have covered the system bus, host bus adapters, and disk drives. Now we will move up the food chain at take a look at getting several disks to operate as one.
In 1988 David A. Patterson, Garth Gibson, and Randy H. Katz authored a seminal paper, A Case for Redundant Arrays of Inexpensive Disks (RAID). The main concept was to use off the shelf commodity hardware to provide better performance and reliability and a much lower price point than the current generation of storage. Even in 1988, we already knew that CPUs and memory were outpacing disk drives. To try to solve these issues Dr. Patterson and his team laid out the fundamentals of our modern RAID structures almost completely RAID levels 1 through 5 all directly come from this paper. There have been improvements in the error checking but the principals are the same. In 1993, Dr. Patterson along with his team released a paper covering RAID 6.
Speed, Fault Tolerance, or Capacity?
You can’t have your cake and eat it too. In the past, it was hard to justify the cost of RAID 10 unless you really needed speed and fault tolerance. RAID 5 was the default because in most situations it was good enough. Offering near raid 0 read speeds. If you had a heavy write workload, you took a penalty due to the parity stripe. RAID 6 suffers from this even more so with two parity stripes to deal with. Today, with the cost of drives coming down and the capacity going up RAID 10 should be the default configuration for everything.
Here is a breakdown of how each RAID level handles reads and writes in order of performance.
RAID Level | Write Operations | Notes | Read Operations | Notes |
RAID 0 | 1 operation | High throughput, low CPU utilization. No data protection |
1 operation | High throughput, low CPU utilization. |
RAID 1 | 2 IOP’s | Only as fast as a single drive. | 1 IOP | Two read schemes available. Read data from both drives, or data from the drive that returns it first. One is higher throughput the other is faster seek times. |
RAID 5 | 4 IOP’s | Read-Modify-Write requires two reads and two writes per write request. Lower throughput higher CPU if the HBA doesn’t have a dedicated IO processor. | 1 IOP | High throughput low CPU utilization normally, in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on. |
RAID 6 | 6 IOP’s | Read-Modify-Write requires three reads and three writes per write request. Do not use a software implementation if it is available. | 1 IOP | High throughput low CPU utilization normally, in a failed state performance falls dramatically due to parity calculation and any rebuild operations that are going on. |
Choosing your RAID level
This is not as easy as it should be. Between budgets, different storage types, and your requirements, any of the RAID levels could meet your needs. Let us work of off some base assumptions. Reliability is necessary, that rules out RAID 0 and probably RAID 0+1. Is the workload read or write intensive? A good rule of thumb is more than 10% reads go RAID 10. In addition, if write latency is a factor RAID 10 is the best choice. For read workloads, RAID 5 or RAID 6 will probably meet your needs just fine. One of the other things to take into consideration if you need lots of space RAID 5 or RAID 6 may meet your IO needs just through sheer number of disks. Take the number of disks divide by 4 for RAID 5 or 6 for RAID 6 then do your per disk IO calculations you may find that they do meet your IO requirements.
Separate IO types!
The type of IO, random or sequential, greatly affects your throughput. SQL Server has some fairly well documented IO information. One of the big ones folks overlook is keeping their log separate from their data files. I am not talking about all logs on one drive and all data on another, which buys you nothing. If you are going to do that you might as well put them all on one large volume and use every disk available. You are guaranteeing that all IO’s will be random. If you want to avoid this, you must separate your log files from data files AND each other! If the log file of a busy database is sharing with other log files, you reduce its IO throughput 3 fold and its data through put 10 to 20 fold.
RAID Reliability and Failures
Correlated Disk Failures
Disks from the same batch can suffer similar fate. Correlated disk failures can be due to a manufacturing defect that can affect a large number of drives. It can be very difficult to get a vendor to give you disks from different batches. Your best bet is to hedge against that and plan to structure your RAID arrays accordingly.
Error rates and Mean Time Between Failures
As hard disks get larger the chance for an uncorrectable and undetected read or write failure. On a desktop drive, that rate is 10^14 bits read there will be an unrecoverable error. A good example is an array with the latest two-terabyte SATA drives would hit this error on just one full pass of a 6 drive RAID 5 array. When this happens, it will trigger a rebuild event. The probability of hitting another failure during the rebuild is extremely high. Bianca Schroeder and Garth A. Gibson of Carnegie Mellon University have written an excellent paper on the subject. Read it, it will keep you up at night worrying about your current arrays. Enterprise class drives are supposed to protect against this. No study so far proves that out. That does not mean I am swapping out my SAS for SATA. Performance is still king. They do boast a much better error rate 10^16 or 100 times better. Is this number accurate or not is another question all together. Google also did a study on disk failure rates, Failure Trends in a Large Disk Drive Population. Google also found correlated disk failures among other things. This is necessary read as well. Eventually, RAID 5 just will not be an option, and RAID 6 will be where RAID 5 is today.
What RAID Does Not Do
RAID Doesn’t back your data up. You heard me. It is not a replacement for a real backup system. Write errors do occur.As database people we are aware of atomic operations, the concept of an all or nothing operation, and recovering from a failed transaction. People assume the file system and disk is also atomic, it isn’t. NTFS does have a transaction system now TxF I doubt SQL Server is using it. Disk drives limit data transfer guarantees to the sector size of the disk, 512 bytes. If you have the write cache enabled and suffer a power failure, it is possible to write part of the 8k block. If this happens, SQL Server will read new and old data from that page, which is now in an inconsistent state. This is not a disk failure. It wrote every 512-byte block it could successfully. When the disk drive comes back on line, the data on the disk is not corrupted at the sector level at all. If you have turned off torn page detection or page checksum because you believe it is a huge performance hit, turn it back on. Add more disks if you need the extra performance don’t put your data at risk.
Final Thoughts
- Data files tend to be random reads and writes.
- Log files have zero random reads and writes normally.
- More than one active log on a drive equals random reads and writes.
- Use Raid 1 for logs or RAID 10 if you need the space.
- Use RAID 5 or RAID 6 for data files if capacity and read performance are more important than write speed.
- The more disks you add to an array the greater chance you have for data loss.
- Raid 5 offers very good reliability at small scale. Rule of thumb, more than 8 drives in a RAID 5 could be disastrous.
- Raid 6 offers very good reliability at large scales. Rule of thumb, less than 9 drives you should consider RAID 5 instead.
- Raid 10 offers excellent reliability at any scale but is susceptible to correlated disk failures.
- The larger the disk drive capacity should adjust your number of disks down per array.
- Turn on torn page for 2000 and checksum for 2005/08.
- Restore Backups regularly,
- RAID isn’t a backup solution.