Using RAID in Your Homebuilt Computer
RAID stands for Redundant Array of Independent Disks. It's a way of using more than one hard drive in an "array" to provide fault tolerance (the machine keeps running if one drive fails), increased hard drive access speed, or both. This can be done by using a hardware RAID controller that is either integrated into the motherboard or added to it ("hardware RAID"), or by using software running on the computer's operating system ("software RAID"). When I refer to the "RAID controller" on this page without specifying hardware or software, I mean either one.
There are many RAID schemes, but they all depend on these two basic concepts:
RAID striping (RAID 0) spreads data across two or more drives to speed up drive access. Think of it as adding lanes to a road: the more lanes, the more traffic the road can efficiently handle. RAID striping provides no redundancy, however: if any drive in the array fails, the data will be lost.
RAID mirroring (RAID 1) provides redundancy and fault tolerance by writing the same data simultaneously to two or more identical drives in an array. If one drive fails, the RAID controller takes it offline, notifies the user of the problem, and keeps the system running on the remaining drive(s). When the failed drive is replaced, the system copies the data to the new drive.
Reasons Not to Use RAID
In my opinion, RAID is not something that most PC users need, especially with the high speeds and reliability of modern SSD drives. In fact, it's something I recommend against unless you really have a need for it. Here's why.
RAID mirroring, by itself, is not backup. The drives in a RAID mirror array are treated as one drive by the computer. That provides fault tolerance if one of the drives should fail; but it also means that bad system events such as viruses, ransomware, updates that hose the system, accidental deletions, and so forth will simultaneously affect all the drives in the mirrored array. RAID mirroring protects against drive failure, but nothing else.
If you use hardware RAID, you're actually adding a SPOF (Single Point of Failure) to the system: the RAID controller itself. Low-end RAID controllers are notoriously prone to failure, and middle- and high-end RAID controllers are very expensive. To make matters worse, when RAID controllers fail, they often go down in a blaze of glory, taking the data on their arrays down with them. That data may or may not be recoverable.
Finally, given the speed and reliability of high-quality SSDs like those made by Samsung or Crucial, typical users who aren't gamers and don't deal with large files or databases will experience little noticeable performance improvement by using RAID 0, and little or no improvement in reliability by using RAID 1. As stated earlier, a flaky RAID controller may actually increase the chances of downtime by becoming an additional SPOF in the system.
For most users, in my opinion, RAID simply isn't worth the hassle or expense.
Reasons to Use RAID
All that being said, there are cases when RAID makes sense. They all come under two broad use cases:
When You Need Very High Availability. RAID 1 is a good choice for machines that need to be up and running all the time. Some examples would include:
Servers of any kind are good candidates for RAID mirroring. One common case would be a small business whose operations depend on a single file or database server always being online. If that server goes down, the whole business grinds to a halt. That server is a good candidate for RAID 1.
The same is true for NAS (Network Attached Storage) devices that are used as file servers.
Home servers or NAS devices would also be good candidates for RAID 1 if family members' data is stored exclusively on the server or NAS.
Desktop computer users who make their livings on their computers and who absolutely cannot afford any downtime might also want to consider RAID 1.
When You Need Very Fast Drive Access. If you regularly use your computer in applications that require extremely fast drive access, then RAID 0 (striping), preferably in a configuration that also provides redundancy (more about that a bit further down the page), can dramatically improve your user experience. Some examples where RAID striping can significantly improve performance include:
File or database servers serving large files to multiple simultaneous users usually perform significantly better with RAID striping.
When encoding and saving long videos in Full HD or higher resolution, writing the finished video to your hard drive is often a bottleneck that slows the process because the processor or GPU has to wait for the hard drive to catch up. RAID striping will speed up the process.
The same is true for media servers used to feed multiple simultaneous video streams. A single SSD should be able to serve a single stream with no problem. But if you need the machine to serve multiple simultaneous streams, RAID striping may be the ticket to a better experience.
Some games load enormous amounts of local data for things like scenery and subroutines, and can benefit from RAID striping for the same reason.
The same is true for some drafting programs, as well as software that relies on huge locally-stored databases. Basically, any machine running software that relies on large, locally-stored files is a candidate for RAID striping on at least the storage drive.
RAID Levels (RAID Configurations)
There are many possible RAID configurations, which are referred to as "levels." I'm only going to talk about the four that I think will be of the most interest to computer-building hobbyists. You can read about the rest of them here if you're interested or need more information.
RAID 0 (Striping)
RAID striping is designed to speed up storage access by spreading the data across two or more drives. The RAID controller divides the data between the drives when it writes to them, and reassembles it when it is requested by the computer. As with all RAID configurations, the computer addresses the array as if it were a single drive.
By itself, RAID 0 provides neither redundancy nor parity. What that means in simple terms is that if any one of the drives in the array fails or becomes corrupted, none of the data will be accessible to the system.
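To make the striping idea concrete, here is a minimal Python sketch that treats each "drive" as an in-memory list of chunks. It is only a toy model under stated assumptions: the function names `stripe` and `reassemble`, the round-robin layout, and the 4-byte chunk size are all made up for illustration, and real controllers work on much larger blocks and handle far more details.

```python
from typing import List, Optional


def stripe(data: bytes, num_drives: int, chunk_size: int = 4) -> List[List[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin across drives."""
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def reassemble(drives: List[Optional[List[bytes]]]) -> bytes:
    """Rebuild the original data. A failed drive is modeled as None."""
    if any(drive is None for drive in drives):
        # With no redundancy or parity, one missing drive leaves gaps everywhere.
        raise IOError("a drive in the stripe set is missing; data is unreadable")
    pieces = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)


drives = stripe(b"ABCDEFGHIJ", num_drives=2)
print(drives)              # [[b'ABCD', b'IJ'], [b'EFGH']]
print(reassemble(drives))  # b'ABCDEFGHIJ'
```

Passing `[drives[0], None]` to `reassemble` raises an error, which is the toy-model version of losing the whole array when one striped drive dies.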
In the case of a drive failure, the data will be lost, and recovery will be very difficult (if possible at all). In the case of filesystem corruption, the data may or may not be recoverable using operating system tools such as CHKDSK (on Windows) or FSCK (on Linux or other Unix-like systems).
If the drives become corrupted due to the hardware RAID controller misbehaving, on the other hand, the chances of recovery go down considerably. It depends on how much damage is done before the machine goes offline.
In my own experience, most cases of failed RAID controllers on Web servers have required restoring from backup after replacing the RAID controller (typically after spending hours waiting for an FSCK to finish in the vain hope that the system could be saved). But maybe I'm just unlucky.
RAID 1 (Mirroring)
RAID mirroring simultaneously writes data to two or more drives in an array, so they are kept identical. Its purpose (and its only purpose) is to maintain uptime and avoid the need to restore from backup if one of the drives fails.
By itself, RAID 1 is not backup. It does nothing to protect the system against malware, ransomware, system problems due to bad updates, or other untoward system events. If the computer becomes infected by a virus, all of the drives in the array will be infected. If a bad update hoses the system, all of the drives in the array will contain the same hosed system. So if you're thinking about using RAID 1 for backup, think again. RAID 1 is not backup.
What RAID 1 does do is provide a way for a computer to keep doing whatever it is that it does if a hard drive fails. If all goes according to plan, the RAID controller will take the failed drive offline, notify the user of the problem, and keep running on the remaining drive(s). When the failed drive is replaced, the RAID controller will begin copying data to the new drive.
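The behavior described above can be sketched with a toy Python class. This is an illustration only, not how any particular controller works: the `Mirror` class and its method names are hypothetical, and each "drive" is just a dictionary mapping block numbers to data.

```python
from typing import Dict, List, Optional


class Mirror:
    """Toy RAID 1 array: every write goes to every online drive."""

    def __init__(self, num_drives: int = 2) -> None:
        # Each drive maps block number -> data; None marks a failed drive.
        self.drives: List[Optional[Dict[int, bytes]]] = [{} for _ in range(num_drives)]

    def write(self, block: int, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def delete(self, block: int) -> None:
        # Deletions are mirrored too, which is why a mirror is not a backup.
        for drive in self.drives:
            if drive is not None:
                drive.pop(block, None)

    def read(self, block: int) -> bytes:
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise IOError("all drives in the mirror have failed")

    def fail(self, index: int) -> None:
        """Take a drive offline."""
        self.drives[index] = None

    def replace(self, index: int) -> None:
        """Install a blank drive and copy everything from a surviving one."""
        source = next((d for d in self.drives if d is not None), None)
        if source is None:
            raise IOError("no surviving drive to rebuild from")
        self.drives[index] = dict(source)


array = Mirror(2)
array.write(0, b"payroll")
array.fail(0)            # one drive dies
print(array.read(0))     # b'payroll' (served by the surviving drive)
array.replace(0)         # the new drive is rebuilt from the survivor
```

Note that calling `array.delete(0)` removes the block from every drive at once. The mirror faithfully copies mistakes as well as data.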
Because of the reliability of SSD drives, I personally don't use RAID 1 on workstations anymore. I'd rather use that money to invest in good backup software, an ioSafe, and online backup. Even back in the old days of mechanical hard disk drives, I personally experienced more downtime due to RAID controller failures than I did due to hard drive failures.
In short, unless you're willing and able to invest in a top-shelf RAID controller, you may well be increasing your chances of downtime by using RAID mirroring with SSD drives because you'll be adding another SPOF to your system. I've yet to meet an affordable RAID controller that was more reliable than a high-quality SSD. On the top shelf, yes, they're out there. But not within the budgets of most hobbyists.
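The SPOF argument can be put into rough arithmetic. The sketch below assumes that failures are independent and that a controller failure takes the whole array down, which is a simplification. The probabilities in the example are invented purely for illustration; they are not measured failure rates for any real product.

```python
def mirrored_outage_probability(p_drive: float, p_controller: float) -> float:
    """Chance of an outage for a two-drive mirror behind a single controller.

    Assumes independent failures over the same time period.
    """
    both_drives_fail = p_drive ** 2
    # The array stays up only if the controller survives AND at least one drive does.
    return 1 - (1 - p_controller) * (1 - both_drives_fail)


# Hypothetical numbers, chosen only to show the shape of the trade-off.
p_ssd = 0.01                  # chance a single SSD fails in the period
cheap_controller = 0.03       # a flaky low-end controller
top_shelf_controller = 0.001  # a high-end controller

print("single SSD, no RAID:   ", p_ssd)
print("mirror, cheap card:    ", mirrored_outage_probability(p_ssd, cheap_controller))
print("mirror, top-shelf card:", mirrored_outage_probability(p_ssd, top_shelf_controller))
```

With numbers like these, the mirror only comes out ahead of a lone SSD when the controller is more reliable than the drive it is protecting. If the controller is the weaker link, the mirror makes an outage more likely, not less.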
RAID 01 (or RAID 0+1)
RAID 01 attempts to gain both the drive access speed advantages of RAID 0 and the fault tolerance of RAID 1. It's often called a "mirror of stripes" because it consists of two or more sets of striped drives that are in turn mirrored, as in the diagram here.
It's also possible to use RAID 01 with three drives, in which case the first group will be two striped drives and the second group a single non-striped drive. In that configuration, the non-striped drive provides failover if either of the striped drives fails, but at single-drive, non-RAIDed access speed. In the real world, that configuration is seldom used because the cost savings are trivial compared to the performance loss in failover mode.
In terms of fault tolerance, a computer using a typical RAID 01 array in the more conventional four-drive configuration will continue to operate as long as all of the drives in any one group are functional. So using the diagram here, if drives 2 and 3 went down, the computer would keep running using drives 0 and 1. But if drive 0 and drive 2 went down, the array would fail. All the drives in at least one group of striped drives must be functional for the array to work.
In a nutshell, RAID 01 allows the array to deliver faster drive access than a single drive would provide, while providing enough fault tolerance to survive the loss of any number of drives as long as all the drives in any one group are functional. Using four physical drives, that means the array can survive the loss of any one drive, or of both drives in one group.
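The survival rule for the four-drive case can be checked with a few lines of Python. This is a sketch of the rule as stated above, using the same drive numbering (drives 0 and 1 in one striped group, drives 2 and 3 in the other); the names `STRIPE_GROUPS`, `raid01_survives`, and `survivable_pairs` are made up for the example.

```python
from itertools import combinations

# Four-drive RAID 01: two striped groups that mirror each other.
STRIPE_GROUPS = [{0, 1}, {2, 3}]


def raid01_survives(failed: set) -> bool:
    """The array works if at least one striped group has no failed drive."""
    return any(group.isdisjoint(failed) for group in STRIPE_GROUPS)


def survivable_pairs() -> list:
    """List every two-drive failure the array can survive."""
    drives = sorted(set().union(*STRIPE_GROUPS))
    return [pair for pair in combinations(drives, 2) if raid01_survives(set(pair))]


print(raid01_survives({2, 3}))  # True: drives 0 and 1 still form a complete stripe
print(raid01_survives({0, 2}))  # False: both striped groups are broken
print(survivable_pairs())       # [(0, 1), (2, 3)]
```

Of the six possible two-drive failures, this layout survives only the two in which both failed drives belong to the same group.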
RAID 10 (or RAID 1+0)
Just as RAID 01 is a mirror of stripes, RAID 10 can be considered a stripe of mirrors. In theory, this provides a considerable redundancy advantage over RAID 01.
In a four-drive RAID 10 array, the two drives in group one are mirrors of each other and are striped with the two drives in group two, which are also mirrors of each other. For the array to remain functional, at least one drive containing each stripe would have to be functional.
Using this diagram, for example, the array could survive losing drives 0 and 3 because drives 1 and 2 each contain one of the two stripes between which the data is spanned. But it could not survive losing drives 0 and 1 because drives 2 and 3 are mirrors of each other and contain only one of the two stripes between which the data is spanned.
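The same kind of check works for RAID 10. The sketch below encodes the rule from the example above (drives 0 and 1 mirror each other, as do drives 2 and 3) and counts which two-drive failures the array survives. The names `MIRROR_PAIRS`, `raid10_survives`, and `survivable_pairs` are hypothetical, chosen for this illustration.

```python
from itertools import combinations

# Four-drive RAID 10: two mirrored pairs, with data striped across the pairs.
MIRROR_PAIRS = [{0, 1}, {2, 3}]


def raid10_survives(failed: set) -> bool:
    """The array works if every mirrored pair still has at least one good drive."""
    return all(not pair.issubset(failed) for pair in MIRROR_PAIRS)


def survivable_pairs() -> list:
    """List every two-drive failure the array can survive."""
    drives = sorted(set().union(*MIRROR_PAIRS))
    return [pair for pair in combinations(drives, 2) if raid10_survives(set(pair))]


print(raid10_survives({0, 3}))  # True: drives 1 and 2 hold one stripe each
print(raid10_survives({0, 1}))  # False: one stripe is gone entirely
print(survivable_pairs())       # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

That is four survivable two-drive failures out of six. Applying the same count to the four-drive RAID 01 rule described earlier gives only two out of six.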
In theory, RAID 10 therefore doubles the fault tolerance of RAID 01. In practice, given the reliability of SSD drives, I still maintain that the chances of a RAID controller other than a top-shelf one simultaneously corrupting both drives in a mirror are higher than the chances of even one SSD failing in a workstation computer, never mind the chances of two SSDs failing simultaneously.
I emphasize "in a workstation" because server drives typically do a lot more work than workstation drives do; and as a consequence, they fail more often. In a data center environment, drive failures and replacements are an everyday occurrence, and the clients usually don't even know they happened. RAID keeps the machines happily humming along. So yes, servers need RAID, and RAID 10 is the most common level used. In a workstation, on the other hand, I think anything but a top-quality RAID controller increases the likelihood of a failure.
Yeah, I get it. I'm starting to sound pretty redundant myself. The long and short of it is that if you're going to use RAID with a hardware controller, buy the very best one you can afford. You're better off using software RAID than a cheap hardware RAID controller. Which reminds me...
Hardware vs Software RAID
Hardware RAID refers to a system in which a piece of hardware called a RAID controller manages the RAID array independently of the operating system. The controller may be built into the motherboard, or it may be an expansion card.
The main advantages to hardware RAID are that a high quality RAID controller will be a bit faster than software RAID and will not use any of the operating system's resources. The best ones will have onboard processors, onboard battery backup, and a generous amount of cache (which serves the same purpose as a hard drive's onboard cache does).
The biggest disadvantages to hardware RAID are that good controllers are very expensive, and that they add an additional SPOF to your system. When we were using mechanical drives, the chances of a drive failing were higher than that of a high-quality RAID controller failing. With SSD drives in desktop duty, I'm not so sure.
If you use a hardware RAID card, be sure to check the battery frequently and replace it as needed, according to the manufacturer's instructions. Otherwise a sudden loss of power while the drives are being written to can corrupt the array. Some controllers use rechargeable lithium ion batteries, while others use the same CR2032 batteries that are used on most motherboards. Either way, make checking the battery part of your routine computer maintenance.
Software RAID uses the operating system to manage the RAID array.
The advantages to software RAID are that it's less expensive (most operating systems have the ability built-in), that it's somewhat easier for users to configure, and that it tends to be more reliable and less likely to cause problems than low-end hardware RAID controllers. If you really need RAID, but can't afford a top-shelf hardware RAID controller, I advise you to use software RAID. I've never met an operating system whose software RAID wasn't more reliable than a cheap, no-name RAID controller.
The disadvantages of software RAID are that it does use system resources, and that it tends to be slower than what a high-quality hardware RAID controller would deliver. Whether or not it noticeably slows down your machine depends on how well-provisioned the machine is and how you use it. Software RAID uses processing power to manage the array and system RAM as write cache. The faster the processor and the more RAM you have, the less likely the resource load of software RAID is to make a noticeable difference. That said, I've heard gamers say it created a very slight lag that most users wouldn't notice, but which did affect their gaming.
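If you do go the software route on Linux, the kernel's md driver reports array status in the text file /proc/mdstat, where a healthy two-drive mirror shows `[UU]` and a degraded one shows an underscore in place of the missing drive. Here is a rough Python sketch that scans that text for degraded arrays. The function name `degraded_arrays` and the sample text are made up for illustration, and the sketch assumes arrays are named md0, md1, and so on; real output varies, so treat it as a starting point rather than a finished monitoring tool.

```python
import re

# Illustrative sample only; on a real system, read the text from /proc/mdstat.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdd1[1]
      976630336 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list:
    """Return the names of arrays whose status map shows a missing drive."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\d+)\s*:", line)
        if name:
            current = name.group(1)
            continue
        # The status map looks like [UU]; an underscore marks a missing drive.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))  # ['md1']
```

On a live machine you would pass in the contents of /proc/mdstat instead of the sample string, for example by reading the file with `open("/proc/mdstat").read()`.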