What is a RAID?

A RAID is NOT a backup!

Let me type that again... A RAID is not a backup. A RAID is about data availability, and redundancy. If you have a hardware failure (dead hard drive) your data is recoverable. But if you delete something, it's gone.

You can save your backup data TO a RAID. For example, we use Norton Ghost to back up the files on our PCs daily, and these Ghost backups are stored on one of our RAID 5 systems. Ghost allows us to go back and retrieve files that may have become damaged or accidentally deleted. While the RAID doesn't back up data, it allows us to safely store the backups made by Ghost.

RAID - which stands for Redundant Array of Inexpensive Disks (as named by the inventors) or Redundant Array of Independent Disks (a name which later developed within the computing industry) - is a technology where two or more hard disk drives are used to achieve greater levels of performance, reliability, and/or larger data volume sizes.

The fundamental principle behind RAID is the use of multiple hard disk drives in an array that behaves like a single large, fast disk drive. There are a number of ways that this can be done, depending on the needs of the application, but in every case the use of multiple drives allows the resulting storage system to exceed the capacity, data security, or performance of any single drive in the system. Which of these you gain, and how much, depends on the type of RAID.
The tradeoffs--remember, there's no free lunch--are usually in cost and complexity.

RAID's various designs all involve two key goals:

  • Increased data reliability.
  • Increased input/output performance.

When several physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across several disks, but the array is seen by the computer user or operating system as a single disk.

Some arrays are "redundant" in a way that writes extra data derived from the original data across the array so that the failure of one (sometimes more) disks in the array will not result in loss of data. The bad disk is replaced by a new one, and the data on it reconstructed from the remaining data and the extra data. A redundant array obviously allows less data to be stored; a 2-disk RAID 1 array loses half of its capacity, and a RAID 5 array with several disks loses the capacity of one disk.

Other RAID arrays are arranged in a way that makes them faster to write to and read than a single disk.
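As a rough illustration of how spreading data across drives can work, here is a minimal Python sketch of striping, the technique RAID 0 uses. The function names, the chunk size, and the use of in-memory lists to stand in for drives are all assumptions made for this example; a real controller works on fixed-size blocks at the device level.

```python
def stripe(data: bytes, n_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin onto n_disks.

    Each inner list is a hypothetical stand-in for one physical drive.
    """
    if n_disks < 1 or chunk_size < 1:
        raise ValueError("n_disks and chunk_size must be at least 1")
    disks = [[] for _ in range(n_disks)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        disks[index % n_disks].append(chunk)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the chunks back in round-robin order."""
    out = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# Eight one-byte pieces spread over two drives.
drives = stripe(b"ABCDEFGH", n_disks=2, chunk_size=1)
restored = unstripe(drives)
```

Because consecutive pieces land on different drives, both drives can work on a large read or write at the same time, which is where the speed comes from. Note that every drive holds part of the data, so nothing here survives the loss of a drive.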

RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements. We use RAID 1, RAID 5 and RAID 6 here in the lab, and we suggest you do the same.

We use RAID 1 in our workstations. The workstation's hard drive is mirrored with a second drive, so the same data is written to both drives; if one fails, we still have the second one. This comes at a cost: we're using two drives but storing the same thing on both, so in effect we have one drive for the price of two.

We also have multiple RAID 5 systems in use for image and data storage. The advantage of RAID 5 is that if there are 5 drives in the array, and one drive fails for some reason, we haven't lost any data. The bad drive can be replaced and the array will automatically rebuild itself. The cost of a RAID 5's protection is the space of one drive. If you have 5 drives, you can only store 4 drives' worth of data.

We've added a RAID 6 for testing. RAID 6 builds on the protection of RAID 5. With a RAID 5 when one drive goes bad, it is replaced and the array is rebuilt. If a second drive fails during the rebuild, all the data is lost. With a RAID 6, a second drive can fail during rebuild and the data will still be safe. While the storage cost of a RAID 5 is one drive, the cost of a RAID 6 is two drives.
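The storage costs described above come down to simple arithmetic. The following Python sketch works it out for an array of equal-sized drives. The function names and the minimum drive counts are assumptions chosen for this example, and real arrays lose a little more space to formatting overhead.

```python
# Minimum number of drives per RAID level (assumed for this sketch).
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4}


def usable_drives(level: int, n_drives: int) -> int:
    """Return how many drives' worth of data the array can store."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if n_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return n_drives        # no redundancy, full capacity
    if level == 1:
        return 1               # every drive holds the same data
    if level == 5:
        return n_drives - 1    # one drive's worth of parity
    return n_drives - 2        # RAID 6: two drives' worth of parity


def tolerated_failures(level: int, n_drives: int) -> int:
    """Return how many drives can fail before data is lost."""
    usable_drives(level, n_drives)  # reuse the validation above
    return {0: 0, 1: n_drives - 1, 5: 1, 6: 2}[level]


# Five 2TB drives: compare RAID 5 and RAID 6.
drive_tb = 2
raid5_tb = usable_drives(5, 5) * drive_tb
raid6_tb = usable_drives(6, 5) * drive_tb
```

With five 2TB drives, this gives 8TB of usable space under RAID 5 and 6TB under RAID 6; the extra 2TB buys protection against a second drive failing during a rebuild.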

A RAID offers a wealth of significant advantages that would be attractive to anyone with a large amount of important data to store. Unfortunately, there are costs, tradeoffs and limitations to be dealt with. The degree to which you realize the various benefits below depends on the exact type of RAID you use, but you are always going to get some combination of the following:

  • Higher Data Security: Through the use of redundancy, most RAID levels provide protection for the data stored on the array. This means that the data on the array can withstand even the complete failure of one hard disk (or sometimes more) without any data loss, and without requiring any data to be restored from backup. All RAID levels provide some degree of data protection, depending on the exact implementation, except RAID level 0.
  • Fault Tolerance: RAID implementations that include redundancy provide a much more reliable overall storage subsystem than can be achieved by a single disk. This means there is a lower chance of the storage system as a whole failing due to hardware failures.
  • Improved Availability: Availability refers to access to data. Good RAID systems (RAID 5 and RAID 6, for example) improve availability both by providing fault tolerance and by providing special features that allow for recovery from hardware faults without disruption.
  • Increased, Integrated Capacity: By turning a number of smaller drives into a larger array, you add their capacity together, though a percentage of total capacity is lost to overhead or redundancy. All RAID levels provide this "combining" benefit, though the ones that include redundancy lose some of the space because of that redundant information or "parity" information in the case of RAID 5 or RAID 6.
  • Improved Performance: Last, but not least, RAID systems improve performance. Different RAID systems improve performance in different ways and to different degrees, but all improve it in some way.

Obviously, the hardware costs are higher for those implementing RAID arrays. However, these costs must be compared to the costs of data loss, data recovery and interruption of availability that would result if RAID were not used. For many companies, the entire cost of a RAID setup pays for itself the first time it prevents their system from having to be taken down for a day to deal with a hardware failure. For the working professional photographer, this means not having to restore images from backups, not having to have us re-scan film, and not losing your irreplaceable digital images for which there is no film to be re-scanned. It also allows you to store your images on what appears to be one large drive. No more using multiple external drives and trying to remember what is stored on which drive.

  • RAID 0 distributes data across several disks in a way that gives improved speed and full capacity, but all data on all disks is lost if any one disk fails. In the picture below you can see that the file named "A" is broken up into 8 pieces which are spread across the two drives. If one of the drives fails, all information is lost. This is how many large external storage units are set up: a 1TB external drive may really be made up of two 500GB drives.

    RAID 0 Image

  • RAID 1 uses two (or more) disks which each store the same data (mirrored disks), so that data is not lost as long as one disk survives. Total capacity of the array is just the capacity of a single disk. This is how our workstation hard drives are set up: we have two physical drives that appear as one. If one drive fails, we still have the second one. In the image below you can see that the drive on the right stores the same information as the drive on the left.

    RAID 1 Image

  • RAID 5 combines three or more disks in a way that protects data against the loss of any one disk; the storage capacity of the array is reduced by one disk. Until recently RAID 5 was the best compromise of storage cost and redundancy. This is our main storage method here in the lab. In the image below, file "A" is broken up into pieces A1, A2 and A3. Some math is then done on these three pieces and the result is stored as a fourth piece, Ap. It's this "parity" piece that allows the RAID 5 to remain usable even after one drive fails. You can see from "B," "C," and "D" that the parity is distributed across all the drives in the array. When one drive fails, the information on it can be rebuilt from the remaining pieces. If disk 2 were to fail, you'd lose piece 3 of file "A," but the system can rebuild piece 3 from pieces 1, 2 and p.

    RAID 5 Image

  • RAID 6, an improvement on RAID 5, can recover from the loss of two disks, but the storage capacity is reduced by two disks. This is because RAID 6 uses double parity.

    RAID 6 Image
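The "math" behind the RAID 5 parity piece described above is commonly a byte-by-byte XOR of the data pieces. Here is a minimal Python sketch of that idea, assuming XOR parity and equal-length pieces; the function name and the sample data are made up for this example, and a real array does this on disk blocks in the controller or driver.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data pieces of file "A" (sample contents, chosen for illustration).
a1, a2, a3 = b"RAID", b"five", b"demo"

# The parity piece is the XOR of the three data pieces.
ap = xor_blocks(a1, a2, a3)

# Pretend the drive holding a3 has failed: rebuild it from the survivors.
rebuilt_a3 = xor_blocks(a1, a2, ap)
```

XOR-ing the surviving pieces with the parity cancels out everything except the missing piece, so the same function that creates the parity also rebuilds a lost piece. It only works when exactly one piece is missing, which is why a second failure during a RAID 5 rebuild loses the data.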

RAID systems with redundancy continue working without interruption when one, or sometimes more, disks of the array fail. When the bad disk is replaced by a new one, the array is rebuilt while the system continues to operate normally. Some systems have to be shut down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down.

RAID with hot-swap drives is often used in high availability systems, where it is important that the system keeps running as much of the time as possible. It is important to note that redundant RAID is not an alternative to backing up data.

As I said earlier, a RAID is NOT a backup.

Data may become damaged or destroyed without harm to the drive on which it is stored. Part of the data may be overwritten by a system malfunction; a file may be damaged or deleted by mistake or malice; and of course the hardware is at risk of theft, flood, and fire.

Optical backups (to CD or DVD) aren't the only answer, as optical discs can degrade over time or become scratched or otherwise unreadable. For the best security, multiple backups (how many depends on your level of paranoia) are necessary.

If you have questions or need help finding a RAID solution, contact us.