What is "RAID?"

A RAID is NOT a backup!

Let me type that again... A RAID is not a backup. A RAID is about data availability and redundancy. If you have a hardware failure (a dead hard drive), your data is recoverable. But if you delete something, it's gone.

You can save your backup data TO a RAID. For example, we use Norton to back up the files on our PCs daily, and these backups are stored on one of our RAID systems. These backups allow us to go back and retrieve files that may have become damaged or been accidentally deleted. While the RAID doesn't back up data, it allows us to safely store the backups.

RAID - which stands for Redundant Array of Inexpensive Disks (as named by the inventors) or Redundant Array of Independent Disks (a name which later developed within the computing industry) - is a technology where two or more hard disk drives are used to achieve greater levels of performance, reliability, and/or larger data volume sizes.

The fundamental principle behind RAID is the use of multiple hard disk drives in an array that behaves like a single large, fast disk drive. There are a number of ways that this can be done, depending on the needs of the application, but in every case the use of multiple drives allows the resulting storage system to exceed any single drive in the system in capacity, data security, or performance. Which of these you gain, and how much, depends on the RAID level you choose.

The tradeoffs--remember, there's no free lunch--are usually in cost and complexity.

RAID's various designs all involve two key goals:

  •     Increased data reliability.
  •     Increased input/output performance.

When several physical disks are set up to use RAID technology, they are said to be in a RAID array. This array distributes data across several disks, but the array is seen by the computer user or operating system as just one, single disk.

Some arrays are "redundant" in a way that writes extra data derived from the original data across the array so that the failure of one (sometimes more) disks in the array will not result in loss of data. The bad disk is replaced by a new one, and the data on it reconstructed from the remaining data and the extra data. A redundant array obviously allows less data to be stored; a 2-disk RAID 1 array loses half of its capacity, and a RAID 5 array with several disks loses the capacity of one disk.

Other RAID arrays are arranged in a way that makes them faster to write to and read than a single disk.

RAID levels 0, 1, and 5 are the most commonly found, and cover most requirements. We use RAID 1, RAID 6 and a new “Beyond RAID” technology here in the lab, and we suggest you do the same.

We use RAID 1 in our workstations. Each workstation's hard drive is mirrored with a second drive, so the same data is written to both drives; if one fails, we still have the second one. This comes at a cost: we're using two drives but storing the same thing on both, so in effect we have one drive for the price of two.

We also have multiple RAID 6 systems in use for job data storage. The advantage of RAID 6 is that up to two drives in the array can fail without any data being lost. The bad drive(s) can be replaced and the array will automatically rebuild itself. The cost of RAID 6's protection is the space of 2 drives. If you have 5 drives, you can only store 3 drives' worth of data.
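The capacity rules mentioned so far (RAID 1 keeps one drive's worth, RAID 5 gives up one drive, RAID 6 gives up two) can be written down as a small Python sketch. The function name usable_drives is our own invention for illustration, and the sketch assumes every drive in the array is the same size.

```python
def usable_drives(level: int, n_drives: int) -> int:
    """Return how many drives' worth of space is usable.

    Hypothetical helper for illustration; assumes equal-size drives.
    """
    # Minimum number of drives each level needs to make sense.
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if n_drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")

    if level == 0:
        return n_drives        # striping only: no space lost, no protection
    if level == 1:
        return 1               # every drive holds the same data
    if level == 5:
        return n_drives - 1    # one drive's worth of parity
    return n_drives - 2        # RAID 6: two drives' worth of parity


if __name__ == "__main__":
    for level, count in [(1, 2), (5, 4), (6, 5)]:
        print(f"RAID {level} with {count} drives: "
              f"{usable_drives(level, count)} drives' worth of space")
```

Multiply the result by the size of one drive to get the usable space, so five 1TB drives in RAID 6 would give about 3TB.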

We've also added boxes from DROBO which use “Beyond RAID” technology; more about that later.

A RAID offers a wealth of significant advantages that would be attractive to anyone with a large amount of important data to store. Unfortunately, there are costs, tradeoffs and limitations to be dealt with. The degree that you realize the various benefits below does depend on the exact type of RAID you use, but you are always going to get some combination of the following:

Higher Data Security: Through the use of redundancy, most RAID levels provide protection for the data stored on the array. This means that the data on the array can withstand even the complete failure of one hard disk (or sometimes more) without any data loss, and without requiring any data to be restored from backup. All RAID levels provide some degree of data protection, depending on the exact implementation, except RAID level 0.

Fault Tolerance: RAID implementations that include redundancy provide a much more reliable overall storage subsystem than can be achieved by a single disk. This means there is a lower chance of the storage system as a whole failing due to hardware failures.

Improved Availability: Availability refers to access to data. Good RAID systems, such as RAID 5 and RAID 6 arrays, improve availability both by providing fault tolerance and by providing special features that allow for recovery from hardware faults without disruption.

Increased, Integrated Capacity: By turning a number of smaller drives into a larger array, you add their capacity together, though a percentage of total capacity is lost to overhead or redundancy. All RAID levels provide this "combining" benefit, though the ones that include redundancy lose some of the space because of that redundant information or "parity" information in the case of RAID 5 or RAID 6.

Improved Performance: Last, but not least, RAID systems can improve performance. Different RAID levels improve performance in different ways and to different degrees, and some speed up reading more than writing.

Obviously, the hardware costs are higher for those implementing RAID arrays. However, these costs must be compared to the costs of data loss, data recovery and interruption of availability that would result if RAID were not used. For many companies, the entire cost of a RAID setup pays for itself the first time it prevents their system from having to be taken down for a day to deal with a hardware failure. For the working professional photographer, this means not having to restore images from backups, not having to have us re-scan film, and not losing your irreplaceable digital images for which there is no film to be re-scanned. It also allows you to store your images on what appears to be one large drive. No more using multiple external drives and trying to remember what is stored on which drive.

RAID 0 Image

RAID 0 distributes data across several disks in a way which gives improved speed and full capacity, but all data on all disks is lost if any one disk fails. In the picture below you can see that the file named "A" is broken up into 8 pieces which are spread across the two drives. If one of the drives fails, all information is lost. Some large external storage units are set up this way: a 1TB external drive may really be made up of two 500GB drives.
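To make the striping idea concrete, here is a minimal Python sketch that deals a file out across two drives the way the picture describes. The names stripe and reassemble, and the 2-byte piece size, are assumptions made for the example; real controllers work on much larger blocks.

```python
def stripe(data: bytes, n_drives: int = 2, chunk_size: int = 2) -> list:
    """Split data into pieces and deal them out to the drives in turn."""
    drives = [[] for _ in range(n_drives)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        drives[index % n_drives].append(chunk)
    return drives


def reassemble(drives: list) -> bytes:
    """Read the pieces back in the same round-robin order."""
    pieces = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                pieces.append(drive[row])
    return b"".join(pieces)


if __name__ == "__main__":
    file_a = b"A1A2A3A4A5A6A7A8"          # stands in for file "A" in 8 pieces
    drives = stripe(file_a)
    print(drives[0])                       # pieces 1, 3, 5, 7
    print(drives[1])                       # pieces 2, 4, 6, 8
    print(reassemble(drives) == file_a)    # both drives present: file is whole
    print(b"".join(drives[0]) == file_a)   # one drive alone: every other piece is missing
```

With either drive gone, every other piece of the file is missing and there is nothing to rebuild it from, which is why a RAID 0 offers no protection.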

RAID 1 Image

RAID 1 uses two (or more) disks which each store the same data (mirrored disks), so that data is not lost as long as one disk survives. Total capacity of the array is just the capacity of a single disk. This is how our workstation hard drives are set up: we have two physical drives that appear as one. If one drive fails, we still have the second one. In the image below you can see that the drive on the right stores the same information as the drive on the left.
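Mirroring is simple enough to sketch in a few lines of Python. The Mirror class and its method names are hypothetical, made up for this example, and each "drive" is just a dictionary. The sketch shows both halves of the story: a dead drive costs you nothing, but a deleted file is deleted from every copy.

```python
class Mirror:
    """Toy RAID 1: every write goes to every working drive."""

    def __init__(self, n_drives: int = 2):
        # A failed drive is represented by None.
        self.drives = [{} for _ in range(n_drives)]

    def write(self, name: str, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[name] = data

    def delete(self, name: str) -> None:
        # The delete is mirrored too, which is why a RAID is not a backup.
        for drive in self.drives:
            if drive is not None:
                drive.pop(name, None)

    def fail(self, index: int) -> None:
        self.drives[index] = None

    def read(self, name: str) -> bytes:
        for drive in self.drives:
            if drive is not None:
                return drive[name]   # KeyError if the file was deleted
        raise IOError("all drives have failed")


if __name__ == "__main__":
    array = Mirror()
    array.write("portrait.tif", b"image data")
    array.fail(0)                               # one drive dies
    print(array.read("portrait.tif"))           # still readable from the other
    array.delete("portrait.tif")                # an accidental delete...
    print("portrait.tif" in array.drives[1])    # ...removes it from the survivor too
```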


RAID 5 Image  

RAID 5 combines three or more disks in a way that protects data against the loss of any one disk; the storage capacity of the array is reduced by one disk. Until recently RAID 5 was the best compromise of storage cost and redundancy. This is our main storage method here in the lab. In the image below, file "A" is broken up into pieces A1, A2 and A3. Some math is then done on these three pieces and the result is stored as a fourth piece, Ap. It's this "parity" piece that allows the RAID 5 to remain usable even after one drive fails. You can see from "B," "C," and "D" that the parity is distributed across all the drives in the array. When one drive fails, the information on it can be rebuilt from the remaining pieces. If disk 2 were to fail, you'd lose piece 3 of file "A," but the system can rebuild piece 3 from pieces 1, 2 and p.
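The "math" behind the parity piece is usually a bitwise XOR (exclusive or) of the data pieces, and XOR-ing the surviving pieces with the parity gives back the missing one. Here is a minimal Python sketch of that idea. The helper name xor_blocks and the sample contents are made up for illustration, and the sketch assumes all pieces are the same length.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


if __name__ == "__main__":
    # Three data pieces of file "A" (sample contents, not real image data).
    a1, a2, a3 = b"RAID", b"five", b"demo"

    # The parity piece Ap is computed from the three data pieces.
    ap = xor_blocks(a1, a2, a3)

    # Suppose the disk holding A3 fails: rebuild it from A1, A2 and Ap.
    rebuilt_a3 = xor_blocks(a1, a2, ap)
    print(rebuilt_a3 == a3)
```

The same trick works no matter which single piece is lost, including the parity piece itself, but it cannot recover two missing pieces at once. That is the gap RAID 6 fills.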

RAID 6 Image

RAID 6, an improvement on RAID 5, can recover from the loss of two disks, but the storage capacity is reduced by two disks. This is because RAID 6 uses double parity: two independent parity pieces are stored for each set of data pieces.

“Beyond RAID”

DROBO’s Beyond RAID technology seeks to fix some issues with traditional RAID systems. With a traditional RAID, you are “locked in” to the RAID level you choose at set-up. If you start out with a RAID 5 and want to move your data to a RAID 6 later for more redundancy, you aren’t always able to do that. Also, in a traditional RAID system, expansion is time consuming and expensive, and if an error occurs, you may lose all your data. A traditional RAID generally needs drives of the same capacity (any extra space on a larger drive goes unused), so if you have five 1TB drives and space is getting tight, you must replace all five drives with larger drives. In a traditional RAID these drives need to be replaced one at a time, waiting for the RAID to rebuild after replacing each drive. If another drive fails during this rebuild process (which can take days), all your data may be lost.

Beyond RAID takes the traditional RAID levels and adds a layer of virtualization designed to make it more flexible.

drobo image

DROBO’s system allows you to use drives of different capacities and hot swap them in and out of the system as your storage requirements change. Drives do not have to be added all at once. In the image above, say you started with two 500GB drives. As you needed more room you added a 1TB drive, 6 months later there was a sale so you added the second 1TB drive, and a year later you added the 2TB drive. For your next move you could get an additional 2TB (or larger) drive and replace one of the 500GB drives to expand your storage.

Drives can be added as needed and they can be of any capacity. 

Because Beyond RAID is proprietary to DROBO, if the DROBO box itself stops working, in order to recover the data, you’d need a new box from DROBO. With a traditional RAID hardware failure, depending on the hardware and how the RAID was set up, the drives may be able to be moved to another computer and the data restored, but there is a high level of computer knowledge required, and a lot of “ifs.”

RAID systems with redundancy continue working without interruption when one, or sometimes more, disks of the array fail. When the bad disk is replaced by a new one, the array is rebuilt while the system continues to operate normally. Some systems have to be shut down when removing or adding a drive; others support hot swapping, allowing drives to be replaced without powering down.

RAID with hot-swap drives is often used in high availability systems, where it is important that the system keeps running as much of the time as possible. It is important to note that redundant RAID is not an alternative to backing up data.

As I said earlier, a RAID is NOT a backup.

Data may become damaged or destroyed without harm to the drive on which it is stored. Part of the data may be overwritten by a system malfunction; a file may be damaged or deleted by mistake or malice and of course the hardware is at risk of theft, flood, and fire.

Optical backups (to CD or DVD) aren't the only answer, as optical discs can degrade over time or become scratched or otherwise unreadable. For the best security, multiple backups (how many depends on your level of paranoia) are necessary.

If you have questions or need help finding a RAID solution, Contact Us.