# Reed-Solomon for software RAID

Software RAID is a mechanism for aggregating multiple hard drives together, with the aim of improving at least one of:

- Storage capacity.
- Data throughput rate (read speed, or write speed, or both).
- Mean time between failure (MTBF).

Optimising purely for storage capacity is easy; given N hard drives each with 1 terabyte of storage capacity, it isn't hard to imagine how to aggregate them into a system with N terabytes of storage capacity. Optimising purely for data throughput rate is similarly easy. Optimising for MTBF is the interesting part. In order to improve MTBF, the aggregation needs to ensure that data is not lost even when some hard drives fail. It isn't hard to see that some capacity needs to be sacrificed in order to improve MTBF. One simple approach would be to write the same data to every hard drive, so N hard drives each with 1 terabyte of storage capacity would yield an aggregate with just 1 terabyte of storage capacity, but any N-1 drives could fail without data loss. Another simple approach would be to arrange the hard drives in pairs and write any given piece of data to both drives in *some* pair, yielding an aggregate system with N/2 terabytes of storage capacity and which allows one hard drive failure within each pair. A slightly more complex approach would be to nominate one hard drive as storing the XOR of all the others, yielding N-1 terabytes of storage and allowing one hard drive failure.
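
The XOR approach can be sketched in a few lines of Python. This is a toy illustration of the idea rather than a real RAID implementation; the drive contents here are arbitrary made-up bytes:

```python
# Toy sketch of the XOR-parity approach: N-1 data drives plus one
# parity drive holding the XOR of all the others. Any single drive
# can fail and be rebuilt by XORing together all the survivors.
from functools import reduce

data_drives = [b"\x10\x20", b"\x33\x44", b"\x0f\xf0"]  # N-1 = 3 data drives
parity = bytes(reduce(lambda a, b: a ^ b, column)
               for column in zip(*data_drives))

# Simulate losing data drive 1 and rebuilding it from the rest.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, column)
                for column in zip(*survivors))
assert rebuilt == data_drives[1]
```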

Compared against these simple aggregations, a Reed-Solomon (RS) aggregation initially looks like magic. An RS(10,4) system is an aggregation of 14 hard drives which has the capacity of 10 hard drives, and allows *any* 4 of the 14 to fail without data being lost. The general RS(N,M) construction requires N+M hard drives, has the capacity of N hard drives, and allows *any* M of the N+M to fail without data being lost. The system is described at the level of hard drives, but the underlying mechanism operates at the level of small words. As hard drives store bytes, it is useful for a byte to be an integer number of words, or for a word to be an integer number of bytes. For this reason, the word size is typically one of: 1 bit, 2 bits, 4 bits, 8 bits, 16 bits, 32 bits, or 64 bits.

In the framing of matrix math, the RS(N,M) system can be viewed as taking a vector of N words and multiplying it by some N×(N+M) matrix (of words) to give a vector of N+M words. The N+M words are then stored on N+M different hard drives, and any N of the output N+M words can be used to recover all N original input words. For this system to work, "all" we need is for every N×N submatrix of the N×(N+M) matrix to be invertible. To see why, assume we have N output words, then take the N×N submatrix which gives just those outputs, invert it, and multiply it by the outputs to give the original inputs. Note that throughout this piece I'm using the most general meaning of submatrix; a submatrix is formed by deleting zero or more rows and zero or more columns - in particular, a submatrix need not be a contiguous block of the original matrix. It is convenient if *some* N×N submatrix of the N×(N+M) matrix is the identity matrix, as then N of the output words are identical to N of the input words, and this is actually easy to achieve: take *any* N×N submatrix, invert it, and then multiply this inverse with the full N×(N+M) matrix. Accordingly, we can view the N×(N+M) matrix as the concatenation of an N×N identity matrix with an N×M matrix.

The problem thus decomposes to:

- Coming up with an N×(N+M) matrix which happens to have every N×N submatrix be invertible.
- Being able to invert N×N matrices when the time comes.
- Efficiently doing K×N by N×M matrix multiplication for large K (K of 1 is the basic case, but in practice K will be the file system block size divided by the word size).

For step 2, it helps if all the non-zero elements of the matrix are themselves invertible. This isn't true in general for integers modulo 2^{W}, so the W-bit words need to be viewed as elements of GF(2^{W}) rather than integers modulo 2^{W}.
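
As a sketch of what GF(2^{W}) arithmetic looks like for W of 8: addition and subtraction are both XOR, and multiplication is carry-less multiplication followed by reduction modulo a degree-8 polynomial. The helpers below are illustrative, and 0x11b is one common choice of reduction polynomial (the same one used in the worked example later in this piece):

```python
# Sketch of GF(2^8) arithmetic. 0x11b encodes the reduction
# polynomial x^8 + x^4 + x^3 + x + 1.
POLY = 0x11b

def gf_mul(a, b):
    # Carry-less "schoolbook" multiplication, reducing whenever the
    # running value of a overflows 8 bits.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def gf_inv(a):
    # Every non-zero element satisfies a^255 == 1, so a^254 == inv(a).
    assert a != 0
    r = 1
    for _ in range(254):  # naive; square-and-multiply would be faster
        r = gf_mul(r, a)
    return r

# Every non-zero element really is invertible:
assert all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 256))
```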

For step 1, there are a few common approaches: Vandermonde matrices, Cauchy matrices, and special case constructions. The latter two approaches tend to operate directly on the view of the concatenation of an N×N identity matrix with an N×M matrix. Every N×N submatrix of such a concatenation being invertible is equivalent to every square submatrix (of any size) of the N×M matrix being invertible. To see why, we can use matrix determinants and Laplace expansion: first note that matrix invertibility is equivalent to having a non-zero determinant, then consider some N×N submatrix of the concatenation, and do Laplace expansion of all the parts of the submatrix which came from the identity part of the concatenation. After repeated expansion, the remaining minor will be some square submatrix of the N×M matrix, and modulo signs, the determinant of this minor will equal the determinant of the N×N submatrix.

Special case constructions are worth considering first. Firstly, if the word size is just 1 bit, then the only N×M matrix meeting the invertibility requirement is the N×1 matrix with a value of one in every position: no zeroes can appear in the N×M matrix (as the 1×1 square submatrix containing that zero isn't invertible), and 1-bit words can only be zero or one, so the N×M matrix has to be entirely ones, at which point M ≥ 2 is impossible (as any 2×2 square submatrix consisting entirely of ones isn't invertible). This special case is *exactly* the case of having one hard drive store the XOR of all the others. One possible conclusion is that word sizes larger than 1 bit are the key component of doing better than the simple approaches outlined at the start of this piece. The second special case to consider is M of 2 with the N×M matrix being the stacking of two N×1 vectors: the first vector consisting entirely of ones, and the second vector containing N distinct non-zero elements of GF(2^{W}). This meets the invertibility requirements (every 1×1 submatrix is non-zero, and every 2×2 submatrix can have its determinant easily computed and seen to be non-zero). Such a matrix can be constructed provided that N < 2^{W}, and is a common RAID6 construction.
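
The invertibility argument for the M of 2 special case can be sketched directly; the choice of N distinct non-zero elements below is arbitrary:

```python
# Sketch of the RAID6 special case: an N x 2 block whose first column
# is all ones and whose second column is N distinct non-zero elements
# of GF(2^8). Here N = 10 and the elements are an arbitrary choice.
N = 10
xs = list(range(1, N + 1))
matrix = [[1, x] for x in xs]

# Every 1x1 submatrix is non-zero, and every 2x2 submatrix
# [[1, a], [1, b]] has determinant b - a == a ^ b (subtraction is XOR
# in GF(2^8)), which is non-zero because the elements are distinct.
for i in range(N):
    for j in range(i + 1, N):
        assert xs[i] ^ xs[j] != 0
```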

Next up are Cauchy matrices. A square Cauchy matrix is always invertible, and every submatrix of a Cauchy matrix is a Cauchy matrix. Taken together, this means that an N×M Cauchy matrix meets the requirement of every square submatrix being invertible. Furthermore, an N×M Cauchy matrix is easy to construct: choose N+M distinct elements of GF(2^{W}), call the first N of these X, and the last M of these Y, then the N×M matrix has `inv(X[i] ^ Y[j])` in position `i,j`. Such a matrix can be constructed provided that N+M ≤ 2^{W}. For RS(10,4) this would mean that words need to be at least 4 bits.
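
This construction can be sketched for RS(10,4) over GF(2^{8}) with reduction polynomial 0x11b. The `gf_mul`/`gf_inv` helpers are plain bit-twiddling sketches, and the element choices (0x00 through 0x0d) are an arbitrary pick of distinct values, matching the worked example later in this piece:

```python
# Sketch of the Cauchy construction for RS(10,4) over GF(2^8).
POLY = 0x11b  # x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def gf_inv(a):
    r = 1
    for _ in range(254):  # a^254 == inv(a) for non-zero a
        r = gf_mul(r, a)
    return r

N, M = 10, 4
X = list(range(N))            # 0x00 .. 0x09
Y = list(range(N, N + M))     # 0x0a .. 0x0d
cauchy = [[gf_inv(x ^ y) for y in Y] for x in X]
```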

Finally we get to Vandermonde matrices. This construction gives an N×(N+M) matrix meeting the requirement of every N×N submatrix being invertible, which then requires further processing to get to the concatenation-with-identity form. Some texts instead directly concatenate an N×M Vandermonde matrix to an identity matrix, but this does not work in general. Again, N+M distinct elements of GF(2^{W}) are chosen, but this time each element gives rise to an N×1 vector. For an element `e`, the vector is `[1, e, e*e, e*e*e, ...]`. These vectors are stacked to give an N×(N+M) Vandermonde matrix. Every N×N submatrix of this N×(N+M) matrix is itself a Vandermonde matrix, and as the N elements which define the submatrix are distinct, the determinant is non-zero, and thus the submatrix is invertible.
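
This construction can also be sketched for RS(10,4); each of the N+M distinct elements `e` contributes the column `[1, e, e*e, ...]`, taking `0^0 == 1` so that the element 0 gives the column `[1, 0, 0, ...]`. The GF(2^{8}) helper and element choices are as in the earlier sketches:

```python
# Sketch of the Vandermonde construction for RS(10,4) over GF(2^8).
POLY = 0x11b  # x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

N, M = 10, 4
elems = list(range(N + M))   # 0x00 .. 0x0d, any distinct values work
vand = [[0] * (N + M) for _ in range(N)]
vand[0] = [1] * (N + M)      # row of e^0 values, taking 0^0 == 1
for i in range(1, N):
    # Row i holds e^i for each element e.
    vand[i] = [gf_mul(v, e) for v, e in zip(vand[i - 1], elems)]
```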

Vandermonde matrices are appealing because they contain lots of `1` values; the first element in every vector is `1`, and an entire vector of `1` is obtained when `e == 1`. If the presence of these `1` values in particular locations can be presumed, then some computationally expensive GF(2^{W}) multiplications can be elided. For example, in an RS(10,4) system, if all these `1` values could be presumed, then the number of GF(2^{W}) multiplications required to calculate 4 checksum words from 10 input words reduces from 40 to just 27. Unfortunately, the transformation of the N×(N+M) Vandermonde matrix to concatenation-with-identity form need not preserve *any* of these nice `1` values. All is not lost though; if we have an identity matrix concatenated with an N×M matrix (be that a transformed Vandermonde matrix, or a Cauchy matrix, or any other construction), then the N×M matrix can be transformed in certain ways without affecting the invertibility properties of (every square submatrix of) that N×M matrix. In particular, any row and/or column can be scaled by any non-zero constant without affecting invertibility. This means we can look at every row in turn and scale that row by the `inv` of the row's first element, and then do the same for every column. This will give an N×M matrix where the first row and the first column are both entirely `1`.
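
The rescaling step can be sketched as follows, applied here to the rightmost 10×4 block of the concatenation-with-identity matrix from the worked example later in this piece; the GF(2^{8}) helpers (polynomial 0x11b) are illustrative sketches, as elsewhere:

```python
# Sketch of row/column rescaling over GF(2^8).
POLY = 0x11b

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

def gf_inv(a):
    r = 1
    for _ in range(254):  # a^254 == inv(a) for non-zero a
        r = gf_mul(r, a)
    return r

def rescale(block):
    # Scale every row by inv of its first element, then every column
    # by inv of its (now updated) first element. The first row and
    # first column become all ones; invertibility of every square
    # submatrix is unaffected.
    rows = [[gf_mul(v, gf_inv(row[0])) for v in row] for row in block]
    col_scale = [gf_inv(v) for v in rows[0]]
    return [[gf_mul(v, s) for v, s in zip(row, col_scale)] for row in rows]

block = [  # rightmost 10x4 block after inverting the leftmost 10x10
    [0x81, 0x96, 0xb3, 0xda],
    [0x96, 0x81, 0xda, 0xb3],
    [0xaf, 0xb8, 0x6e, 0x06],
    [0xb8, 0xaf, 0x06, 0x6e],
    [0xd2, 0xc4, 0x0c, 0x65],
    [0xc4, 0xd2, 0x65, 0x0c],
    [0xfe, 0xe8, 0xd5, 0xbd],
    [0xe8, 0xfe, 0xbd, 0xd5],
    [0x03, 0x02, 0x05, 0x04],
    [0x02, 0x03, 0x04, 0x05],
]
scaled = rescale(block)
assert scaled[0] == [1, 1, 1, 1]
assert all(row[0] == 1 for row in scaled)
```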

As a worked example for RS(10,4), here is a 10×14 Vandermonde matrix in GF(2^{8}) with x^{8} == x^{4} + x^{3} + x + 1, elements given as their hex representation:

```
01 01 01 01 01 01 01 01 01 01 01 01 01 01
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d
00 01 04 05 10 11 14 15 40 41 44 45 50 51
00 01 08 0f 40 55 78 6b 36 7f 9e d1 ed b0
00 01 10 11 1b 1a 0b 0a ab aa bb ba b0 b1
00 01 20 33 6c 72 3a 36 2f 8d c2 72 01 bc
00 01 40 55 ab a1 9c 82 63 89 d5 2b 0c ed
00 01 80 ff 9a 13 65 a3 35 ad 43 3e 50 5d
00 01 1b 1a 5e 5f 45 44 b3 b2 a8 a9 ed ec
00 01 36 2e 63 38 85 c7 ef 55 7c df b0 50
```

The inverse of the leftmost 10×10 submatrix is:

```
01 ee 01 ab 42 42 00 45 43 43
00 01 ef ee 45 07 45 45 00 43
00 ad 68 99 a5 66 8a 45 78 28
00 3f 3e dc 0c 23 cf 45 50 28
00 c5 2f 88 b6 8c 0f 45 9b 89
00 0d ed cd 70 c9 4a 45 12 89
00 95 90 ad e0 92 85 45 e8 f2
00 1d fd e8 b2 d7 c0 45 1a f2
00 ce 92 17 f1 3a 00 00 90 10
00 f3 85 17 cb 3a 00 00 80 10
```

Multiplying these two matrices gives concatenation-with-identity form:

```
01 00 00 00 00 00 00 00 00 00 81 96 b3 da
00 01 00 00 00 00 00 00 00 00 96 81 da b3
00 00 01 00 00 00 00 00 00 00 af b8 6e 06
00 00 00 01 00 00 00 00 00 00 b8 af 06 6e
00 00 00 00 01 00 00 00 00 00 d2 c4 0c 65
00 00 00 00 00 01 00 00 00 00 c4 d2 65 0c
00 00 00 00 00 00 01 00 00 00 fe e8 d5 bd
00 00 00 00 00 00 00 01 00 00 e8 fe bd d5
00 00 00 00 00 00 00 00 01 00 03 02 05 04
00 00 00 00 00 00 00 00 00 01 02 03 04 05
```

Then rescaling rows and columns of the rightmost 10×4 block to give `1` values along its leading row and column:

```
01 00 00 00 00 00 00 00 00 00 01 01 01 01
00 01 00 00 00 00 00 00 00 00 01 2c 5e 2e
00 00 01 00 00 00 00 00 00 00 01 45 4e 1e
00 00 00 01 00 00 00 00 00 00 01 2d c6 b1
00 00 00 00 01 00 00 00 00 00 01 d9 7a 94
00 00 00 00 00 01 00 00 00 00 01 fe d0 56
00 00 00 00 00 00 01 00 00 00 01 5e 53 da
00 00 00 00 00 00 00 01 00 00 01 2e da 7e
00 00 00 00 00 00 00 00 01 00 01 30 f6 85
00 00 00 00 00 00 00 00 00 01 01 3c a4 d5
```

It can be exhaustively verified that every 10×10 submatrix is invertible.
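
That exhaustive verification can be sketched as follows; a toy check rather than production code, testing all C(14,10) = 1001 choices of 10 columns via Gaussian elimination over GF(2^{8}):

```python
# Exhaustively check that every 10x10 submatrix of the final 10x14
# matrix is invertible over GF(2^8) with polynomial 0x11b.
from itertools import combinations

POLY = 0x11b

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= POLY
    return r

# Precompute multiplication and inverse tables to keep the check fast.
MUL = [[gf_mul(a, b) for b in range(256)] for a in range(256)]
INV = [0] * 256
for a in range(1, 256):
    INV[a] = MUL[a].index(1)

BLOCK = [  # rightmost 10x4 block of the matrix above
    [0x01, 0x01, 0x01, 0x01],
    [0x01, 0x2c, 0x5e, 0x2e],
    [0x01, 0x45, 0x4e, 0x1e],
    [0x01, 0x2d, 0xc6, 0xb1],
    [0x01, 0xd9, 0x7a, 0x94],
    [0x01, 0xfe, 0xd0, 0x56],
    [0x01, 0x5e, 0x53, 0xda],
    [0x01, 0x2e, 0xda, 0x7e],
    [0x01, 0x30, 0xf6, 0x85],
    [0x01, 0x3c, 0xa4, 0xd5],
]
MATRIX = [[1 if i == j else 0 for j in range(10)] + BLOCK[i]
          for i in range(10)]

def invertible(rows):
    # Gaussian elimination over GF(2^8); invertible iff full rank.
    m = [row[:] for row in rows]
    n = len(m)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c]), None)
        if pivot is None:
            return False
        m[c], m[pivot] = m[pivot], m[c]
        inv_p = INV[m[c][c]]
        for r in range(c + 1, n):
            f = MUL[m[r][c]][inv_p]
            if f:
                m[r] = [x ^ MUL[f][y] for x, y in zip(m[r], m[c])]
    return True

assert all(invertible([[row[c] for c in cols] for row in MATRIX])
           for cols in combinations(range(14), 10))
```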

Alternatively, for doing a Cauchy construction, start with N distinct values vertically, and M more horizontally:

```
0a 0b 0c 0d
------------
00 |
01 |
02 |
03 |
04 |
05 |
06 |
07 |
08 |
09 |
```

Then construct the N×M matrix one element at a time by doing GF(2^{8}) `inv` of the XOR of the corresponding vertical value and horizontal value:

```
0a 0b 0c 0d
------------
00 | 29 c0 b0 e1
01 | c0 29 e1 b0
02 | e8 4f e5 c7
03 | 4f e8 c7 e5
04 | e5 c7 e8 4f
05 | c7 e5 4f e8
06 | b0 e1 29 c0
07 | e1 b0 c0 29
08 | 8d f6 cb 52
09 | f6 8d 52 cb
```

Then rescale each row and column to give `1` values along the leading row and column:

```
0a 0b 0c 0d
------------
00 | 01 01 01 01
01 | 01 2c 5e 2e
02 | 01 45 4e 1e
03 | 01 2d c6 b1
04 | 01 d9 7a 94
05 | 01 fe d0 56
06 | 01 5e 53 da
07 | 01 2e da 7e
08 | 01 30 f6 85
09 | 01 3c a4 d5
```

Then remove the guide values and instead concatenate with an identity matrix:

```
01 00 00 00 00 00 00 00 00 00 01 01 01 01
00 01 00 00 00 00 00 00 00 00 01 2c 5e 2e
00 00 01 00 00 00 00 00 00 00 01 45 4e 1e
00 00 00 01 00 00 00 00 00 00 01 2d c6 b1
00 00 00 00 01 00 00 00 00 00 01 d9 7a 94
00 00 00 00 00 01 00 00 00 00 01 fe d0 56
00 00 00 00 00 00 01 00 00 00 01 5e 53 da
00 00 00 00 00 00 00 01 00 00 01 2e da 7e
00 00 00 00 00 00 00 00 01 00 01 30 f6 85
00 00 00 00 00 00 00 00 00 01 01 3c a4 d5
```

This happens to be the *exact same* 10×14 matrix as given by the Vandermonde construction. This is no coincidence; it can be shown that taking the inverse of a square Vandermonde matrix and multiplying it by a different Vandermonde matrix (which is effectively what the Vandermonde construction does for the rightmost N×M block) yields a Cauchy matrix, modulo some per-row and per-column scaling. That scaling, whatever it is, gets undone by forcing the leading row and column to `1`.