This article describes the AES reference implementation that I wrote in Haskell. I have implemented this algorithm half a dozen times over my career, for use in FIPS compliant implementations, for use in embedded systems, and even for use as a building block for more interesting things. This implementation is destined to be used for such an interesting project, which we will talk about during the next blog entry. First, we need an AES algorithm. AES is an interesting algorithm that combines some of the common elements typical of encryption, such as Confusion and Diffusion and Abstract Algebra, in this case Galois Field Theory. AES is based on the Rijndael block cipher, but with specific tunings as decided by NIST. Rijndael is a more general algorithm. This article stays close to AES, but implements tunable parameters which takes the algorithm away from the standard by adding back some ideas found in the original cipher. Since the point of this implementation is to build a parameterized PRNG and not to build an encryption system, I think this is a decent trade-off.
First, I should crush some expectations. This implementation of the cipher is a reference implementation. It should not be used in a production environment, or for any purpose where security is important. This only implements ECB mode, which is the weakest encryption mode. ECB is a building block for more interesting things, but it cannot be used on its own without opening one up to many terrible attacks. Also, this implementation is not timing resistant. Although it uses table lookups for the important operations, Haskell is not a low-level enough language to make any claims about cache coherency or even how operations are performed by the CPU. This implementation will leak key details through timing attacks. Finally, this implementation is garbage collected. Secret data will exist throughout RAM for an indeterminate amount of time. On most platforms, a Haskell process runs in user space, and its RAM could be accessible by other processes running on the system. Key details could be leaked long after a cryptographic operation is complete. My point here is that this implementation should not be used for any purpose other than education, testing, and experimentation.
The reason why I chose Haskell is because it is a high-level language that closely mimics the mathematical operations I will be performing. The reference implementation is more about exploring the theory behind AES so I can build the cipher from the ground up. This implementation isn’t fast, but it is useful for writing test cases for alternative implementations, be these in hardware or software. Also, we can play with the cipher by changing the number of rounds or changing state properties. This sort of experimentation, coupled with High-Order Logic, gives us the ability to write useful proofs or even to launch attacks against this cipher. Furthermore, when debugging hardware implementations or implementations in hand-written assembler, it is useful to be able to de-construct the cipher at different points for testing and optimization purposes. By building simple functions and composing these into the complete cipher, we gain a lot of flexibility for tackling any of these sorts of things. In fact, I used the low-level building blocks I defined here to check the math I used in derivation steps for higher-level functionality.
There is another reason why I’m writing this blog entry. I can re-create AES from first principles by performing some math and knowing some memorized constants and heuristics. This is a very powerful skill, although, only really useful if one finds oneself in a situation where ready access to encryption technology is not available. It is possible, although a rather tedious exercise, to perform AES encryption using pen and paper. A lot of derivation is required, followed by the building of lookup tables. Then, quite a bit of scrap paper would be required. But, the point is that with some know-how, and some memorization, we can build up the AES cipher from a very basic mathematical understanding. I don’t know about you, but I find this to be very cool. The mathematics behind this cipher are not difficult to understand. This is a cipher that each of us uses every day, some without realizing it. Many of us treat this technology as if it were “magic”, or that ciphers are things that other people write, using skills that are beyond what you and I could ever do. While it’s not advisable to write and use your own ciphers – they require many years of peer review before they can be considered “safe” – the mathematical understanding required for cryptography is available to anyone who wants to do some work to acquire these skills. I hope that I can demystify this cipher a little, and maybe inspire others to further study this topic.
Front Matter
This article is both an article on my blog and a Literate Haskell implementation of this cipher. The following section can be largely ignored, as it is just doing some set-up for the code.
For low-level operations, we need Data.Word
and Data.Bits
. To get foldl'
,
we use Data.List. To build lookup tables, we use Data.Array.
Mathematical Operations in AES
Before getting into the mechanics of AES, I should explain the mathematical operations in this cipher. AES performs operations in Galois fields. Two basic operations are defined: addition and multiplication. We use these two operations to define subtraction and division, which are all combined to form a field. These operations work differently than operations in or , which is what we tend to use most of the time.
Galois Fields
I’m not going to provide an in-depth discussion of the math in a Galois field. I will provide some information that should make it easier to grasp what is going on in the code below. A Galois field, or a finite field, is a field with a finite number of elements under which the mathematical operations of addition, subtraction, multiplication, and division are defined. These operations don’t necessarily have to work as we understand them. There just has to be a rigid way in which these can be done. The result of each of these operations should be another element in the field. There are some other properties of Galois fields which aren’t important for the discussion for now, and I feel would be a bit of a distraction. So, I won’t discuss them here.
First, as a matter of convention, when I describe values in different bases, I
will use the following to best mirror the NIST documentation. Hexadecimal
values will be represented with braces, such as {01} for 01 hex. Binary values
will have a b
at the end, like 0100b.
The Galois field that we will be working under is called . In this field, there is a bucket of bits. We call these bits , , , , , , , and . An element in this field is an additive combination of zero or more of these bits. So, for instance, is a member of this field. It corresponds to the number 00010011b, or {13}, or 19 in decimal. It should be easy to see the correspondence between the polynomial representation of this number and the binary representation. Each bit in the binary representation determines whether the corresponding bit in the bucket belongs to this element or not.
Addition in AES
Addition in the AES Galois field is the same as a bitwise exclusive or
operation, or xor
. So, for instance, , because
and . By exclusive-or:
Our result is, . Bits which exist in one element but not the other are combined. Bits that exist in both are canceled out. This is addition in .
We can define a generic addition operation that adds any two numbers of the same type for any field.
We use the Bits
type restriction on the generic type a
to ensure that the
xor
operation will be defined in the way we expect. Because we have defined
addition in terms of xor
, it is relatively easy to prove that subtraction is
the same operation as addition. I leave that as an exercise to the reader. As
a hint, try deriving the additive inverse for any number. Use a truth table.
Polynomial Expansion
Now we can shift our focus to multiplication. Multiplication in a Galois Field
is a little trickier than standard multiplication. We cannot use built-in CPU
operations to help us. A Galois field treats numbers as polynomials. In this
case, the bit position of a number indicates the power. So, for instance, the
number {51} corresponds to 1010001b. This also corresponds to the
polynomial . But, remember that addition in
this field is not the same as addition in
or .
So, the polynomial is really
xor
xor
. It is this tricky bit that makes the
math we are about to do seem a little different than typical polynomial math.
This also means that we must do multiplication and division the long way. But,
we will optimize things in a bit.
Now that we understand the polynomial representation, we can consider how this
corresponds to the bit pattern of a Word8
or Word32
. If each bit represents
a coefficient of the given power of two, then these words make a compact
representation. A Word8
can hold a polynomial of order 8, and a Word32
can
hold a polynomial of order 32. Since we know that xor
would cancel out two
terms raised to the same power, a single bit is more than enough storage space
to hold each coefficient, which can be either 1 or 0.
The compact representation cannot be used for any operation other than addition,
because of how binary arithmetic works. Instead, we need to expand this
representation so we can perform multiplications and divisions. This is what
expandPolynomial
does.
This function uses the recursive piecewise function expandPoly
to build a list
of expanded coefficients. Each of these has exactly one bit set. This is done
by first iterating through all bits in a 32-bit word and performing a bitwise
AND (.&.)
using this bit position and the value to be expanded. Each of these
values are pushed onto a list. Finally, when all bits have been iterated, zero
coefficients are removed by using the filter
function to retain only non-zero
coefficients from this list.
Polynomial Multiplication
Now we can implement polynomial multiplication. We have a modular step left
before this becomes the multiplication method we will use for this Galois field,
but we will talk about that in a bit. To multiply two values, x
and y
, we
expand one of the two values and then multiply this expansion by the other
value. This produces a set of intermediate values that must be added
together to get the result. As before, add means xor
. To best understand
this operation, think back to how you were taught long-hand multiplication in
grade school. The operation is the same process, more or less.
Consider the following example. We multiply 43 by 19.
As we can see, the polynomial multiplication of these two numbers is
1011001101b, which corresponds to 717. We multiply the left hand element by a
single bit in the right hand element for each of the bits in the right hand
element. We then add each of these values using xor
.
The corresponding function in Haskell is listed here.
There is a lot going on in this function. First, we use expandPolynomial
to
expand y
into a list of coefficients. We then use map
to multiply x
by
each of these coefficients. Finally, we use foldl
with gfAdd
to collapse
this back into a compact polynomial for this field. The left fold function
repeatedly applies the function in the first argument, gfAdd
to each element
in the list, accumulating the result. A more in-depth explanation of foldl
is
below in the Low Level Operations section.
Polynomial Multiplication in GF
The polynomial multiplication function as implemented needs some additional work to be closed under multiplication in the field. As we can see with the above example, the value we got from multiplying 43 by 19 is much larger than 255, which is the largest element in our field. Our field is therefore not closed under the multiplication function we have written above. It is possible to use this function with two elements to get an element not in the field. In order to get around this problem, we need to select a prime number that we can use to make this field modular. This means that if we generate an element through multiplication that is not an element in the field, we divide this element by a fixed prime number. The remainder of this division becomes the result of this closed multiplication. We repeat this modulus operation until the remainder is in .
AES uses the prime 283. In hex, this is {11b}. The polynomial form is therefore . This number is an irreducible polynomial, which means that it can only be divided by itself and 1 (also, this is the definition of prime). If we multiply two polynomials modulo this value, then our multiplication is closed under . Because this number is prime, the modular field will contain every element in . The explanation of this fact is rather long-winded, but I highly suggest reading up on field theory to understand why this is true. It is what makes algorithms like AES possible.
Just as we used long form multiplication for polynomials, we use long division for dividing polynomials. To get the modulus form, which is really what we are interested in, we perform the division and only keep the remainder. As with long division, we work from the left to the right of the dividend until the remainder is smaller than the divisor. The values that we multiply the divisor by during each step would, when added together, be the quotient. But, it’s only the remainder we care about.
Getting back to our multiplication example, let’s reduce our result, 717, modulo the prime 283. Here, 283 = 100011011b, and 717 = 1011001101b.
Here, the remainder, which is all we care about for this operation, is
11111011b, or 251. To capture this same operation in Haskell requires some
recursion. We need to find the greatest power of x
in the divisor and
dividend, and then find the number we must multiply the divisor by to be less
than or equal to this power. We apply the modulus operation until x
is back
in . This could mean applying the operation once
more when x
is less than 283 but still greater than 255.
In this recursive function, we continue execution until x
< 256. At this
point, x
is the remainder. Through each iteration, we compute the greatest
polynomial factor of both x
and y
and perform an integer division operation
on these factors. This gives us the factor by which we multiply y
to get the
largest multiple of y
that we can “subtract” from x for the given coefficient.
The process is very similar to computing a CRC.
Now, we can define gfMult
.
In this function, we just take the remainder of the polynomial multiplication of
x
and y
, which gives us ( x
· y
) mod
283.
Multiplicative Inverse
The next thing we are interested in obtaining for the AES algorithm is the multiplicative inverse for any given number in . Now, we could perform an extended Euclidean algorithm to build a function that does this, but there is a much easier way. The functions we have defined so far can be used to build multiplication tables. With some inverted mapping, we can use these tables to derive the multiplicative inverses for any value in the field. Since most AES implementations use table lookups for performing multiplications within the fields, we can kill two birds with one stone. We can build these tables, which can be used for efficient implementations, and we can use these tables to compute the inverse mapping.
First, let’s build a multiplication table. This table is 256 x 256,
and covers multiplying any element by any other element. We use Data.Array
to build a fixed length array of arrays. There are 256 arrays, each of
which contain 256 elements.
Here, we use list comprehension to define lists of pairs of elements. Each pair
contains the offset in the array to which value (the second argument of the
pair) is stored. The outer list defines offsets for each of the element arrays,
and then defines each element array. The inner list holds the definition of
elements in the element array and their offsets. This is just one giant
multiplication table, like we used to make in grade school. Note the
applications of fromIntegral
. The gfMult
operation operates on Word32
values. We did this for simplicity, but we actually only care about Word8
values. To help Haskell with some type inference, we use fromIntegral
to tell
Haskell where we want Word8
values. This allows Haskell to treat i
and j
as Word32
values, which are required for gfMult
. Haskell is a strongly
typed language, and will not perform implicit conversions, unlike C, C++, or
Java.
Now, let’s make an optimization. The old way of multiplying in
required using gfMult
which had to first do long
multiplication and then long division to perform the modulus operation. This
multiplication table has already performed this work for us. So, we can
optimize this function and call it gfMult'
.
To perform our multiplication, we look up the result in the multiplication
table. The (!)
operator is used to select an element at the given offset. We
use the x
offset to select the correct element array for multiples of x
, and
we use the y
offset to select the yth multiple. The advantage of this
multiplication is that it runs in constant time, unlike the original
multiplication operation.
Now, let’s find the multiplicative inverse for each element. First, let’s double-check our definitions. There are two special elements for multiplication. The first element is 0. Anything multiplied by 0 is 0. The second is 1. This element is known as the multiplicative identity. Anything multiplied by 1 is itself. Both of these special elements are immune to this inverse operation. The multiplicative inverse is an element that, when multiplied by the original element, results in the multiplicative identity, the element 1. So, 0 is exempt because it does not have an inverse. There is no number that we can multiply 0 by to get 1. The identity element 1 is exempt because it is its own identity. 1 · 1 = 1.
For simplicity, the first two elements in our inverse list will be 0 and 1. This isn’t exactly accurate, but for our use, it should be fine. Now, for each offset in the multiplication table, we are going to find the offset that holds the element 1. This offset becomes the multiplicative inverse for the given offset.
This list is pretty straight-forward. We start with the 0 and 1 elements. As
above, these remain constant. Then, we build a list comprehension in which we
find the value j
that, when multiplied by i
in
results in
the identity element 1. We know that our multiplication table already contains
all possible elements (otherwise, we’ve introduced a defect), so we simply
iterate through this table to find the correct value.
Now we can build the inverse multiplication table. This is the “division”
table, which allows us to divide the first offset by the second offset. All we
need to do is re-order the multArray
using this inverseArray
.
This divArray
allows us to define, for the first time, a complete division
operator for .
As before, this operation runs in constant time. Because of the overhead of
computing inverseArray
, the first division will take a long time. Haskell is
a lazy computing language. It does not perform computations until it is forced
to. Subsequent divisions will run in constant time.
For completion sake, we will also define gfAdd'
which also operates on 8-bit
words.
This allows us to operate within using 8-bit words and table based operations. In fact, we can now perform all four operations: addition, subtraction (the same as addition), multiplication, and division. Our Galois field is complete and ready for AES.
Low Level Operations
Before continuing with building up the individual steps in the cipher, we need
to define some low-level conversions. AES is a block cipher. This means that
it operates on a fixed size block of input data at a time. The block size for
AES is 128 bits, which is four 32-bit words or 16 8-bit bytes. We start by
defining an octets
function which converts a 32-bit word into 8-bit bytes.
Let’s decompress this function. Based on the signature, we can see that we are
converting a Word32
into a list of Word8
values. The definition of this
function uses repeated fromIntegral
calls and shiftR
calls to create a list
of Word8
bytes. The shiftR
method takes two arguments: an integer to shift,
and the number of positions that it should be bit shifted to the right. The
back ticks around the shiftR
function cause it to be treated as an infix
operator (think 1 + 1). The application here of fromIntegral
might seem like
a bit of magic. Haskell uses type inference to know that each element of the
list must be of type Word8
, while the result type of the shift is a Word32
.
Hence, it chooses an application of the generic fromIntegral
method that
converts from Word32
to Word8
.
To convert to a 32-bit word from a list of octets, we define fromOctets
.
Here, we use a left fold to repeatedly shift left and or
in the next byte. As
is obvious from both the function above and this function, we are working in
Big Endian mode.
First, let’s explain foldl
. The type signature of foldl
is as follows:
This function takes three arguments. The first argument is a function that
creates an a
result from two arguments, an a
and a b
. The second argument
is an initial a
value. The third argument is a list of b
arguments.
foldl
produces a result of a
that is created by “folding” the list through
repeated applications of the function. The first application takes the a
argument as provided and the first b
value from the list. This results in a
new a
value, which is used for the next invocation of the function along with
the next b
value from the list. Because it takes values from left to right in
the list, it is a left fold. There is also a foldr
version that takes values
from right to left.
Getting back to fromOctets
, we can see that the function passed to foldl
shifts the current result left by 8 bits and then performs a bitwise-or to fold
in the next byte. Here, fromIntegral
converts 8-bit Word8
values to 32-bit
Word32
values. Folding this function effectively undoes what octets
does,
as we would expect.
Plaintext and Ciphertext
We define Plaintext
and Ciphertext
as newtype
wrappers around [Word8]
arrays so we can build functions that cleanly operate on either plaintext or
ciphertext data.
This allows us to write function signatures that clearly define what the operations will be doing.
Key
We define a Key
type that is used for storing the three distinct key types
supported by AES: 128 bits, 192 bits, and 256 bits. First, we create a RawKey
that is just a thin newtype
wrapper around [Word32]
.
Now we can define a Key type which supports one of the three NIST approved key sizes.
Now in places where we need to treat the key type specially, we can do so by using pattern matching. In places where we just need to get access to the raw key, we provide the following conversion function:
The inverse allows us to map a raw key to the appropriate key type.
Here, 4 words correspond to 128 bits, 6 words correspond to 192 bits, and 8 words correspond to 256 bits.
Finally, we will need to consider a KeySchedule
as a special type. We create
a newtype
wrapper around [Word32]
to make our intentions clear when we
manipulate key schedules.
We define a default key schedule that does nothing. It is an infinite list of zeros.
Now we can build functions that make a little more sense from their argument types.
State Array
As per the AES specification, we can now create a state array. This is an array
of four columns representing the current 128-bit state of the cipher. We
express this array as four 32-bit words, named w0
, w1
, w2
, and w3
. We
also include a key schedule array, which will be explained later. Here is the
data type.
We need a function that loads the state array with input bytes. The
plaintextToState
method performs this operation using the low-level routines
provided above.
take
and drop
create new lists with the number of taken elements or the
first N elements removed, respectively.
The stateToCiphertext
method performs the opposite operation. Here, the ++
operator performs list concatenation. This function is how we get bytes back
out of our State
type when we are done with the block cipher.
This allows us to focus on writing operations in this cipher as functions
that map values in our State
space type onto itself.
AES Operations
There are four operations that we perform during AES rounds. In most of these
rounds, we perform all four of these operations. In the last round, we perform
only three. These operations are SubBytes
, ShiftRows
, MixColumns
, and
AddRoundKey
. To build up each of these operations, we will need to build upon
the math and low-level operations we have defined so far to apply these towards
each of these higher level operations.
SubBytes
The SubBytes
transformation is a non-linear byte substitution that is
performed on each byte in the State
. The byte is substituted with the
multiplicative inverse and then the following affine transformation is applied
to each bit of the byte (from the NIST document):
bi’ = bi + b(i+4) mod 8 + b(i+5) mod 8 + b(i+6) mod 8 + b(i+7) mod 8 + ci
Here, c
is the constant {63}. This logic is very easy to implement in code.
We can build up this substitution by mapping the inverseArray
with the
transformation as defined in the specification. We create a subBytesArray
that we can use to look up the complete substitution.
To simplify things, we create a subByte
function which performs the
substitution on a single byte.
With the lookup in the array, this is a constant time operation.
Next, we build a subWord
helper function that performs the subByte
operation
on each byte in a given Word32
.
Here, we split the Word32
value into a list of Word8
octets, map the
subByte
operation over this list, and re-combine the resultant map back into a
single Word32
value.
Now, we can build the subBytes
routine, which operates in the State
space.
This routine builds a new State
value in which each word in the state has been
replaced by the subWord
of the original word. With that, we have one
operation for each round of AES done.
ShiftRows
In the ShiftRows
operation, we shift the rows of the state. Keep in mind that
each word in the state is a column. So, in this operation, we are exchanging
bytes between columns to diffuse data. The first row is left unchanged. The
second row is rotated to the left by 1. The third is rotated to the left by 2.
The fourth is rotated to the left by 3.
Here, we use octets
to expand each column, and then use pattern matching in
the shift
helper function to grab each byte. From there, we can perform the
row level shifting for each column, and then recombine these into Word32
values using fromOctets
. Compared to SubBytes
, ShiftRows
is relatively
easy. In fact, it’s the simplest of the four steps.
MixColumns
The MixColumns
operation is where things get interesting. This is the core
diffusion operation in AES. The code itself is simple, but understanding what
the code is doing requires some explanation. The ShiftRows
operation shifted
bytes between columns to diffuse data within the block. The MixColumns
operation will perform some multiplications of bytes within each column over
multiplying them with a special polynomial , modulo the polynomial . I will explain
why this polynomial is special in a bit. If we treat each byte in a column,
say, s0, s1, s2, and s3 as a polynomial as well, then we get . Now we can derive the math necessary to build the
MixColumns
algorithm. This will be the second longest derivation in the
article, so bear with me. The math is actually quite beautiful, which is why
I’m showing it.
Let’s expand this operation by multiplying through:
At this point, we have a sixth order polynomial, which is three orders too large. To resolve this, we take the remainder of division by . On each line, we will “subtract” a multiple of this reducing polynomial and simplify the result on the next line. Keep in mind that addition and subtraction are the same thing. We use the minus signs just to show the multiple during each step of our modulo reduction.
This gives us our final polynomial. As the reduction modulo occurs,
note that terms missing for each factor are added back in. This is a beautiful
application of a reducing polynomial. Based on this math, we know exactly which
values should be xor
ed into each byte for the MixColumns
operation.
As per the reduction above, we will perform the following operations on each byte in the column. I have re-arranged the order of the bytes to match what is found in the NIST documentation. We can see that our derivation matches the implementation documentation exactly. If you read the above equation from bottom up, right to left, and these sets of equations from top down, left to right, the coefficients are the same.
We shall begin by building four simple Haskell functions that reproduce the
result for each byte in MixColumns
.
Now, we can put all of this together into a mixColumns
function.
At this point, we can perform three of the four steps in an AES round. We’ve
almost derived everything we need to understand how MixColumns
works. But, there is still something we haven’t delved into. What was that
polynomial,
about?
In order to diffuse data between columns, a linear operation was required in AES. The operation needs to “mix” things up, but it needs to do so in a way that can be reversed during decryption. The polynomial used for the mix operation is a different representation of an invertible matrix in . The operation on the bytes in the column can also be represented by a square matrix. We can form a circulant matrix from this polynomial. The circulant matrix is as follows:
Square matrices are either singular or invertible. The way to tell the
difference is to take the determinant of the matrix. If a square matrix has a
non-zero determinant, then it also has an inverse. We can use that inverse to
derive the polynomial which undoes the MixColumns
operation. Let’s take the
determinant of this matrix. Again, addition and subtraction are just customary
for this operation, representing the parity of cofactors. They still mean the
same thing in this field, and will be applied as xor
s.
Therefore, this matrix is invertible. We can find the inverse through Gauss-Jordan elimination. We’ll start by creating an augmented matrix, which will allow us to keep track of all of the operations necessary to invert our circulant matrix. When we are done, we will have the identity matrix augmented with the inverse of our matrix.
First, we multiply each row by the inverse necessary to turn its pivot column
into a unit value. This inverse is computed in
using the
inverseArray
we created earlier. Inverting these pivot columns ahead of time
greatly simplifies the elimination computations below.
Next, we subtract from and , and from . goes along for the ride.
Now we normalize the pivot for . All other rows go along for the ride.
Next, we cancel out in each of the other three rows.
Next, we cancel out in each of the other three rows.
Then, we normalize . If everything goes right, we should see the coefficients of our new polynomial appear in ’s augmented row.
Now, we just propagate this change through the rest of the matrix to get our inverse matrix.
This is the matrix that undoes MixColumns
when we decrypt:
The designers of AES were careful to select a square matrix that was invertible
over this field. Repeating this inversion operation may have seemed tedious,
but I wanted to demonstrate the math behind the construction of this step.
It is a very interesting application of matrix math, and it shows how we can
build upon known algebraic techniques in more exotic fields.
As a side-note, there are
other possible polynomials that could be used for mixing columns. These
polynomials would be incompatible with AES, but as long as one selects an
invertible polynomial and derives the inverse, one could perform this
substitution. It’s difficult to predict the effects that such a change would
have on the security of the cipher, but it’s a possible avenue of exploration
for the adventurous. The math that I have demonstrated here could be used to
derive a different set of polynomials for the MixColumns
step.
Simple pattern matching between the two matrices will make the inverse
MixColumns
polynomial for AES obvious.
It’s
.
I will leave the derivation
of the inverse MixColumns
operation as an exercise for the reader. It should
be a simple application of the steps that I went over from the start of this
section until the last Haskell function derived above. Now, let’s get back to
encryption.
Key Expansion
Before we can delve into AddRoundKey
, we will need to build up some code that
can perform AES key expansion. During each AddRoundKey
call, the current key
schedule for a given round is added into the current AES state. Without the
key, AES is simply a poor hash algorithm. It is the key which adds “confusion”
to the “diffusion”.
We can think of the key schedule as an infinite list that starts with the raw bytes of the key, and continues with values that have been expanded through an algorithm meant to mix these bits for additional rounds of the AES cipher. We will build this function from the ground up by building components necessary for the key expansion process. We can then easily compose the full key expansion function.
We re-use the subWord
function that we created in the SubBytes
step. We
also create a rotWord
function that rotates all of the bytes in a given word
once to the left.
Next, we need to define the Rcon
list. This is a list of words from 1 to N
where each word is made up of the following octets:
. Here, is defined as
. The following Haskell function captures this list.
Haskell already has a high-level function, called iterate
, which we can use to
generate the Rcon
list. The first argument to this function is a function
that should be applied during each iteration. The second argument is the value
to use for the first iteration. iterate
performs the following piece-wise
function:
Here, a0 refers to the second argument of iterate
. Based on our
use, this means that Rcon
will iterate through the powers of over
.
This is defined as an infinite list, but we know by the
properties of
that the sequence will repeat every 256 entries.
Before we build the key expansion operation, we need some helper types that will
help to keep our intentions clear as we define various functions. We need
KeyLength
, Index
, and KeyWord
, which correspond to the length
of our key in words, the index into the key schedule, and a word in this key
schedule, respectively.
The key expansion operation is similar for 128, 192, and 256 bit keys. The
first N words is the key itself. Afterward, a sequence of N words is generated
from these words by xor
ing a previous word in the sequence with the previous
value. Every N words, this temporary value is rotated and inverted. In the
case of a 256-bit key, every N/2 words, the temp value is inverted. Now we can
define the key expansion list. The following function performs expansion of a
key defined as a list of Word32
values. We will create a generic
parameterized function which expands the key based on parameters we provide, and
then we will define a concrete key expansion function for each of the 3
NIST-approved key lengths.
We can compose the expansion substitution function from some small building
blocks. First, we have the basic xor
with the previous value step that
happens during the non-special steps. Here, we just re-use the generic gfAdd
function.
The second building block is the major rotation / substitution operation that happens every N words.
Finally, for 256-bit keys, we provide subTemp
which performs a subWord
on
temp prior to the xorWords
operation.
Now, we can put this all together into a key expansion operation for AES-128.
The key expansion operation for AES-192 is exactly the same operation.
Finally, we can compose the AES-128 key expansion operation plus an additional
if
/then
/else
block to build the AES-256 substitution operation.
Now, we have substitution functions for each of the key lengths. Let’s define the key expansion operations for each cipher.
This function selects the correct key expansion operation based on the length of
the key. Note that we did not define a general case nistKeyExpand
function.
A runtime error will occur if we attempt to use a key size that has not been
defined by NIST.
AddRoundKey
The last step that we have to define is the AddRoundKey
step. This uses the
key schedule that we developed in the last section. We need to first create a
function that adds a given key schedule to the AES state.
This function takes an AES key and a State
as input, and returns a new State
with that AES key set. We will assume that this step has been done at the
point of performing the AddRoundKey
step.
Now, the addRoundKey
function is easy to define.
Each time addRoundKey
is called, it adds in four words from the key schedule,
and adjusts the key schedule for the next round. This is returned as a State
.
AES Encryption
Before we can write the AES encryption function, we need a function that returns the number of rounds required for a given key. We define it here.
At this point, we have all of the primitive operations and steps defined to perform AES block encryption. The complete operation can be constructed by composing these steps into a recursive function. This set of steps mirrors the pseudocode provided in the NIST documentation, but is written in a functional way. This has the advantage of making it very easy for us to unroll our implementation step-by-step for testing purposes. Also, because we avoid side-effects, we significantly lower the amount of possible defects that are in this implementation as compared to building an equivalent implementation based on the NIST pseudocode.
There you have it. This function will perform AES encryption, and will successfully pass the test vectors defined in the NIST document. I will expand upon this basic function in my next article, where I will use it to build a parametric pseudo-random number generator.