_                                                _
            _/B\_                                            _/W\_
            (* *)             Phrack #64 file 10             (* *)
            | - |                                            | - |
            |   |          Cryptanalysis of DPA-128          |   |
            |   |                                            |   |
            |   |                  By SysK                   |   |
            |   |                                            |   |
            |   |            syskall@phreaker.net            |   |
            (____________________________________________________)


--[ Contents

1 - Introduction
2 - A short word about block ciphers
3 - Overview of block cipher cryptanalysis
4 - Veins' DPA-128
  4.1 - Bugs in the implementation
  4.2 - Weaknesses in the design
5 - Breaking the linearized version
6 - On the non linearity of addition modulo n in GF(2)
7 - Exploiting weak keys
  7.1 - Playing with a toy cipher
  7.2 - Generalization and expected complexity
  7.3 - Cardinality of |W
8 - Breaking DPA based unkeyed hash function
  8.1 - Introduction to hash functions
  8.2 - DPAsum() algorithm
  8.3 - Weaknesses in the design/implementation
  8.4 - A (2nd) preimage attack
9 - Conclusion
10 - Greetings
11 - Bibliography


--[ 1 - Introduction

While the cracking scene has grown with cryptology thanks to the evolution
of binary protection schemes, the hacking scene mostly hasn't. This fact
is greatly justified by the fact that there were globally no real need. 
Indeed it's well known that if a hacker needs to decrypt some files then 
he will hack into the box of its owner, backdoor the system and then use
it to steal the key. A cracker who needs to break a protection scheme will
not have the same approach: he will usually try to understand it fully in
order to find and exploit design and/or implementation flaws. 

Although the growing of the security industry those last years changed a 
little bit the situation regarding the hacking community, nowadays there 
are still too many people with weak knowledge of this science. What is 
disturbing is the broadcast of urban legends and other hoax by some
paranoids among them. For example, haven't you ever heard people claiming
that government agencies were able to break RSA or AES? A much more clever
question would have been: what does "break" mean? 

A good example of paranoid reaction can be found in M1lt0n's article
[FakeP63]. The author who is probably skilled in hacking promotes the use
of "home made cryptographic algorithms" instead of standardized ones such
as 3DES. The corresponding argument is that since most so-called security
experts lake coding skills then they aren't able to develop appropriate
tools for exotic ciphers. While I agree at least partially with him
regarding the coding abilities, I can't possibly agree with the main
thesis. Indeed if some public tools are sufficient to break a 3DES based
protection then it means that a design and/or an implementation mistake
was/were made since, according to the state of the art, 3DES is still
unbroken. The cryptosystem was weak from the beginning and using "home 
made cryptography" would only weaken it more. 

It is therefore extremely important to understand cryptography and to 
trust the standards. In a previous Phrack issue (Phrack 62), Veins exposed
to the hacking community a "home made" block cipher called DPA (Dynamic 
Polyalphabetic Algorithms) [DPA128]. In the following paper, we are going 
to analyze this cipher and demonstrate that it is not flawless - at least
from a cryptanalytic perspective - thus fitting perfectly with our talk.


--[ 2 - A short word about block ciphers

Let's quote a little bit the excellent HAC [MenVan]:

"A block cipher is a function which maps n-bit plaintext blocks to n-bit 
ciphertext blocks; n is called the blocklength. It may be viewed as a 
simple substitution cipher with large character size. The function is 
parametrized by a k-bit key K, taking values from a subset |K (the key 
space) of the set of all k-bit vectors Vk. It is generally assumed that
the key is chosen at random. Use of plaintext and ciphertext blocks of
equal size avoids data expansion." 

Pretty clear isn't it? :> So what's the purpose of such a cryptosystem? 
Obviously since we are dealing with encryption this class of algorithms
provides confidentiality. Its construction makes it particularly suitable
for applications such as large volumes encryption (files or HD for 
example). Used in special modes such as CBC (like in OpenSSL) then it can 
also provide stream encryption. For example, we use AES-CBC in the WPA2, 
SSL and SSH protocols.

Remark: When used in conjunction with other mechanisms, block ciphers can 
also provide services such as authentication or integrity (cf part 8 of 
the paper).

An important point is the understanding of the cryptology utility. While 
cryptography aims at designing best algorithms that is to say secure and 
fast, cryptanalysis allows the evaluation of the security of those 
algorithms. The more an algorithm is proved to have weaknesses, the less 
we should trust it.


--[ 3 - Overview of block cipher cryptanalysis

The cryptanalysis of block ciphers evolved significantly in the 90s with 
the apparition of some fundamental methods such as the differential 
[BiSha90] and the linear [Matsui92] cryptanalysis. In addition to some 
more recent ones like the boomerang attack of Wagner or the chi square 
cryptanalysis of Vaudenay [Vaud], they constitute the set of so-called
statistical attacks on block ciphers in opposition to the very recent and
still controverted algebraic ones (see [CourtAlg] for more information).

Today the evolution of block cipher cryptanalysis tends to stabilize 
itself. However a cryptographer still has to acquire quite a deep knowledge
of those attacks in order to design a cipher. Reading the Phrack paper, we 
think - actually we may be wrong - that the author mostly based his design 
on statistical tests. Although they are obviously necessary, they can't 
possibly be enough. Every component has to be carefully chosen. We 
identified several weaknesses and think that some more may still be left.


--[ 4 - Veins' DPA-128 description

DPA-128 is a 16 rounds block cipher providing 128 bits block encryption 
using an n bits key. Each round encryption is composed of 3 functions
which are rbytechain(), rbitshift() and S_E(). Thus for each input block,
we apply the E() function 16 times (one per round) :

void E (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E (key, block, shift);
}

where:

- block is the 128b input
- shift is a 32b parameter dependent of the round subkey
- key is the 128b round subkey

Consequently, the mathematical description of this cipher is:
f: |P x |K ----> |C

where:
    - |P is the set of all plaintexts
    - |K is the set of all keys
    - |C is the set of all ciphertexts

For p element of |P, k of |K and c of |C, we have c = f(p,k)
with f = E°E...E°E = E^16 and ° meaning the composition of functions.

We are now going to describe each function. Since we sometimes may need
mathematics to do so, we will assume that the reader is familiar with
basic algebra ;>

rbytechain() is described by the following C function:

void rbytechain(unsigned char *block)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
    return;
}

where:
    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16

Such an operation on bytes is called linear mixing and its goal is to 
provide the diffusion of information (according to the well known Shannon
theory). Mathematically, it's no more than a linear map between two GF(2) 
vector spaces of dimension 128. Indeed, if U and V are vectors over GF(2) 
representing respectively the input and the output of rbytechain() then 
V = M.U where M is a 128x128 matrix over GF(2) of the linear map where 
coefficients of the matrix are trivial to find. Now let's see rbitshift(). 
Its C version is:

void rbitshift(unsigned char *block, unsigned int shift)
{
    unsigned int i;
    unsigned int div;
    unsigned int mod;
    unsigned int rel;
    unsigned char mask;
    unsigned char remainder;
    unsigned char sblock[DPA_BLOCK_SIZE];
    
    if (shift)
    {
        mask = 0;
        shift %= 128;
        div = shift / 8;
        mod = shift % 8;
        rel = DPA_BLOCK_SIZE - div;
        for (i = 0; i < mod; ++i)
            mask |= (1 << i);

        for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        {
            remainder =
            ((block[(rel + i - 1) % DPA_BLOCK_SIZE]) & mask) << (8 - mod);
            sblock[i] = 
	    ((block[(rel + i) % DPA_BLOCK_SIZE]) >> mod) | remainder;
        }
     }
     memcpy(block, sblock, DPA_BLOCK_SIZE);
}

 where:
    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16
    - shift is derived from the round subkey

Veins describes it in his paper as a key-related shifting (in fact it has
to be a key-related 'rotation' since we intend to be able to decrypt the 
ciphertext ;)). A careful read of the code and several tests confirmed that
it was not erroneous (up to a bug detailed later in this paper), so we can 
describe it as a linear map between two GF(2) vector spaces of dimension 128. 

Indeed, if V and W are vectors over GF(2) representing respectively the 
input and the output of rbitshift() then:

W = M'.V where M' is the 128x128 matrix over GF(2) of the linear 
map where, unlike the previous function, coefficients of the matrix are 
unknown up to a probability of 1/128 per round.

Such a function also provides diffusion of information.

Finally, the last operation S_E() is described by the C code:

void S_E (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] = (key[i] + block[i] + s) % 256;
    return;
}

where:
    - block is the 128b input
    - DPA_BLOCK_SIZE equals 16
    - s is the shift parameter described in the previous function
    - key is the round subkey

The main idea of veins' paper is the so-called "polyalphabetic substitution"
concept, whose implementation is supposed to be the S_E() C function. 
Reading the code, it appears to be no more than a key mixing function over 
GF(2^8). 

Remark: We shall see later the importance of the mathematical operation
know as 'addition' over GF(2^8). Regarding the key scheduling, each cipher 
round makes use of a 128b subkey as well as of a 32b one deriving from it
called "shift". The following pseudo code describes this operation:

    skey(0) = checksum128(master_key)
    for i = 0, nbr_round-2:
        skey(i+1) = checksum128(skey(i))
    skey(0) = skey(15)
    for i = 0, nbr_round-1:
        shift(nbr_round-1 - i) = hash32(skey(i))

where skey(i) is the i'th subkey.

It is not necessary to explicit the checksum128() and hash32(), the reader
just has to remind this thing: whatever the weakness there may be in those
functions, we will now consider them being true oneway hash functions 
providing perfect entropy.

As a conclusion, the studied cipher is closed to being a SPN (Substitution
- Permutation Network) which is a very generic and well known construction 
(AES is one for example).


--[ 4.1 - Bugs in the implementation

Although veins himself honestly recognizes that the cipher may be weak and
"strongly discourages its use" to quote him [DPA128], some people could 
nevertheless decide to use it as a primitive for encryption of personal 
and/or sensitive data as an alternative to 'already-cracked-by-NSA' 
ciphers [NSA2007]. Unfortunately for those theoretical people, we were able
to identify a bug leading to a potentially incorrect functioning of the 
cryptosystem (with a non negligible probability). 

We saw earlier that the bitshift code skeleton was the following:

/* bitshift.c */
void {r,l}bitshift(unsigned char *block, unsigned int shift)
{
    [...] // SysK : local vars declaration
    unsigned char sblock[DPA_BLOCK_SIZE];
    if (shift)
    {
        [...] // SysK : sblock initialization
    }
    memcpy(block, sblock, DPA_BLOCK_SIZE);
}

Clearly, if 'shift' is 0 then 'block' is fed with stack content! Obviously
in such a case the cryptosystem can't possibly work. 

Since shift is an integer, such an event occurs with at least a theoretical
probability of 1/2^32 per round.

Now let's study the shift generation function:

/* hash32.c */
/*
* This function computes a 32 bits output out a variable length input. It is
* not important to have a nice distribution and low collisions as it is used
* on the output of checksum128() (see checksum128.c). There is a requirement
* though, the function should not consider \0 as a key terminator.
*/

unsigned long hash32(unsigned char *k, unsigned int length)
{
    unsigned long h;
    for (h = 0; *k && length; ++k, --length)
    h = 13 * h + *k;
    return (h);
}

As stated in the C code commentary, hash32() is the function which produces
the shift. Although the author is careful and admits that the output 
distribution may not be completely uniform (not exactly equal probability
for each byte value to appear) it is obvious that a strong bias is not
desirable (Cf 7.3). 

However what happens if the first byte pointed by k is 0 ? Since the loop
ends for k equal to 0, then h will be equal to 13 * 0 + 0 = 0. Assuming 
that the underlying subkey is truly random, such an event should occur with
a probability of 1/256 (instead of 1/2^32). Since the output of hash32() is
an integer as stated in the comment, this is clearly a bug. 

We could be tempted to think that this implementation failure leads to a 
weakness but a short look at the code tells us that:

struct s_dpa_sub_key {
    unsigned char key[DPA_KEY_SIZE];
    unsigned char shift;
};

typedef struct s_dpa_sub_key DPA_SUB_KEY;

Therefore since shift is a char object, the presence of "*k &&" in the code
doesn't change the fact that the cryptosystem will fail with a probability 
of 1/256 per round.

Since the bug may appear independently in each round, the probability of 
failure is even greater:

p("fail") = 1 - p("ok")
          = 1 - Mul( p("ok in round i") )
          = 1 - (255/256)^16
	  = 0.0607...

where i is element of [0, (nbr_rounds - 1)]
It's not too far from 1/16 :-)

Remark: We shall see later that the special case where shift is equal to 0 
is part of a general class of weak keys potentially allowing an attacker to 
break the cryptosystem.

Hunting weaknesses and bugs in the implementation of cryptographic 
primitives is the common job of some reverse engineers since it sometimes 
allows to break implementations of algorithms which are believed to be 
theoretically secure. While those flaws mostly concern asymmetric 
primitives of digital signature or key negotiation/generation, it can also
apply in some very specific cases to the block cipher world.

From now, we will consider the annoying bug in bitshift() fixed.


--[ 4.2 - Weaknesses in the design

When designing a block cipher, a cryptographer has to be very careful about
every details of the algorithm. In the following section, we describe 
several design mistakes and explain why in some cases, it can reduce the 
security of the cipher.

a) We saw earlier that the E() function was applied to each round. However 
such a construction is not perfect regarding the first round. Since 
rbytechain() is a linear mixing operating not involving key material, it 
shouldn't be used as the first operation on the input buffer since its 
effect on it can be completely canceled. Therefore, if a cryptanalyst wants
to attack the bitshift() component of the first round, he just have to 
apply lbytechain() (the rbytechain() inverse function) to the input vector.
It would thus have been a good idea to put a key mixing as the first 
operation.

b) The rbitshift() operation only need the 7 first bits of the shift 
character whereas the S_E() uses all of them. It is also generally 
considered a bad idea to use the same key material for several operations.

c) If for some reason, the attacker is able to leak the second (not the 
first) subkey then it implies the compromising of all the key material. Of
course the master key will remain unknown because of the onewayness of 
checksum128() however we do not need to recover it in order to encrypt
and/or decrypt datas.

d) In the bitshift() function, a loop is particularly interesting:

for (i = 0; i < mod; ++i)
    mask |= (1 << i);

What is interesting is that the time execution of the loop is dependent of
"mod" which is derived from the shift. Therefore we conclude that this loop
probably allows a side channel attack against the cipher. Thanks to X for 
having pointed this out ;> In the computer security area, it's well known 
that a single tiny mistake can lead to the total compromising of an 
information system. In cryptography, the same rules apply.


--[ 5 - Breaking the linearized version

Even if we regret the non justification of addition operation employment,
it is not the worst choice in itself. What would have happen if the key 
mixing had been done with a xor operation over GF(2^8) instead as it is the
case in DES or AES for example?

To measure the importance of algebraic consideration in the security of a 
block cipher, let's play a little bit with a linearized version of the 
cipher. That is to say that we replace the S_E() function with the 
following S_E2() where :

void S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] = (key[i] ^ block[i] ^ s) % 256; [1] 
	// + is replaced by xor
    return;
}

If X, Y and K are vectors over GF(2^8) representing respectively the input,
the output of S_E2() and the round key material then Y = X xor K.

Remark: K = sK xor shift. We use K for simplification purpose.

Now considering the full round we have :

V = M.U         [a] (rbytechain)
W = M'.V        [b] (rbitshift)
Y = W xor K     [c] (S_E2)

Linear algebra allows the composition of applications rbytechain() and 
rbitshift() since the dimensions of M and M' match but W in [b] is a vector
over GF(2) whereas W in [c] is clearly over GF(2^8). However, due to the 
use of XOR in [c], Y, W and K can also be seen as vectors over GF(2). 
Therefore, S_E2() is a GF(2) affine map between two vector spaces of
dimension 128.

We then have:

Y = M'.M.U xor K

The use of differential cryptanalysis will help us to get rid of the key. 
Let's consider couples (U0,Y0 = E(U0)) and (U1,Y1 = E(U1)) then:

DELTA(Y) = Y0 xor Y1
         = (M'.M.U0 xor K) xor (M'.M.U1 xor K)
         = (M'.M.U0 xor M'.M.U1) xor K xor K     (commutativity & 
	                                          associativity of xor)
         = (M'.M).(U0 xor U1)                     (distributivity)
         = (M'.M).DELTA(U)

Such a result shows us that whatever sK and shift are, there is always a
linear map linking an input differential to the corresponding output 
differential. 

The generalization to the 16 rounds using matrix multiplication is obvious. 
Therefore we have proved that there exists a 128x128 matrix Mf over GF(2) 
such as DELTA(Y) = Mf.DELTA(X) for the linearized version of the cipher.

Then assuming we know one couple (U0,Y0) and Mf, we can encrypt any input U.
Indeed, Y xor Y0 = Mf.(U xor U0) therefore Y = (Mf.(U xor U0)) xor Y0.

Remark 1: The attack doesn't give us the knowledge of subkeys and shifts 
but such a thing is useless. The goal of an attacker is not the key in 
itself but rather the ability of encrypting/decrypting a set of 
plaintexts/ciphertexts. Furthermore, considering the key scheduling 
operation, if we really needed to recover the master key, it would be quite
a pain in the ass considering the fact that checksum128() is a one way 
function ;-)

Remark 2: Obviously in order to decrypt any output Y we need to calculate 
Mf^-1 which is the inverse matrix of Mf. This is somewhat more interesting
isn't it ? :-)

Because of rbitshift(), we are unable to determine using matrix 
multiplications the coefficients of Mf. An exhaustive search is of course 
impossible because of the huge complexity (2^16384) however, finding them 
is equivalent to solving 128 systems (1 system per row of Mf) of 128 
variables (1 variable per column) in GF(2). To build such a system, we need
128 couples of (cleartext,ciphertext). The described attack was implemented
using the nice NTL library ([SHOUP]) and can be found in annexe A of this
paper.

$ g++ break_linear.cpp bitshift.o bytechain.o key.c hash32.o checksum128.o 
-o break_linear -lntl -lcrypto -I include
$ ./break_linear
[+] Generating the plaintexts / ciphertexts
[+] NTL stuff !
[+] Calculation of Mf
[+] Let's make a test !
[+] Well done boy :>

Remark: Sometimes NTL detects a linear relation between chosen inputs 
(DELTA_X) and will then refuse to work. Indeed, in order to solve the 128 
systems, we need a situation where every equations are independent. If it's
not the case, then obviously det(M) is equal to 0 (with probability 1/2). 
Since inputs are randomly generated, just try again until it works :-)

$ ./break_linear
[+] Generating the plaintexts / ciphertexts
[+] NTL stuff !
det(M) = 0

As a conclusion we saw that the linearity over GF(2) of the xor operation 
allowed us to write an affine relation between two elements of GF(2)^128 in
the S_E2() function and then to easily break the linearized version using a
128 known plaintext attack. The use of non linearity is crucial in the 
design. Fortunately for DPA-128, Veins chose the addition modulo 256 as the
key mixer which is naturally non linear over GF(2).


--[ 6 - On the non linearity of addition modulo n over GF(2)

The bitshift() and bytechain() functions can be described using matrix over 
GF(2) therefore it is interesting to use this field for algebraic 
calculations.

The difference between addition and xor laws in GF(2^n) lies in the carry 
propagation:

w(i) + k(i) = w(i) xor k(i) xor carry(i)
where w(i), k(i) and carry(i) are elements of GF(2). 

We note w(i) as the i'th bit of w and will keep this notation until the end. 
carry(i), written c(i) for simplification purpose, is defined recursively:

c(i+1) = w(i).k(i) xor w(i).c(i) xor k(i).c(i)
with c(0) = 0

Using this notation, it would thus be possible to determine a set of 
relations over GF(2) between input/output bits which the attacker controls
using a known plaintext attack and the subkey bits (which the attacker
tries to guess).

However, recovering the subkey bits won't be that easy. Indeed, to determine
them, we need to get rid of the carries replacing them by multivariate 
polynomials were unknowns are monomials of huge order.

Remark 1: Because of the recursivity of the carry, the order of monomials 
grows up as the number of input bits per round as well as the number of 
rounds increases.

Remark 2: Obviously we can not use intermediary input/output bits in our
equations. This is because unlike the subkey bits, they are dependent of the
input.

We are thus able to express the cryptosystem as a multivariate polynomial 
system over GF(2). Solving such a system is NP-hard. There exists methods
for system of reasonable order like groebner basis and relinearization 
techniques but the order of this system seems to be far too huge.

However for a particular set of keys, the so-called weak keys, it is 
possible to determine the subkeys quite easily getting rid of the complexity
introduced by the carry.


--[ 7 - Exploiting weak keys

Let's first define a weak key. According to wikipedia:

"In cryptography, a weak key is a key which when used with a specific 
cipher, makes the cipher behave in some undesirable way. Weak keys usually
represent a very small fraction of the overall keyspace, which usually 
means that if one generates a random key to encrypt a message weak keys are
very unlikely to give rise to a security problem. Nevertheless, it is 
considered desirable for a cipher to have no weak keys."

Actually we identified a particular subset |W of |K allowing us to deal 
quite easily with the carry problem. A key "k" is part of |W if and only if
for each round the shift parameter is a multiple of 8. The reader should 
understand why later.

We will first present the attack on a reduced version of DPA for simplicity
purpose and generalize it later to the full version.


--[ 7.1 - Playing with a toy cipher

Our toy cipher is a 2 rounds DPA. Moreover, the cipher takes as input 4*8
bits instead of 16*8 = 128 bits which means that DPA_BLOCK_SIZE = 4. We 
also make a little modification in bytechain() operation. Let's remember 
the bytechain() function:

void rbytechain(unsigned char *block)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
        block[i] ^= block[(i + 1) % DPA_BLOCK_SIZE];
    return;
}

Since block is both input AND output of the function then we have for 
DPA_BLOCK_SIZE = 4:

    V(0) = U(0) xor U(1)
    V(1) = U(1) xor U(2)
    V(2) = U(2) xor U(3)
    V(3) = U(3) xor V(0) = U(0) xor U(1) xor U(3)

Where V(x) is the x'th byte element.

Thus with our modification:

    V(0) = U(0) xor U(1)
    V(1) = U(1) xor U(2)
    V(2) = U(2) xor U(3)
    V(3) = U(3) xor U(0)

Regarding the mathematical notation (pay your ascii !@#):

    - U,V,W,Y vector notation of section 5 remains.
    - Xj(i) is the i'th bit of vector Xj where j is j'th round.
    - U0 vector is equivalent to P where P is a plaintext.
    - m is the shift of round 0
    - n is the shift of round 1
    - xor will be written '+' since calculation is done in GF(2)
    - All calculation of subscript will be done in the ring ZZ_32

How did we choose |W? Using algebra in GF(2) implies to deal with the carry.
However, if k is a weak key (part of |W), then we can manage the calculation
so that it's not painful anymore.

Let i be the lowest bit of any input byte. Therefore for each i part of the
set {0,8,16,24} we have:

u0(i)      = p(i)
v0(i)      = p(i) + p(i+8)
w0(i+m)    = v0(i)
y0(i)      = w0(i) + k0(i) + C0(i)
y0(i+m)    = w0(i+m) + k0(i+m) + C0(i+m)
y0(i+m)    = p(i) + p(i+8) + k0(i+m) + C0(i+m)            /* carry(0) = 0 */
y0(i+m)    = p(i) + p(i+8) + k0(i+m)

u1(i)      = y0(i)
v1(i)      = y0(i) + y0(i+8)
w1(i+n)    = v1(i)
y1(i)      = w1(i) + k1(i) + C1(i)
y1(i+n)    = w1(i+n) + k1(i+n) + C1(i+n)
y1(i+n)    = y0(i) + y0(i+8) + k1(i+n) + C1(i+n)
y1(i+n+m)  = y0(i+m) + y0(i+m+8) + k1(i+n+m) + C1(i+n+m)  /* carry(0) = 0 */
y1(i+n+m)  = p(i) + p(i+8) + k0(i+m) + p(i+8) + p(i+16) 
           + k0(i+m+8) + k1(i+n+m)
y1(i+n+m)  = p(i) + k0(i+m) + p(i+16) + k0(i+m+8) + k1(i+n+m)

As stated before, i is part of the set {0,8,16,24} so we can write:

y1(n+m)    = p(0)  + k0(m)    + p(16) + k0(m+8)  + k1(n+m)
y1(8+n+m)  = p(8)  + k0(8+m)  + p(24) + k0(m+16) + k1(8+n+m)
y1(16+n+m) = p(16) + k0(16+m) + p(0)  + k0(m+24) + k1(16+n+m)
y1(24+n+m) = p(24) + k0(24+m) + p(8)  + k0(m)    + k1(24+n+m)

In the case of a known plaintext attack, the attacker has the knowledge of
a set of couples (P,Y1). Therefore considering the previous system, the 
lowest bit of K0 and K1 vectors are the unknowns. Here we have a system 
which is clearly underdefined since it is composed of 4 equations and
4*2 unknowns. It will give us the relations between each lowest bit of Y 
and the lowest bits of K0 and K1. 

Remark 1: n,m are unknown. A trivial approach is to determine them which 
costs a complexity of (2^4)^2 = 2^8. Although it may seem a good idea, 
let's recall the reader that we are considering a round reduced cipher! 
Indeed, applying the same idea to the full 16 rounds would cost us 
(2^4)^16 = 2^64! Such a complexity is a pain in the ass even nowadays :-)

A much better approach is to guess (n+m) as it costs 2^4 what ever the 
number of rounds. It gives us the opportunity to write relations between 
some input and output bits. We do not need to know exactly m and n. The 
knowledge of the intermediate variables k0(x+m) and k1(y+n+m) is 
sufficient.

Remark 2: An underdefined system brings several solutions. We are 
thus able to choose arbitrarily 4 variables thus fixing them with values of 
our choice. Of course we have to choose so that we are able to solve the 
system with remaining variables. For example taking k0(m), k0(m+8) and 
k1(n+m) together is not fine because of the first equation. However, fixing
all the k0(x+m) may be a good idea as it automatically gives the k1(y+n+m) 
corresponding ones. 

Now let's go further. Let i be part of the set {1,9,17,25}. We can write:

u0(i)     = p(i)
v0(i)     = p(i) + p(i+8)
w0(i+m)   = v0(i)
y0(i)     = w0(i) + k0(i) + w0(i-1)*k0(i-1)
y0(i+m)   = w0(i+m) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
y0(i+m)   = p(i) + p(i+8) + k0(i+m) + w0(i+m-1)*k0(i+m-1)
y0(i+m)   = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8))*k0(i+m-1)

u1(i)     = y0(i)
v1(i)     = y0(i) + y0(i+8)
w1(i+n)   = v1(i)
y1(i)     = w1(i) + k1(i) + C1(i)
y1(i)     = w1(i) + k1(i) + w1(i-1)*k1(i-1)
y1(i+n)   = w1(i+n) + k1(i+n) + w1(i-1+n)*k1(i-1+n)
y1(i+n)   = y0(i) + y0(i+8) + k1(i+n) + (y0(i-1) + y0(i+8-1)) * k1(i-1+n)

y1(i+n+m) = y0(i+m) + y0(i+m+8) + k1(i+m+n) 
          + (y0(i+m-1) + y0(i+m+8-1)) * k1(i+m+n-1)

y1(i+n+m) = p(i) + p(i+8) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
          + p(i+8) + p(i+16) + k0(i+m+8) 
	  + (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
	  + k1(i+n+m)
          + k1(i+m+n-1) * [p(i-1) + p(i+8-1) + k0(i+m-1)]
          + k1(i+m+n-1) * [p(i-1+8) + p(i+16-1) + k0(i+m-1+8)]

y1(i+n+m) = p(i) + k0(i+m) + (p(i-1) + p(i-1+8)) * k0(i+m-1)
          + p(i+16) + k0(i+m+8) + (p(i+8-1) + p(i-1+16)) * k0(i+m-1+8)
          + k1(i+n+m)
          + k1(i+m+n-1)*[p(i-1) + k0(i+m-1)]
          + k1(i+m+n-1)*[p(i-1+16) + k0(i+m-1+8)]

Thanks to the previous system resolution, we have the knowledge of 
k0(i+m+n-1+x) and k1(i+m-1+y) variables. Therefore, we can reduce the 
previous equation to: 

A(i) = k0(i+m) + k0(i+m+8) + k1(i+n+m)           (alpha)

where A(i) is a known value for the attacker.

Remark 1: This equation represents the same system as found in case of i 
being the lowest bit! Therefore all previous remarks remain.

Remark 2: If we hadn't have the knowledge of k0(i+m+n-1+x) and k1(i+m-1+y)
bits then the number of variables would have grown seriously. Moreover we 
would have had to deal with some degree 2 monomials :-/.

We can thus conjecture that the equation alpha will remain true for each i 
part of {a,a+8,a+16,a+24} where 0 <= a < 8.


--[ 7.2 - Generalization and expected complexity

Let's deal with the real bytechain() function now.
As stated before and for DPA_BLOCK_SIZE = 4 we have:

V(0) = U(0) xor U(1)
V(1) = U(1) xor U(2)
V(2) = U(2) xor U(3)
V(3) = U(0) xor U(1) xor U(3)

This is clearly troublesome as the last byte V(3) is NOT calculated like 
V(0), V(1) and V(2). Because of the rotations involved, we wont be able to
know when the bit manipulated is part of V(3) or not.

Therefore, we have to use a general formula:

V(i) = U(i) + U(i+1) + a(i).U(i+2)
where a(i) = 1 for i = 24 to 31

For i part of {0,8,16,24} we have:

u0(i)     = p(i)
v0(i)     = p(i) + p(i+8)  + a0(i).p(i+16)
w0(i+m)   = v0(i)
y0(i)     = w0(i) + k0(i)   + C0(i)
y0(i+m)   = w0(i+m) + k0(i+m) + C0(i+m)
y0(i+m)   = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m) + C0(i+m) /*carry(0) = 0*/
y0(i+m)   = p(i) + p(i+8)  + a0(i).p(i+16) + k0(i+m)

So in the second round:

u1(i)     = y0(i)
v1(i)     = y0(i) + y0(i+8) + a1(i).y0(i+16)
w1(i+n)   = v1(i)
y1(i)     = w1(i) + k1(i) + C1(i)
y1(i+n)   = w1(i+n) + k1(i+n) + C1(i+n)
y1(i+n)   = y0(i) + y0(i+8) + a1(i).y0(i+16) + k1(i+n) + C1(i+n)
y1(i+n+m) = y0(i+m) + y0(i+m+8) + a1(i+m).y0(i+m+16) + k1(i+n+m)

y1(i+n+m) = p(i) + p(i+8) + a0(i).p(i+16) + k0(i+m)
          + p(i+8) + p(i+16) + a0(i).p(i+24) + k0(i+m+8)
          + a1(i+m).[p(i+16) + p(i+24) + a0(i).p(i) + k0(i+m+16)] + k1(i+n+m)

y1(i+n+m) = p(i) + a0(i).p(i+16) + k0(i+m)
          + p(i+16) + a0(i).p(i+24) + k0(i+m+8)
          + a1(i+m).[p(i+16) + p(i+24) + a0(i).p(i) + k0(i+m+16)] + k1(i+n+m)

a0(i) is not a problem since we know it. This is coherent with the fact 
that the first operation of the cipher is rbytechain() which is invertible
for the attacker. However, the problem lies in the a1(i+m) variables.

Guessing a1(i+m) is out of question as it would cost us a complexity of 
(2^4)^15 = 2^60 for the 16 rounds! The solution is to consider a1(i+m) as 
an other set of 4 variables. We can also add the equation to our system:

a1(m) + a1(m+8) + a1(m+16) + a1(m+24) = 1
This equation will remain true for other bits.

So what is the global complexity? Obviously with DPA_BLOCK_SIZE = 16 each 
system is composed of 16+1 equations of 16+1 variables (we fixed the 
others). Therefore, the complexity of the resolution is: 
log(17^3) / log(2) ~ 2^13.

We will solve 8 systems since there are 8 bits per byte. Thus the global 
complexity is around (2^13)*8 = 2^16.

Remark: We didn't take into account the calculation of equation as it is 
assumed to be determined using a formal calculation program such as pari-gp 
or magma.


--[ 7.3 - Cardinality of |W

What is the probability of choosing a weak key? We have seen that our weak
key criterion is that for each round, the rotation parameter needs to be 
multiple of 8. Obviously, it happens with 16 / 128 = 1/8 theoretical 
probability per round. Since we consider subkeys being random, the 
generation of rotation parameters are independent which means that the 
overall probability is (1/16)^16 = 1/2^64.

Although a probability of 1/2^64 means a (huge) set of 2^64 weak keys, in
the real life, there are very few chances to choose one of them. In fact,
you probably have much more chances to win lottery ;) However, two facts 
must be noticed:

    - We presented one set of weak keys but there be some more!
    - We illustrated an other weakness in the conception of DPA-128

Remark: A probability of 1/8 per round is completely theoretic as it 
supposes a uniform distribution of hash32() output. Considering the extreme
simplicity of the hash32() function, it wouldn't be too surprising to be
different in practice. Therefore we made a short test to compute the real 
probability (Annexe B).

$ gcc test.hash32.c checksum128.o hash32.o -o test.hash32 -O3 
-fomit-frame-pointer
$ time ./test.hash32
[+] Probability is 0.125204

real 0m14.654s
user 0m14.649s
sys 0m0.000s

$ gp -q
? (1/0.125204) ^ 16
274226068900783.2739747241633
? log(274226068900783.2739747241633) / log(2)
47.96235905375676878381741198
?

This result tells us clearly that the probability of shift being multiple 
of 8 is around 1/2^2.99 ~ 1/8 per round which is assimilated to the 
theoretical one since the difference is too small to be significant. In 
order to improve the measure, we used checksum128() as an input of 
hash32(). Furthermore, we also tried to test hash32() without the "*k &&" 
bug mentioned earlier. Both tests gave similar results which means that the
bug is not important in practice and that checksum128() doesn't seem to be 
particularly skewed. This is a good point for DPA! :-D


--[ 8 - Breaking DPA-based unkeyed hash function

In his paper, Veins also explains how a hash function can be built out of 
DPA. We will analyze the proposed scheme and will show how to completely 
break it.


--[ 8.1 - Introduction to hash functions

Quoting once again the excellent HAC [MenVan]:
"A hash function is a function h which has, as a minimum, the following two
properties:

1. compression - h maps an input x of arbitrary finit bitlength, to an
output h(x) of fixed bitlength n.
2. ease of computation - given h and an input x, h(x) is easy to compute.

In cryptography there are essentially two families of hash functions:

1. The MAC (Message Authentication Codes). They are keyed ones and provides
both authentication (of source) and integrity of messages.
2. The MDC (Modification Detection Code), sometimes referred as MIC. They 
are unkeyed and only provide integrity. We will focus on this kind of 
functions. When designing his hash function, the cryptographer generally 
wants it to satisfy the three properties:

- preimage resistance. For any y, it should not be possible (that is to say
computationally infeasible) to find an x such as h(x) = y. Such a property
implies that the function has to be non invertible.
- 2nd preimage resistance. For any x, it should not be possible to find an
x' such as h(x) = h(x') when x and x' are different.
- collision resistance. It should not be possible to find an x and an x' 
(with x different of x') such that h(x) = h(x').

Remark 1: Properties 1 and 2 and essentials when dealing with binary 
integrity.

Remark 2: The published attacks on MD5 and SHA-0/SHA-1 were dealing with the
third property. While it is true that finding collisions on a hash function
is enough for the crypto community to consider it insecure (and sometimes
leads to a new standard [NIST2007]), for most of usages it still remains 
sufficient.

There are many way to design an MDC function. Some functions are based on
MD4 function such as MD5 or SHA* functions which heavily rely on boolean 
algebra and operations in GF(2^32), some are based on NP problems such as 
RSA and finally some others are block cipher based.

The third category is particularly interesting since the security of the
hash function can be reduced to the one of the underlying block cipher. 
This is of course only true with a good design.


--[ 8.2 - DPAsum() algorithm

The DPA-based hash function lies in the functions DPA_sum() and 
DPA_sum_write_to_file() which can be found respectively in file sum.c and
data.c.

Let's detail them a little bit using pseudo code:

Let M be the message to hash, let M(i) be the i'th 128b block message.
Let N = DPA_BLOCK_SIZE * i + j be the size in bytes of the message where i
and j are integers such as i = N / DPA_BLOCK_SIZE and 0 <= j < 16.
Let C be an array of 128 bits elements were intermediary results of hash
calculation are stored. The last element of this array is the hash of the
message. 

func DPA_sum(K0,M,C):
    
    K0 = key("deadbeef");
    IV = "0123456789abcdef";
    
    C(0) = E( IV , K0);
    C(1) = E( IV xor M(0) , K0);
    
    FOR a = 1 to i-1:
        C(a+1) = E( C(a) xor M(a) , K0);

    if j == 0:
        C(i+1) = E( C(i) xor 000...000 , K0)
    else
        C(i+1) = E( C(i) xor PAD( M(i) );
        C(i+2) = E( C(i+1) xor 000...00S , K0) /* s = 16-j */
    return;

func DPA_sum_write_to_file(C, file):

    write(file,C(last_element));
    return;


--[ 8.3 - Weaknesses in the design/implementation

We noticed several implementation mistakes in the code:

a) Using the algorithm of hash calculation, every element of array C is 
defined recursively however C(0) is never used in calculation. This doesn't
impact security in itself but is somewhat strange and could let us think 
that the function was not designed before being programmed.

b) When the size of M is not a multiple of DPA_BLOCK_SIZE (j is not equal 
to 0) then the algorithms calculates the last element using a xor mask where
the last byte gives information on the size of the original message. 
However, what is included in the padding is not the size of the message in 
itself but rather the size of padding.

If we take the example of the well known Merkle-Damgard construction on 
which are based MD{4,5} and SHA-{0,1} functions, then the length of the 
message was initially appended in order to prevent collisions attacks for
messages of different sizes. Therefore in the DPASum() case, appending j 
to the message is not sufficient as it would be possible to find collisions
for messages of size (DPA_BLOCK_SIZE*a + j) and (DPA_BLOCK_SIZE*b + j) were
obviously a and b are different.

Remark: The fact that the IV and the master key are initially fixed is not
a problem in itself since we are dealing with MDC here.


--[ 8.4 - A (2nd) preimage attack

Because of the hash function construction properties, being given a 
message X, it is trivial to create a message X' such as h(X) = h(X'). This
is called building a 2nd preimage attack.

We built a quick & dirty program to illustrate it (Annexe C). It takes a 
32 bytes message as input and produces an other 32 bytes one with the same 
hash:

$ cat to.hack | hexdump -C
00000000 58 41 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 |XALKXCLKSDLFKSDF|
00000010 58 4c 4b 58 43 4c 4b 53 44 4c 46 4b 53 44 46 0a |XLKXCLKSDLFKSDF.|
00000020
$ ./dpa -s to.hack
6327b5becaab3e5c61a00430e375b734
$ gcc break_hash.c *.o -o break_hash -I ./include
$ ./break_hash to.hack > hacked
$ ./dpa -s hacked
6327b5becaab3e5c61a00430e375b734
$ cat hacked | hexdump -C
00000000 43 4f 4d 50 4c 45 54 45 4c 59 42 52 4f 4b 45 4e |COMPLETELYBROKEN|
00000010 3e bf de 93 d7 17 7e 1d 2a c7 c6 70 66 bb eb a3 |>.....~.*..pf...|
00000020

Nice isn't it ? :-) We were able to write arbitrary data in the first 16 
bytes and then to calculate the next 16 bytes so that the 'hacked' file had
the exact same hash. But how did we do such an evil thing?

Assuming the size of both messages is 32 bytes then:

h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)

Therefore, it is obvious that:

h(M1) = h(M2) is equivalent to
E(E(M1(0) xor IV,K0) xor M1(1),K0) = E(E(M2(0) xor IV,K0) xor M2(1),K0)

Which can be reduced to:
E(M1(0) xor IV,K0) xor M1(1) = E(M2(0) xor IV,K0) xor M2(1)

Which therefore gives us:
M2(1) = E(M2(0) xor IV,K0) xor E(M1(0) xor IV,K0) xor M1(1)  [A]

Since M1,IV,K0 are known parameters then for a chosen M2(0), [A] gives us 
M2(1) so that h(M1) = h(M2).

Remark 1: Actually such a result can be easily generalized to n bytes 
messages. In particular, the attacker can put anything in his message and
"correct it" using the last blocks (if n >= 32).

Remark 2: Of course building a preimage attack is also very easy. We 
mentioned previously that we had for a 32 bytes message:
h(Mi) = E(E(Mi(0) xor IV,K0) xor Mi(1),K0)

Therefore, Mi(1) = E^-1(h(Mi),K0) xor E(Mi(0) xor IV,K0)     [B]

The [B] equation tells us how to generate Mi(1) so that we have h(Mi) in
output. It doesn't seem to be really a one way hash function does it ? ;-)
Building a hash function out of a block cipher is a well known problem in
cryptography which doesn't only involve the security of the underlying 
block cipher. One should rely on one of the many well known and heavily 
analyzed algorithms for this purpose instead of trying to design one.


--[ 9 - Conclusion

We put into evidence some weaknesses of the cipher and were also able to 
totally break the proposed hash function built out of DPA. In his paper, 
Veins implicitly set the bases of a discussion to which we wish to deliver
our opinion. We claim that it is necessary to understand properly 
cryptology. The goal of this paper wasn't to illustrate anything else but 
that fact. Being hacker or not, paranoid or simply careful, the rule is the
same for everybody in this domain: nothing should be done without reflexion.


--[ 10 - Greetings

#TF crypto dudes for friendly and smart discussions and specially X for 
giving me a lot of hints. I learned a lot from you guys :-)
#K40r1 friends for years of fun ;-) Hi all :)
Finally but not least my GF and her kindness which is her prime 
characteristic :> (However if she finds out the joke in the last sentence
I may die :|)


--[ 11 - Bibliography

[DPA128] A Polyalphabetic Substitution Cipher, Veins, Phrack 62.
[FakeP63] Keeping 0day Safe, m1lt0n, Phrack(.nl) 63.
[MenVan] Handbook of Applied Cryptography, Menezes, Oorschot & Vanstone.
[Knud99] Correlation in RC6, L. Knudsen & W. Meier.
[CrypTo] Two balls ownz one, http://fr.wikipedia.org/wiki/Cryptorchidie
[Vaud] An Experiment on DES - Statistical Cryptanalysis, S. Vaudenay.
[Ryabko] Adaptative chi-square test and its application to some 
cryptographic problems, B. Ryabko.
[CourtAlg] How Fast can be Algebraic Attacks on Block Ciphers ?, Courtois.
[BiSha90] Differential Cryptanalysis of DES-like Cryptosystems, E. Biham 
& A. Shamir, Advances in Cryptology - CRYPTO 1990.
[Matsui92] A new method for known plaintext attack of FEAL cipher, Matsui
& A. Yamagishi, EUROCRYPT 1992.
[NSA2007] Just kidding ;-)
[SHOUP] NTL library, V. Shoup, http://www.shoup.net/ntl/
[NIST2007] NIST, http://www.csrc.nist.gov/pki/HashWorkshop/index.html, 2007


--[ Annexe A - Breaking the linearised version

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
/* Crappy C/C++ source. I'm in a hurry for the paper redaction so don't 
 * blame me toooooo much please ! :> */

#include <iostream>
#include <fstream>
#include <openssl/rc4.h>
#include <NTL/ZZ.h>
#include <NTL/ZZ_p.h>
#include <NTL/mat_GF2.h>
#include <NTL/vec_GF2.h>
#include <NTL/GF2E.h>
#include <NTL/GF2XFactoring.h>
#include "dpa.h"

using namespace NTL;

void
S_E2 (unsigned char *key, unsigned char *block, unsigned int s)
{
    int i;
    for (i = 0; i < DPA_BLOCK_SIZE; ++i)
    {
        block[i] ^= (key[i] ^ s) % 256;
    }
    return;
}

void
E2 (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E2 (key, block, shift);
}

void 
DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
{
    int j;
    memcpy (dst, src, DPA_BLOCK_SIZE);
    for (j = 0; j < 16; j++)
        E2 (key->subkey[j].key, dst, key->subkey[j].shift);
    return;
}

void affichage(unsigned char *chaine)
{
    int i;
    for(i=0; i<16; i++)
        printf("%.2x",(unsigned char )chaine[i]);
    printf("\n");
}

unsigned char test_p[] = "ABCD_ABCD_12____";
unsigned char test_c1[16];
unsigned char test_c2[16];
DPA_KEY key;
RC4_KEY rc4_key;

struct vect {
    unsigned char plaintxt[16];
    unsigned char ciphertxt[16];
};

struct vect toto[128];
unsigned char src1[16], src2[16];
unsigned char block1[16], block2[16];

int main()
{

    /* Key */
    unsigned char str_key[] = " _323DFF?FF4cxsdÃ©&";
    DPA_set_key (&key, str_key, DPA_KEY_SIZE);
    
    /* Init our RANDOM generator */
    char time_key[16];
    snprintf(time_key, 16, "%d%d",(int)time(NULL), (int)time(NULL));
    RC4_set_key(&rc4_key, strlen(time_key), (unsigned char *)time_key);
   
    /* Let's crypt 16 plaintexts */
    printf("[+] Generating the plaintexts / ciphertexts\n");
    
    int i=0;
    int a=0;
    for(; i<128; i++)
    {
         RC4(&rc4_key, 16, src1, src1); // Input is nearly random :)
         DPA_ecb_encrypt (&key, src1, block1);
         RC4(&rc4_key, 16, src2, src2); // Input is nearly random :)
         DPA_ecb_encrypt (&key, src2, block2);
         
	 for(a=0;a<16; a++)
         {
             toto[i].plaintxt[a] = src1[a] ^ src2[a];
             toto[i].ciphertxt[a] = block1[a] ^ block2[a];
         }
    }

    /* Now the NTL stuff */

    printf("[+] NTL stuff !\n");
    vec_GF2 m2(INIT_SIZE,128);
    vec_GF2 B(INIT_SIZE,128);
    mat_GF2 M(INIT_SIZE,128,128);
    mat_GF2 Mf(INIT_SIZE,128,128); // The final matrix !
    clear(Mf);
    clear(M);
    clear(m2);
    clear(B);

    /* Lets fill M correctly */
    
    int k=0;
    int j=0;
    for(k=0; k<128; k++) // each row !
    {
        for(i=0; i<16; i++)
        {
            for(j=0; j<8; j++)
                M.put(i*8+j,k,(toto[k].plaintxt[i] >> j)&0x1);
        }
    }

    GF2 d;
    determinant(d,M);

    /* if !det then it means the vector were linearly linked :'( */
   
   if(IsZero(d))
   {
       std::cout << "det(M) = 0\n" ;
       exit(1);
   }

   /* Let's solve the 128 system :) */
   
    printf("[+] Calculation of Mf\n");
    for(k=0; k<16; k++)
    {
        for(j=0; j<8; j++)
        {
            for(i=0; i<128; i++)
            {
                 B.put(i,(toto[i].ciphertxt[k] >> j)&0x1);
            }
            solve(d, m2, M, B);

#ifdef __debug__
            std::cout << "m2 is " << m2 << "\n";
#endif

            int b=0;
            for(;b<128;b++)
                Mf.put(k*8+j,b,m2.get(b));
        }
    }

#ifdef __debug__
    std::cout << "Mf = " << Mf << "\n";
#endif

    /* Now that we have Mf, let's make a test ;) */

    printf("[+] Let's make a test !\n");
    bzero(test_c1, 16);
    bzero(test_c2, 16);
    char DELTA_X[16];
    char DELTA_Y[16];
    bzero(DELTA_X, 16);
    bzero(DELTA_Y, 16);
    DPA_ecb_encrypt (&key, test_p, test_c1);

    // DELTA_X !
    unsigned char U0[] = "ABCDEFGHABCDEFG1";
    unsigned char Y0[16];
    DPA_ecb_encrypt (&key, U0, Y0);
    
    for(i=0; i<16; i++)
    {
        DELTA_X[i] = test_p[i] ^ U0[i];
    }

    // DELTA_Y !
    vec_GF2 X(INIT_SIZE,128);
    vec_GF2 Y(INIT_SIZE,128);
    clear(X);
    clear(Y);
    for(k=0; k<16; k++)
    {
         for(j=0; j<8; j++)
         {
             X.put(k*8+j,(DELTA_X[k] >> j)&0x1);
         }
    }

    Y = Mf * X;

#ifdef __debug__
    std::cout << "X = " << X << "\n";
    std::cout << "Y = " << Y << "\n";
#endif

    GF2 z;
    for(k=0; k<16; k++)
    {
        for(j=0; j<8; j++)
        {
            z = Y.get(k*8+j);
            if(IsOne(z))
                DELTA_Y[k] |= (1 << j);
        }
    }

    // test_c2 !

    for(i=0; i<16; i++)
        test_c2[i] = DELTA_Y[i] ^ Y0[i];

    /* Compare the two vectors */
    
    if(!memcmp(test_c1,test_c2,16))
        printf("\t=> Well done boy :>\n");
    else
        printf("\t=> Hell !@#\n");

#ifdef __debug__
    affichage(test_c1);
    affichage(test_c2);
#endif
    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -


--[ Annexe B - Probability evaluation of (hash32()%8 == 0)

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NBR_TESTS 0xFFFFF

int main()
{
    int i = 0, j = 0;
    char buffer[16];
    int cmpt = 0;
    int rand = (time_t)time(NULL);
    float proba = 0;
    srandom(rand);
    for(;i<NBR_TESTS;i++)
    {
        for(j=0; j<4; j++)
        {
            rand = random();
            memcpy(buffer+4*j,&rand,4);
        }
        checksum128 (buffer, buffer, 16);
        if(!(hash32(buffer,16)%8))
            cmpt++;
    }
    proba = (float)cmpt/(float)NBR_TESTS;
    printf("[+] Probability is around %f\n",proba);
    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -


--[ Annexe C - 2nd preimage attack on 32 bytes messages

8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "dpa.h"

void
E2 (unsigned char *key, unsigned char *block, unsigned int shift)
{
    rbytechain (block);
    rbitshift (block, shift);
    S_E (key, block, shift);
}

void 
DPA_ecb_encrypt (DPA_KEY * key, unsigned char * src, unsigned char * dst)
{
    int j;
    memcpy (dst, src, DPA_BLOCK_SIZE);
    for (j = 0; j < 16; j++)
        E2 (key->subkey[j].key, dst, key->subkey[j].shift);
    return;
}

void affichage(unsigned char *chaine)
{
    int i;
    for(i=0; i<16; i++)
        printf("%.2x",(unsigned char )chaine[i]);
    printf("\n");
}

int main(int argc, char **argv)
{
    DPA_KEY key;
    unsigned char str_key[] = "deadbeef";
    unsigned char IV[] = "0123456789abcdef";
    unsigned char evil_payload[] = "COMPLETELYBROKEN";
    unsigned char D0[16],D1[16];
    unsigned char final_message[32];
    int fd_r = 0;
    int i = 0;
    
    if(argc < 2)
    {
        printf("Usage : %s <file>\n",argv[0]);
        exit(EXIT_FAILURE);
    }

    DPA_set_key (&key, str_key,8);
    if((fd_r = open(argv[1], O_RDONLY)) < 0)
    {
        printf("[+] Fuck !@#\n");
        exit(EXIT_FAILURE);
    }

    if(read(fd_r, D0, 16) != DPA_BLOCK_SIZE)
    {
        printf("Too short !@#\n");
        exit(EXIT_FAILURE);
    }

    if(read(fd_r, D1, 16) != DPA_BLOCK_SIZE)
    {
        printf("Too short 2 !@#\n");
        exit(EXIT_FAILURE);
    }
    close(fd_r);
    memcpy(final_message, evil_payload, DPA_BLOCK_SIZE);
    blockchain(evil_payload, IV);
    DPA_ecb_encrypt (&key, evil_payload, evil_payload);
    blockchain(D0,IV);
    DPA_ecb_encrypt (&key, D0, D0);
    blockchain(D0,D1);
    blockchain(evil_payload, D0);
    memcpy(final_message+DPA_BLOCK_SIZE, evil_payload, DPA_BLOCK_SIZE);
    
    for(i=0; i<DPA_BLOCK_SIZE*2; i++)
        printf("%c",final_message[i]);
    return 0;
}
8<- - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - -