433-448 Applied Cryptography and Coding
Research Project
Semester 1, 2001

Jacinta Richardson $<$jarich@perltraining.com.au$>$

Abstract:

Due to the prevalence of the RSA cryptography scheme, prime numbers are one of the biggest areas of interest in both the mathematical fields within Number Theory and the computer science fields of cryptography and security. This paper looks at what prime numbers are, why they are so important, and how to generate them for use in RSA keys.

What is a prime number?

The formal definition of a prime number is as follows:

A positive integer $p$ is prime if $p \ne 1$ and if the only positive divisors of $p$ are 1 and $p$.

Examples of prime numbers include 2,3,5,7,11,13. There are infinitely many prime numbers.

The use of primes

RSA, and all other asymmetric ciphers, depend upon the existence of some kind of trapdoor function. This function is supposed to be practically irreversible, although reversibility might be possible in theory.

The RSA cipher uses the fact that, while it is easy to compute the product of two primes, factorising a large number into its prime factors quickly seems to be essentially impossible. There are factorisation methods available but all of these take a long time for large numbers. Whilst the time necessary to factor a large number is longer than the usefulness of the data encrypted we consider this problem unsolvable.

Since factorisation is hard, testing for primality should, intuitively, also be hard - as we would need to determine that the number only has two factors, 1 and itself. However, as I will explain, this need not be the case.

Finding prime numbers

The naive way

A naive way to test a number for primality is to check every number between one and it to see whether it is a factor, stopping if such exists. An algorithm for this is:

TRUE=1;
FALSE=0;
/* input: n - number to test for primality
 * output: FALSE or TRUE (whether n is prime)
 */
isPrime(n) =
{
        for(i=2, n,
                /* if i is factor, return FALSE */
                if(n % i == 0,
                        return(FALSE);
                );
        );
        return(TRUE);
}

This algorithm involves N steps for a number N, if N is prime. It is expensive and guaranteed to be very slow if N is large. It will, however, always return the correct result.

Some improvements

If we recognise that the first factor a composite number has is less than or equal to its square then we can limit the maximum number of values we have to test to determine primality. If we also realise that either a number is even - and so the number 2 is a factor - or all of its factors are odd, we can halve the remaining test space. An improved algorithm for testing primality is:

TRUE=1;
FALSE=0;
/* input: n - number to test for primality
 * output: FALSE or TRUE (whether n is prime)
 */
isPrime(n) =
{
        if(n % 2 == 0,
                return(FALSE);
        );

        for(i=3, sqrt(n),
                if(n % i == 0,
                        return(FALSE);
                );
                i = i+1;        
        );
        return TRUE;
}
This algorithm involves $\sqrt{N}/2$ steps for a number N, if N is prime. It is less expensive than the naive way, but will still be very slow if N is large. It will also return the correct result for any N.

Sieve of Eratosthenes

One old but still very effective ways to list all the primes up to a given number (L) is the sieve of Eratosthenes. This uses the following steps.

  1. List all natural numbers from 2 until L.
  2. 2 is prime, mark all multiples of 2 (that are greater than 2) up until L.
  3. The first remaining unmarked number is prime. Mark all multiples of this number less than L.
  4. Repeat step 3.

This could be coded as follows:

/* input: L - limit to which to find all primes.
 * output: array of size L of all primes from 2 to L.
 */
allPrimes(L) =
{
                /* step 1 */
        for(i=0, L,
                a[i] = i+1;
        );
        i = 0;

        while(i < L,
                /* skip over "marked" values */
                while(a[i] == 0,
                        i = i + 1;
                );

                /* this number is prime.  mark
                   all multiples */
                for(j=2*(i+1), L, 
                        a[j] = 0;
                        j = j+i;
                );
        );
        return(a);
}

This method involves $L^2$ steps for a limit L. In those steps it provides us with a list of every prime up to L. This sieve is not usually computed just to test if a given number is prime. However once this list is generated all numbers less than $L$ can be tested for primality cheaply (a constant time lookup). This is what a number of mathematical packages for computers do.

Complexity

None of the above three methods is fast enough to be practical for use in generating or testing large primes for use in RSA. If our prime is in the order of $10^{200}$ which our RSA primes should be, then the first method requires $10^{200}$ operations and our second requires about $10^{100}$ operations.

Erathosthene's Sieve is also impractical for numbers of this size. Caching all of the prime numbers up to $10^{201}$ would take up much too much memory, and generating them would take hundreds of years.

We need a better way to determine whether a number is prime.

Fermat's Theorem

Fermat's Theorem allows us to test a number for primality without testing it for factors. It states:

Let $p$ be a prime and let $a$ be an integer which is not a multiple of $p$. Then
$a^{p-1} \equiv 1$ mod $p$

That is; if a number, $p$, is prime then for all integers $a$ (with greatest common divisor of $a$ and $p$ being 1), $a^{p-1}$ must be 1 in modulo $p$. This is provably true for all primes and gives us a great way of answering whether or not a number is not prime. An algorithm for this is:

FALSE=0;
MAYBE=2;
/* input: n - number to test for primality
 * output: FALSE or MAYBE (whether n is prime)
 */
isPrime(n) =
{
        local(r,s,t);

                /* pick a = 2 
        r = n-1;
        s = 2;
        t = 1;
                /* determine 2^(n-1) mod n */
        while(r != 0,

                if(r % 2 == 0,
                        r = r/2;
                        s = s*s % n;
                , /* else */
                        r = (r-1) /2;
                        t = s*t % n;
                        s = s*s % n;
                );
        );

                /* t = 2^(n-1) */
        if(t == 1,
                return(MAYBE);
        , /* else */
                return(FALSE);
        );
}

This algorithm involves $log_2(p-1)$ steps. It is inexpensive and relatively fast even for large $p$. It does not, however, guarantee that the number is prime. The next subsection discusses this.

Pseudoprimes

Fermat's theorem says that for all $a$, $a^{p-1} \equiv 1$ mod $p$. Our algorithm above only checks $a=2$. A composite number $n$ is said to be a pseudoprime to base $a$ if $a^{n-1} \equiv 1$ mod $n$.

We need to test for more values of $a$ than just $a=2$. How should we pick these $a$ and how many $a$ do we need to test? If we need to test every number from 1 to $n$ then we're doing $n*log_2(n-1)$ many steps and this is more expensive than our naive approach above.

We can pick random values for $a$ and test a given number of these to determine if $n$ is ``prime enough'' however, this still cannot guarantee that the number is prime. In fact we can be sure some composite numbers will get through.

Carmichael numbers

There are some composite numbers which, for almost any $a$, $a^{n-1} \equiv 1$ mod $n$. These numbers are called Carmichael numbers. Formally:

A composite number $n$ is a Carmichael number if $a^{n-1} \equiv 1$ mod $n$, for all $a$ with the greater common divisor of $a$ and $n$ being 1.

The existence of pseudoprimes and Carmichael numbers make the use of Fermat's theorem useless, in guaranteeing a number is prime, in this simple form. Almost any choice of $a$ will have the property that it's greatest common divisor with $n$ will be 1. If not, we've just found a factor of $n$, so $n$ is not prime.

We need to look at another useful property of primes to replace Fermat's theorem.

Rabin Miller algorithm

The Rabin Miller algorithm uses the property of strong pseudoprimes in checking for primality. We can determine whether a number is a strong pseudoprime to a given base using the following theorem:

Let $n$ be a positive integer and $a$ be a number randomly chosen between 1 and $n$. Let $n - 1 = 2^st$ where $t$ is an odd integer. If either:
where $0 <= i <= s$. Then $n$ is either a prime number or a strong pseudo-prime to base $a$.

The algorithm can be written as follows:

PROBABLY=1;
/* input: n - number to test for primality
 *        numtries - number of base 'a's to try.
 * output: FALSE or MAYBE (whether n is prime)
 */
isPrime(n, numtries) =
{
        local(t,s,a,b,c,i);

                /* check that the number
                 * is not even */
        if(n%2 == 0,
                return(FALSE);
        );

        t = n-1;
        s = 0;

                /* determine s, t */
        while(t % 2 == 0,
                s = s+1;
                t = t/2;
        );

                /* try numtries many values of a */
        for(i = 1, numtries,
                        /* pick a random number a, 
                         * similar in size to n */
                a = random(n);

                        /* check against Fermat's 
                         * theorem for primality */
                if(fastpower(a, n-1, n) != 1,
                        return(FALSE);
                );

                        /* check it's either strong
                         * pseudoprime or prime */
                b = fastpower(a, t, n);
                j = 0;

                        /* determine (a^t)^(2^i) 
                          for every i <= s */
                while(((c = b^2 % n) != 1 && i <= s),
                        b = c;
                        i = i+1;
                );

                if(i == s || (b == 1 && i > 0),
                        return(FALSE);
                );
        );

        return(PROBABLY);
}

/* input: a - the base to power.
 *        k - the exponent
 *        m - modulos
 * output: a^k mod m
 */
fastpower(a,k,m) =   
{
        local(r,s,t);

        r = k;
        s = a;
        t = 1;

        while(r != 0,

                if(r % 2 == 0,
                        r = r/2;
                        s = s^2 % m;
                , /* else */
                        r = (r-1)/2;
                        t = s*t % m;
                        s = s^2 % m;
                );
        );
        return(t);
}

We thus now have a method for testing primality. The algorithm determines with certainty whether a number is composite but can only establish with high-probability if a number is prime. The probability of error is no worse than $(1/4)^k$ where $k$ is the number of bases we test. If $k > 50$ then the probability of error is $<10^{-30}$ which is much smaller than chances of hardware error on your computer.

This algorithm is also very fast. The powering modulo $n$ takes $log_2(n-1)$ for each call to fast power, and this is repeated $k$ times. Hence this function requires no more than $2*k*log_2(n-1)$ operations. Since $k$ need not be higher than 50, this is fast even for large $n$.

How to generate primes

Due to the number of very clever and effective ways of factorising a composite number $n$ we have to ensure that primes used in RSA are strong primes. A strong prime is one that, if it occurs as a factor of a larger number $n$ makes the factorisation of $n$ difficult. The qualities of this prime depend on the current state of the art of factorisation techniques, and on the size of the relevant numbers. Currently, a strong prime is a prime $p$ such that:

The first condition is design to resist the Pollard $p - 1$ factorisation method. The second condition is designed to resist an analogous attack using $p + 1$. The third condition is of a similar nature.

The easiest method of generating prime numbers is to pick a random seed and check all numbers greater than that until we have located a prime. Since the Prime number Theorem states that the number of primes less than some value $n$ is $\approx n/log_{10}(n+1)$ and there are infinitely many primes, we are guaranteed to discover a prime number fairly quickly.

These primes will not necessarily be good for use with RSA as we have no evidence that they are necessarily strong primes. However these numbers can be used in our determination of stronger primes. Also, if our prime numbers are to be used in RSA it is important that the random seed be as random as possible. Otherwise anyone who uses the same random function can generate our primes and the purpose of strong primes is destroyed.

The following code generates large primes suitable for use in RSA.

FALSE=0;
TRUE=1;
/*  input: size of desired prime.  ie 10^200.
 *        certainty of primality desired.  k value for test.  
 *        ie 50.
 *        certainty of p -1 factors desired.  k value for test.  
 *        This can afford to be lower than above.  ie 10.
 *  output: a probably prime number.
 */
generate_RSA_prime(size, certainty, fcert) =
{
        local(r,p1,p2,n,t,s);

        r = random(size);

                /* find the next prime from r.  (need 
                 * really only be strong pseudo-prime) */
        r = next_prime(r, fcert);   

                /* test each of 2r + 1, 4r + 1, 6r + 1 ..
                 * for primality */
        n = 2;
        while(isPrime(n*r + 1, fcert) == FALSE,
                n = n+2;
        );


        p1 = n*r + 1;

                /* find next prime from s.  as with r */
        s = random(size);
        p2 = next_prime(s, fcert);

                /* find t such that:
                 *      t = 1 mod p1
                 *      t = -1 mod 4p2
                 */
        t = chinese_remainder(1, p1, -1, 4*p2);

                /* so test t + 4*p1*p2, t + 8*p1*p2,
                 * t + 12*p1*p2 ... for primality */
        n = 4;
        while(isPrime(t + n*p1*p2, certainty) == FALSE
                n = n + 4;
        );

        return(t + n*p1*p2);
}
                
               
/* input: size of desired prime.  ie 10^200.
 *        certainty of primality desired.  k value for 
          test.  ie 50.
 *  output: a probably prime number.
 */
next_prime(size, certainty) =
{
        a = random(size);

                /* make sure a is odd */
        if(a % 2 == 0,
                a = a+1;
        );

                /* keep looking at odd numbers
                 * until we find a prime */
        while(isPrime(a, certainty) == FALSE,
                a = a+2;
        );

        return(a);
}

/* input: a, p, b, q such that:
 *      x = a mod p
 *      x = b mod q
 * output: x mod p*q as determined by the previous 
 *         congruences.
 */
chinese_remainder(a, p, b, q) =
{
        local(s,t,x,array);

                /* find s,t such that s*p + t*q = 1 */
        array = euclidean_alg(p,q);
        s = array[1];
        t = array[2];

        x = (a*(t*p) + b*(s*q)) % (p*q);

        return(x);
}

/* input: p1, p2 to find: a*p1 + b*p2 = 1
 * output: (a,b)
 */ 
euclidean_alg(p1 p2) =
{
        local(x,y,a,b,c,d,t1,t2,r,q);

                /* set up x, y */
        y = p1;
        x = p2;
                /* set up matrix variables */
        a = 1; b = 0; c = 0; d = 1;

                /* until y is zero *.
        while(y != 0,    /* set remainder */
                r = x % y;
                q = (x - r) / y;

                        /* change x, y*/
                x = y;
                y = r;
                t1 = a;
                t2 = b;
                        /* calculate new matrix variables*/
                a = c;
                b = d;
                c = t1 -( q * c );
                d = t2 - ( q * d);
        );

                        /* now have a*p1 + b*p2 = 1*/
        return([a,b]);
}

Further Considerations

As well as ensuring that our primes are strong primes we have to ensure that we select $p$ and $q$ such that they are about the same bit length and sufficiently large. This is to avoid the elliptic curve factoring. If we select size in our call to generate_RSA_prime to be the same in both cases, this should be covered.

Another restriction is that $p - q$ should not be too small. If $p - q$ is small then $p \approx \sqrt{pq}$. Thus $n$ can be factorised by trial division by odd integers near $\sqrt{n}$. If we're choosing $p$ and $q$ at random, then the probability that $p - q$ will be appropriately large will be high.

Code to generate two suitable RSA primes is as follows:

/*  input: size of desired prime.  ie 10^200.
 *        certainty of primality desired.  (k value for test.)  
 *           ie 50.
 *        certainty of p -1 factors desired.
 *           (This can afford to be lower than above.  ie 10.)
 *        minimum allowable difference between p, q produced.
 *  output: matrix of p, q.
 */
make_primes(size, cert1, cert2, diff) =
{
        local(p, q);

        p = generate_RSA_prime(size, cert1, cert2);
        q = generate_RSA_prime(size, cert1, cert2);

                        /* ensure |p-q| > diff */
        while(abs(p-q) > diff,
                q = generate_RSA_prime(size, cert1, cert2);
        );

        return([p,q]);
}

Conclusion

This paper has covered what prime numbers are, why they are useful in RSA, and how to find them. Naive ways of primality testing have been contrasted with more advanced version and code has been provided for a prime generator.

The generation of reasonable random numbers is crucial to the success of this generator. If a typical random number is generator is used anyone with access to a copy of it could determine the primes generated. There has been a lot of study into good sources for random numbers including measuring milliseconds between keyboard interrupts, changes in CPU temperature and others. A good combination of these might be sufficient for this problem but is outside of the scope of this project.

Likewise there are requirements on good primes for use in cryptographical schemes like RSA. Primes need to be chosen such that their product is not susceptible to the many factorisation attacks. If these things are not checked, the resulting $n = pq$, with $p,q$ generated by these algorithms, might be much easier to factorise than desirable. The provided algorithm attempts to address these problems.

Note

Each of the algorithms above are written as working code in Pari-GP. Pari-GP is a software package for computer-aided number theory. Pari is free and documentation and source can be found at http://www.parigp-home.de/

About this document ...

433-448 Applied Cryptography and Coding
Research Project
Semester 1, 2001

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -dir Primes -split 0 -no_navigation essay.tex

The translation was initiated by Jacinta Richardson on 2002-09-02


Jacinta Richardson 2002-09-02