A Faster Sieve of Eratosthenes

SievePic

The Sieve of Eratosthenes is a beautifully elegant way of finding all the prime numbers up to any limit. The goal of this post is to implement the algorithm as efficiently as possible in standard Python 3.5, without resorting to importing any modules, or to running it on faster hardware.

Eratosthenes was a Greek scholar who lived in Alexandria (276BC to 194BC) in the so-called Hellenistic period. He was working about a century after Alexander, and about a century before the Romans arrived to impose their cultural desert and call it peace. And then do nothing with the body of knowledge they discovered. Literally. For over 1,600 years, if you count Constantinople. Not a damn thing.

So much for overly religious, centralised, bureaucratic superstates, obsessed with conquest. But I digress.

Eratosthenes was a quiet achiever in Hellenistic Alexandria who, for his day job, was chief librarian of the great library of Alexandria. But he was also a poet, mathematician, astronomer and geographer. Among his many achievements, he founded of the discipline of geography and calculated of the circumference of the earth.

But it was his brilliantly simple way of calculating prime numbers that I’ll be looking at today. His algorithm can be described as follows:

1. Starting from 2, eliminate all its multiples.
2. Move up to the first number you find that has not yet been eliminated and eliminate all its multiples.
3. Repeat step 2 until you reach the square root of your maximum, then stop.

This can be easily implemented in any programming language. In Python, it can be implemented most obviously, although a little clumsily, using Python lists:

a = [2]
a += list(range(3, maximum+1, 2))
for i in range(3, int(maximum**0.5) + 1, 2):
    for j in range(i * i, maximum + 1, i * 2):
        if j in a:
            a.remove(j)

Although this is quite easy to follow, it’s also painfully slow. On my quad MacMini system, just counting up to 10,000 took so long I gave up waiting.(1)

Luckily, Python has the perfect data structure for subtracting collections of data: sets. Without going into the inner workings of hashes, these have the amazing quality that it takes the same time to find any member of the set, regardless of how large the set is, or where the item lies in the set. Using a set to implement the same algorithm as above looks like this:

a = set(range(3, maximum+1, 2)) 
a |= set([2]) # same as "a += b" for sets, ie a union.
p = 3
while p < maximum ** 0.5:
    a —= set(range(p * p, maximum+1, 2 * p))
    p += 2
    while p not in a:
        p += 2

The performance increase over using lists is phenomenal. It found all the primes up to 100 million in 31.2 seconds. All 26,355,867 primes up to 500 million were found in a blisteringly fast 6 minutes 30 seconds. After 1 billion, I started getting excessive page fault swaps to disk, and it crashed when I tried to run it on 2 billion.

Which got me wondering: if there a faster way of doing this on my desktop? And a way of finding bigger primes?

I couldn’t think of a way to improve my code using sets, or to scale it to find larger primes, so I cast around for a different method. That was when I discovered the Boolean list approach. It’s essentially the same algorithm, just using different structures in Python. The advantages of this approach are that:

A. It should use less memory.
B. It should be fast.
C. It should be scalable beyond the memory limit of any computer system, by breaking up large numerical search fields into smaller manageable chunks.

The code to do this is commonly written as follows. This came from the site Rosetta Code:

def iprimes_upto(limit):
    # As this function contains a yield cmd, technically it’s a generator
    is_prime = [False] * 2 + [True] * (limit - 1)   # set up the Boolean list

    for n in range(int(limit ** 0.5 + 1.5)):   
        if is_prime[n]:		
            for i in range(n * n, limit + 1, n):
                is_prime[i] = False

    for i in range(limit + 1):		
        if is_prime[i]:
            yield i
    return

prime_set = list(iprimes_upto(upper_limit))  # create the list of the primes

Remarkably, this was found to take almost exactly the same time as the Python sets approach, completing the count to 100 million in 33.7 seconds. Above 500 million, it was actually faster. But this implementation can still be improved in a number of ways.

1. First, by breaking up the find-and-filter structure into 2 stages and removing first the evens, then removing the rest later from a much-reduced data set.

Taking the central filter section of code:

for n in range(int(limit ** 0.5 + 1.5)):
    if is_prime[n]:
        for i in range(n * n, limit + 1, n):
            is_prime[i] = False

This becomes:

for i in range(4, limit + 1, 2):
    is_prime[i] = False

for n in range(3, int(limit ** 0.5 + 1.5), 2):
    if is_prime[n]:
        for i in range(n * n, limit + 1, 2*n):
            is_prime[i] = False

Because the even numbers are now eliminated first, the second stage filter can now rip through the rest using double the step size.

Speed improvement: for primes up to 100 million, the time fell from 33.7s to 25.2s, around 25% faster.

 

2. Similarly, the same trick can be performed on the output generator stage, by splitting the yield statements of the generator section into 2 stages, to allow the second step to go twice as fast. Taking the yield statement:

 
for i in range(limit + 1):
    if is_prime[i]:
        yield i

It now becomes this:

for i in range(3):
    if is_prime[i]:
        yield i

for i in range(3, limit + 1, 2):
    if is_prime[i]:
        yield i 

This cuts down the time to find primes up to 100 million from 25.2s to 21.0s, making the code 16% faster.

3. By combining the initial Boolean mask creation with the first stage of the filter created in (1) above, we can also eliminate the even numbers before we start the filtering process. Taking the first two steps in the Boolean filter:

is_prime1 = [False] * 2 + [True] * (limit - 1)
for i in range(4, limit + 1, 2): 
    is_prime[i] = False

These two steps are now combined into one initial mask creation step which flags all the even numbers as not-prime:

is_prime = [False, False, True, True] + [False, True] * int((limit - 2)/2)
is_prime = is_prime[:limit+1]

This further shaves about 16% off the algorithm up to 100 million, reducing it from 21.0s to 17.6s.

So by using simple programming methods, the execution time has been halved, from 33.7s to 17.6s. First, the primes were filtered in 2 stages instead of one, then the generator was made to run twice as fast by only checking the higher odd numbers, then a more complex initial mask was used, one that eliminated higher even numbers before the filtering even started.

There are quite a few other things that can be done to speed up Eratosthenes’ sieve in Python, but the goal here was to implement the algorithm more efficiently in standard Python without resorting to importing any modules, or to faster hardware.

In the next post, I’ll be exploring what else can be done in Python to improve the speed of Eratosthenes’ wonderfully simple algorithm, and what can be done to scale this method to get around a system’s memory limit, the goal being to find larger primes nearer Python’s sys.maxsize.

 

NOTES

(1) Note that in Python 3, range is the same as arange in Python 2, and performs essentially as a generator. So there’s no need to import it from the Python Numpy package, as you do in Python 2.

 

 

4 Replies to “A Faster Sieve of Eratosthenes”

  1. Wow! This could be one particular of the most beneficial blogs We have ever arrive across on this subject. Basically Wonderful. I am also an expert in this topic therefore I can understand your hard work.

Leave a Reply

Your email address will not be published. Required fields are marked *