Eratosthenes 2: Swifter, Further, Cooler

Human & alien hands

 

This post describes the process I used to design an algorithm that allows you to implement a modified Sieve of Eratosthenes to bypass the memory limitations of your computer and, in the process, to find big primes well beyond your 64-bit computer’s supposed numerical limit of 2.0e63 (9.223e18). Beyond that, with this algorithm, the only limitation is the speed of your CPU.

This is the second such post on finding big primes using the Sieve of Eratosthenes in Python. In the previous post, I compared the times for implementing the sieve using standard Python 3.5 to eliminate non-primes from all the numbers up to a user-specified limit using:

  • range statements to index and eliminate non-primes from a Python list,
  • Python set subtraction to subtract the sets of multiples, and
  • stepped range statements to eliminate non-primes from a Python list of Boolean True/False values, corresponding to the integer’s prime status.

I then tried to bypass the slower sections of code by:

  • Removing even numbers at the setup stage of the sieve, and
  • Tuning the generator to look only for odd primes.

The reason we remove even numbers first is that they are the easiest to identify. We humans find them so easy to spot because 2 is one of the two prime factors for the number of fingers we have, the other being 5. If instead we had a prime number of fingers, such as 11, we would have no idea which numbers were “even” or “odd”. Note that primes don’t care about what base you’re using, they’re still prime.

Interestingly, if we had 3 fingers on each hand we would count in base 6, and we would be really good at spotting numbers that divide by 2 & 3 – they would all end in 0, 2, 3 or 4, and we’d find numbers that ended in 1 or 5 really weird. In fact, if we ignore 1 as prime (apparently it no longer is – it’s been Plutoed), then nine out of the first ten numbers in base 6 that end in 1 or 5 are prime (5, 7, 11, 13, 17, 19, 23, 29 & 31). Somewhere on an alien world there’s a six-fingered species who are uncannily good at spotting prime numbers. But I digress.

All of the methods used in the previous post ran into the same problem, that of the system memory. For example, if you try to create large Boolean lists (the fastest and the least memory-hungry of the above methods), it’s easy to see when your computer starts to struggle, because no matter how much memory you have, at some point it will be swamped by the size of the object you need to handle, even if it consists of only 0s and 1s:

>>> x=[True] * 10**9
>>>
>>> x=[True] * 10**10
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

But I had a feeling that if I could find some way of getting around this problem the sky would be the limit, since I already knew that sys.maxsize in Python, the largest number your CPU is supposed to be able to handle, is not actually a barrier to large integer calculations in the latest versions of Python. Just type 2**128 into your Python 3 interpreter to see what I mean. If I could prove this was the case for big prime calculations in Python, and in the process eliminate memory size as a limitation, the only hindrance to calculating large primes would be my computer’s CPU speed.

So how to write a program to calculate big primes, one that doesn’t actually care about how much memory you have, only how fast your CPU is? Since the page swap/sigkill errors happened at different limits for lists and sets, I knew that the size of the limits must not be the problem. It had to be something to do with the size of the Python objects the computer was attempting to create and hold in memory to perform the sieve eliminations.

1. Using Numpy

The first thing to do was to start using Numpy. From the times I’d used it on other projects, I knew this would make a huge difference. As expected, it delivered an astonishing increase in speed. As you probably know, the trick with Numpy is to use Numpy masking, not for-loops. Here’s how NOT to create, scan and modify a Numpy array:

 

Using Numpy in this way gave no improvement over the best time from the previous post:

Searching for primes up to 100,000,000...
Number of primes is 5,761,455
Last prime is 99,999,989
Completed the search in 0 hours 0 mins 17.7 seconds. 

And here’s how to use Numpy’s masking feature. Note how it looks like a Python slice, but rather than producing a subset of the array, it’s masking the original, making changes to the Numpy data structure in situ:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Program description: runs a sieve of Eratosthenes up to an upper limit using
numpy slicing, timing the results.
 
Created 4 March, 2018 
Author: matta_idlecoder at protonmail dot com
"""
import timeit
import numpy as np
 
timing_it = True
 
def iprimes_upto(limit):
    """Improvements to Eratosthenes Sieve using numpy masking
    """
 
    is_prime = np.array(([False, False, True, True] + [False, True] *
                         int((limit - 2)/2)), dtype=bool)
    is_prime = is_prime[:limit+1]
 
    root_limit = int(limit ** 0.5) + 1
    for n in range(3, root_limit, 2):
        if is_prime[n]:
            is_prime[n*n : limit+1 : 2*n] = False
 
    lower_limit = limit - 201 if limit % 2 == 0 else limit - 200
    for i in range(lower_limit, limit + 1, 2):
        if is_prime[i]:
            yield i
 
    return
 
#==============================  MAIN   ===================================
 
upper_limit = int(input('\nEnter the number you want to find all the primes up to: '))
print("\nSearching for primes up to {:,d}...".format(upper_limit))
 
if timing_it:
    start_time = timeit.default_timer()
 
prime_set = list(iprimes_upto(upper_limit))
 
print ("Number of primes is {:,d}".format(len(prime_set)))
print ("Last prime is {:,d}".format(prime_set[-1]))
 
if timing_it:
    prime_time = timeit.default_timer() - start_time
    hours, mins = divmod(prime_time, 3600)
    mins, secs = divmod(mins, 60)
    print('\nCompleted the search in {:,} hours {:,} mins {:0,.1f} seconds.'.
            format(int(hours), int(mins), secs))

 

The performance of this code was four times faster than the previous methods tried:

Searching for primes up to 100,000,000...
Number of primes is 5,761,455
Last prime is 99,999,989
Completed the search in 0 hours 0 mins 4.2 seconds.

Searching for primes up to 1,000,000,000...
Number of primes is 50,847,534
Last prime is 999,999,937
Completed the search in 0 hours 1 mins 3.6 seconds.

Searching for primes up to 2,000,000,000…
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

 

The next alteration to make was to change this line:

prime_list = list(iprimes_upto(upper_limit)

to this:

max_prime = max(iprimes_upto(upper_limit))

The reason for this is as follows: the memory limit now seemed to be kicking in just above 1.0e9. Using debug feature of my IDE, the problem was found to occur when the stream of primes returned by the generator iprimes_upto() was converted into a list. This makes sense. The manageable limit was therefore extended dramatically by only looking for the maximum prime found, thus avoiding creating prime_set as a list. Unfortunately, there was still the problem of the memory limit, only now it was occurring at a much higher number:

Searching for primes up to 14,000,000,000...
Last prime is 13,999,999,991
Completed the search in 0 hours 15 mins 42.2 seconds.

Searching for primes up to 15,000,000,000…
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

So although it seemed clear that the solution required Numpy without using lists, the problem was still the size of the Python object being manipulated, which would always hit the system memory limit. However, the good news was that the reachable prime limit had now been pushed up to 1.4e10.

2. Using a List of Numpy Arrays

This approach came from the idea that a list of objects in Python is supposed to be just a list of addresses. So, logically (you would think) a list of smaller, more manageable arrays should only be represented in memory a list of the memory addresses of the arrays. If you only mask one array at a time, the frantic page swapping should end. Not so. This hit the same memory limits as the single array Numpy method, with the additional result of slowing it down for smaller searches:

Searching for primes up to 10,000,000...
Initial array length tried was 5,000,000
Actual array length used was 5,000,000
The number of arrays used to make the limit manageable was 2
The biggest prime found was 9,999,991
Completed the search in 0 hours 0 mins 2.9 seconds.

The code to implement this was as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created 8 March, 2018
 
Program description: uses a generator to create a sieve of Eratosthenes up to
an upper limit, attempting to bypass the hardware limit on the maximum list
size by splitting the upper limit into manageable lists, and timing the results.
 
Author: matta_idlecoder at protonmail dot com
"""
import timeit
import numpy as np
import datetime
 
listing_primes = True
timing_it = True
 
def init_prime_arrays(maximum):
    """Returns list of Boolean numpy arrays, initialised for eliminating primes
 
    Value at each index reflects whether the index is prime or not
    Values 0.. are correct, with all higher evens eliminated
    """
 
    list_of_arrays = []
    init_max_array_length = 5*10**6   
    maximum += 1 if (maximum % 2 == 1) else 0  # make maximum even
 
    num_arrays = 1
    while init_max_array_length * num_arrays < maximum:
        num_arrays *= 2
 
    each_array_length = int(maximum / num_arrays)
 
    first_array = np.ones(each_array_length, dtype=bool)
    first_array[4::2] = False  # set every second item to False, starting from 4
    list_of_arrays.append(first_array)
 
    distance_covered = each_array_length - 1 # value of last index, not the length
    last_value = first_array[-1]
 
    while distance_covered < maximum - 1:
        new_array = np.ones(each_array_length, dtype=bool)
        first_false = 0 if last_value else 1
        new_array[first_false::2] = False
        last_value = new_array[-1]
        list_of_arrays.append(new_array)
        distance_covered += each_array_length
 
    list_of_arrays[0][0], list_of_arrays[0][1]  = False, False
 
    print("Initial array length tried was {:,d}".format(init_max_array_length))
    print("Actual array length used was {:,d}".format(each_array_length))
    print("The number of arrays used to make the limit manageable was "
          "{:,d}".format(len(list_of_arrays)))
    return list_of_arrays
 
 
def iprimes_upto(limit):
    """Sieve of Eratosthenes using a list of arrays and a Generator
    """
    is_prime = init_prime_arrays(limit)
    number_arrays = len(is_prime)
    array_length = len(is_prime[0])
 
    # Sieve of Eratosthenes, nested into a list of np arrays.
    # Even numbers were eliminated when the boolean arrays were set up:
    for n in range(3, int(limit ** 0.5 + 1.5), 2):
        array_num, index = divmod(n, array_length)
        if is_prime[array_num][index]:
            for i in range(n * n, limit + 1, n * 2):
                array_num, index = divmod(i, array_length)
                is_prime[array_num][index] = False
 
    # Generator:
    last_array_length = len(is_prime[-1])
    prime_check_index = last_array_length - 1000
 
    # New in this version - don't bother to find the primes in the earlier 
    # arrays. Just check all the numbers in the prime check window: 
    for m in range(prime_check_index, last_array_length):
        if is_prime[-1][m]:
            n = (number_arrays - 1) * array_length + m
            yield (n)
    return    
 
#==============================  MAIN   ===================================
 
upper_limit = 1001
while upper_limit < 10**4 or upper_limit % 1000 != 0:
    upper_limit = int(
        input(
            "\nEnter a number >= 10,000 that is divisible by 1,000 that you want to find the nearest primes to: "))
    if upper_limit % 1000 != 0:
        print("Enter a number divisible by 1,000.")
    if upper_limit < 10**4:
        print("Try using a bigger number.")
 
print("\nSearching for the nearest primes to {:,}...\n".format(upper_limit))
 
if timing_it:
    print('Starting the search at',
              datetime.datetime.now().strftime("%H:%M:%S"), "\n")
    start_time = timeit.default_timer()
 
prime_set = list(iprimes_upto(upper_limit))
 
if not prime_set:
    print ("There were no primes found in the last 1,000 numbers below {:,}".format(upper_limit))
 
elif listing_primes:
    print ("\nThe last 20 primes are:\n")
    for prime1, prime2 in zip(prime_set[-20::][::2], prime_set[-20::][1::2]):
        print ("{:<35,}{:<,}".format(prime1, prime2))
else:
    prime_max = max(prime_set)
    print("The biggest prime found was {:,d}".format(prime_max))
 
if timing_it:
    prime_time = timeit.default_timer() - start_time
    days, hours = divmod(prime_time, 86400)
    hours, mins = divmod(hours, 3600)
    mins, secs = divmod(mins, 60)
    print(
        '\nCompleted the search in {:,} days {:,} hours {:,} mins {:0,.1f} seconds.'.format(
            int(days), int(hours), int(mins), secs))

 

The structure of the code was essentially:

for each prime: 
    eliminate all its multiples wherever they are 
    in all the arrays

The speed problem at smaller limits clearly had something to do with all the divmods on every loop, necessary to locate all the multiples of n within the list of arrays. It was also possibly related to the rapid indexing between arrays.

And the problem with the persistent maximum limit had to lie with the fact that even though only a section of the limit was being masked at any time, all the integers being masked up to the limit were being loaded as one connected Python object. Which was not supposed to happen, as far as my understanding of how Python works goes. But there it was.

3. Using Discrete Numpy Arrays

I decided that both problems would be solved by performing all the Numpy masks on each of TWO smaller Numpy arrays in turn. The key step was to break up the Python list of two arrays into two individual Numpy arrays which were not part of the same Python object. This would prove to be the breakthrough that would lead to a number of discoveries.

From section (1) above, I knew that the maximum reachable limit for binary Numpy arrays on my 16GB MacMini was 1.4e10. This was the threshold above which my Numpy prime sieve was turning my system into a zombie. Two Numpy arrays were therefore created and called array0, and array1, each 1.0e10 long.

This was relatively straightforward to write:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created 10 March, 2018
 
Program description: uses a generator to create a pseudo-sieve of Eratosthenes
up to an upper limit, bypassing the memory limit on the maximum list size by
splitting the upper limit into manageable arrays, and timing the results.
 
Author: matta_idlecoder at protonmail dot com
 
This version:
- uses 2 unrelated numpy arrays to extend size of max primes that can be found.
-
"""
 
import timeit
import datetime
import sys
import numpy as np
 
timing_it = True
listing_primes = True
 
def init_prime_arrays(maximum):
    """Returns 1 or 2 discrete Boolean numpy arrays, initialised for odd primes
 
    If it creates 2 massive numpy arrays, they are unconnected. 
    """
    init_max_array_length = 10**10
    maximum += 1 if (maximum % 2 == 1) else 0  # make maximum even
 
    num_arrays = 1
    if init_max_array_length * num_arrays < maximum:
        num_arrays *= 2
    each_array_length = int(maximum / num_arrays)
 
    if each_array_length > 10**10:
        print("\nYour number is too large for this version of the program.")
        print("Enter a number <= {:,}\n".format(10**9))
        sys.exit(2)
 
    array_0 = np.ones(each_array_length, dtype=bool)
    array_0[::2] = False
    array_0[1:3]  = False, True
    last_value = array_0[-1]
 
    if num_arrays == 2:
        array_1 = np.ones(each_array_length, dtype=bool)
        first_false = 0 if last_value else 1
        array_1[first_false::2] = False
    else:
        array_1 = []
 
    print("Initial array length tried was {:,d}".format(init_max_array_length))
    print("Actual array length used was {:,d}".format(each_array_length))
    print("The number of arrays used to make the limit manageable was "
          "{:,d}".format(num_arrays))
 
    return (array_0, array_1, num_arrays)
 
 
def iprimes_upto(limit):
    """Pseudo sieve of Eratosthenes, using a Generator
 
    Attempts to use two unrelated numpy arrays to reach bigger primes faster.
    """
    is_prime_0, is_prime_1, number_arrays = init_prime_arrays(limit)
    array_length = len(is_prime_0)
    root_limit = int(limit ** 0.5 + 1.5)
    n = 3
 
# Sieve of Eratosthenes, using binary masking across TWO numpy arrays.
# Avoids use of for-loops:
    array_index = 0
    while array_index < number_arrays:
        while n < root_limit:
            n2 = n * 2
            if array_index == 0:        # first loop of array and this n
                if not is_prime_0[n]:  # skip to next odd n if already False
                    n += 2
                    continue  
                # trick here is to make the array size large enough so that
                # √limit always lies within the first array. This is checked in
                # init_prime_arrays() above:
                is_prime_0[n :: n2] = False
                is_prime_0[n] = True
 
            elif array_index == 1:
                # Finds the indices in the second array:
                rem_2n = (array_index * array_length) % n2
                rem_2n += n if (rem_2n <= n) else -n
                start_index = n2 - rem_2n
                is_prime_1[start_index:: n2] = False
            n += 2
        array_index += 1
        n = 3 # reset for second array
 
# Generator:
    first_index = array_length - 1000
    last_array = is_prime_0 if number_arrays == 1 else is_prime_1
 
    for m in range(first_index, array_length):
        if last_array[m]:
            n = ((number_arrays - 1) * array_length) + m
            yield (n)
 
    return
 
#==============================  MAIN   ===================================
 
upper_limit = 1001
while upper_limit < 10**4 or upper_limit % 1000 != 0:
    upper_limit = int(
        input(
            "\nEnter a number >= 10,000 that is divisible by 1,000 that you want to find the nearest primes to: "))
    if upper_limit % 1000 != 0:
        print("Enter a number divisible by 1,000.")
    if upper_limit < 10**4:
        print("Try using a bigger number.")
 
print("\nSearching for the nearest primes to {:,}...\n".format(upper_limit))
 
if timing_it:
    print('Starting the search at',
              datetime.datetime.now().strftime("%H:%M:%S"), "\n")
    start_time = timeit.default_timer()
 
prime_set = list(iprimes_upto(upper_limit))
 
if not prime_set:
    print ("There were no primes found in the last 1,000 numbers below {:,}".format(upper_limit))
 
elif listing_primes:
    print ("\nThe last 20 primes are:\n")
    for prime1, prime2 in zip(prime_set[-20::][::2], prime_set[-20::][1::2]):
        print ("{:<35,}{:<,}".format(prime1, prime2))
else:
    prime_max = max(prime_set)
    print("The biggest prime found was {:,d}".format(prime_max))
 
if timing_it:
    prime_time = timeit.default_timer() - start_time
    days, hours = divmod(prime_time, 86400)
    hours, mins = divmod(hours, 3600)
    mins, secs = divmod(mins, 60)
    print(
        '\nCompleted the search in {:,} days {:,} hours {:,} mins {:0,.1f} seconds.'.format(
            int(days), int(hours), int(mins), secs))

 

This was a success, and I could now find primes up to 2.0e10:

Searching for the nearest primes to 20,000,000,000...

Starting the search at 19:09:41 

Initial array length tried was 10,000,000,000
Actual array length used was 10,000,000,000
The number of arrays used to make the limit manageable was 2

The last 20 primes are:

19,999,999,603                19,999,999,609
19,999,999,693                19,999,999,717
19,999,999,729                19,999,999,757
19,999,999,759                19,999,999,769
19,999,999,771                19,999,999,777
19,999,999,783                19,999,999,841
19,999,999,847                19,999,999,861
19,999,999,871                19,999,999,883
19,999,999,889                19,999,999,913
19,999,999,939                19,999,999,967

Completed the search in 0 days 0 hours 11 mins 33.2 seconds.

I was still a long way short of 2.0e19, but at least I had proved two things, namely that:

  • the object size in memory had been the problem, and
  •  it was no longer a problem.

4. Using an Offset Sieve®

This breakthrough led me to a series of insights and some fast decisions. If you’re only after the biggest primes, rather than all the primes:

  • Having split the limit into two arrays, why waste time even masking the lower array? Why not just run an offset sieve (1) on the highest array?
  • Indeed, why not split it into a hundred arrays and only mask the highest array?
  • Even better, why bother even doing that? Why not run an offset sieve on only the 1,000 numbers below the limit you are seeking? The chance of it not finding any primes was slim, as I knew from studying prime gaps.

If I was right, this would be a much faster way of finding big primes, orders of magnitude larger and than any of the previous methods had found. Hopefully, it would also smash the supposed 1.0e19 integer limit on 64-bit computers and, more importantly, finding big primes would no longer be limited by system memory, only the CPU speed.

The results of this approach were truly astonishing:

Searching for the nearest primes to 1,000,000,000,000,000...

The last 1,000 numbers were checked from 
999,999,999,999,000 to 999,999,999,999,999.

The last 20 primes are:

999,999,999,999,103           999,999,999,999,109
999,999,999,999,163           999,999,999,999,199
999,999,999,999,283           999,999,999,999,353
999,999,999,999,389           999,999,999,999,409
999,999,999,999,491           999,999,999,999,521
999,999,999,999,571           999,999,999,999,577
999,999,999,999,643           999,999,999,999,659
999,999,999,999,809           999,999,999,999,827
999,999,999,999,877           999,999,999,999,883
999,999,999,999,947           999,999,999,999,989

Completed the search in 0 hours 0 mins 17.0 seconds.

The code to do this is surprisingly compact:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created 12 March, 2018
 
Program description: uses a generator to perform an offset sieve of 
Eratosthenes on a small range of integers below a user-specified upper limit. 
 
Author: matta_idlecoder at protonmail dot com
"""
import timeit
import datetime
import numpy as np
 
timing_it = True
listing_primes = True
 
def init_prime_array(maximum):
    """Returns a small Boolean numpy array, initialised for eliminating primes
 
    """
    max_array_length = 1000  
    maximum += 1 if (maximum % 2 == 1) else 0  
    num_arrays = int(maximum / max_array_length)
 
    LastArray = np.ones(max_array_length, dtype=bool)
    LastArray[0::2] = False
 
    print("The last {:,d} numbers were checked from {:,} to {:,}.".
          format(max_array_length, max_array_length*(num_arrays - 1),
                 max_array_length*num_arrays - 1))
 
    return LastArray, num_arrays
 
 
def iprimes_upto(limit):
    """Offset sieve of Eratosthenes, using a Generator.
 
    Checks all the odds, not only prime multiples, but only in a small window
    below the maximum, usually set to 1000 wide.
    """
    is_prime, number_arrays = init_prime_array(limit)
    array_length = len(is_prime)
    root_limit = int(limit ** 0.5 + 1.5)
    n = 3
 
    prime_window_index = number_arrays - 1
    window_start = prime_window_index * array_length
    while n < root_limit:
        # This mapping finds the first index to eliminate in an offset array.
        # It doesn't matter if the first 10**15 arrays beneath it have not been
        # masked - or even created:
        n2 = n * 2
        rem_2n = window_start % n2
        rem_2n += n if (rem_2n <= n) else -n
        start_index = n2 - rem_2n
        is_prime[start_index:: n2] = False
        n += 2
 
    # Generator:
    for m in range(array_length):
        if is_prime[m]:
            n = window_start + m
            yield (n)
 
    return
 
#==============================  MAIN   ===================================
 
upper_limit = 1001
while upper_limit < 10**4 or upper_limit % 1000 != 0:
    upper_limit = int(
        input(
            "\nEnter a number >= 10,000 that is divisible by 1,000 that you want to find the nearest primes to: "))
    if upper_limit % 1000 != 0:
        print("Enter a number divisible by 1,000.")
    if upper_limit < 10**4:
        print("Try using a bigger number.")
 
print("\nSearching for the nearest primes to {:,}...\n".format(upper_limit))
 
if timing_it:
    print('Starting the search at',
              datetime.datetime.now().strftime("%H:%M:%S"), "\n")
    start_time = timeit.default_timer()
 
prime_set = list(iprimes_upto(upper_limit))
 
if not prime_set:
    print ("There were no primes found in the last 1,000 numbers below {:,}".format(upper_limit))
 
elif listing_primes:
    print ("\nThe last 20 primes are:\n")
    for prime1, prime2 in zip(prime_set[-20::][::2], prime_set[-20::][1::2]):
        print ("{:<35,}{:<,}".format(prime1, prime2))
else:
    prime_max = max(prime_set)
    print("The biggest prime found was {:,d}".format(prime_max))
 
if timing_it:
    prime_time = timeit.default_timer() - start_time
    days, hours = divmod(prime_time, 86400)
    hours, mins = divmod(hours, 3600)
    mins, secs = divmod(mins, 60)
    print(
        '\nCompleted the search in {:,} days {:,} hours {:,} mins {:0,.1f} seconds.'.format(
            int(days), int(hours), int(mins), secs))

 

This exercise had therefore been a success.

The final test was to prove to myself that the 64-bit data bus barrier on my computer was no longer a barrier to calculating big primes. Here’s the proof:

Searching for the nearest primes to 10,000,000,000,000,000,000...

The last 1,000 numbers were checked from 
9,999,999,999,999,999,000 to 9,999,999,999,999,999,999.

The last 20 primes are:

9,999,999,999,999,999,329     9,999,999,999,999,999,343
9,999,999,999,999,999,349     9,999,999,999,999,999,403
9,999,999,999,999,999,419     9,999,999,999,999,999,433
9,999,999,999,999,999,491     9,999,999,999,999,999,497
9,999,999,999,999,999,557     9,999,999,999,999,999,569
9,999,999,999,999,999,673     9,999,999,999,999,999,679
9,999,999,999,999,999,701     9,999,999,999,999,999,719
9,999,999,999,999,999,751     9,999,999,999,999,999,817
9,999,999,999,999,999,877     9,999,999,999,999,999,919
9,999,999,999,999,999,943     9,999,999,999,999,999,961

Completed the search in 0 hours 32 mins 1.6 seconds.

This meant that memory was now no longer the issue. In fact, there was no CPU swap activity at all, as the object being masked was only an array of 1,000 large numbers. So, theoretically, given enough time, the above code could calculate any prime on any hardware.

Strictly speaking, although this is still a sieve, the code has evolved to a point where it is no longer a Sieve of Eratosthenes. With a normal Sieve of Eratosthenes, you start from 2 and eliminate all its multiples, moving to the next number that has not yet been eliminated and repeating the process again and again. So, you find yourself eliminating in turn all the multiples of 3, 5, 7, 11, 13, 17, 19, etc. But since we are not masking array0, we don’t have the luxury of knowing which odd numbers in the lower array were prime. The algorithm therefore eliminates all multiples of ALL odd numbers up to the square root of the limit, but only in a small window of numbers below the limit we seek. It’s not the most elegant algorithm, but it’s certainly fast.

Conclusion

Finally, running my patented Offset Sieve® on my current Mac desktop, the code was benchmarked to see how it performed as the limit increased. Each time the limit was increased by a factor of ten it was found to take approximately 3.2 times longer to perform the calculation.  On any system, then, the increase in execution time will not exceed a time proportional to 3.2^log₁₀n.  In terms of big O algorithm analysis, we can simplify this, knowing that:


So we can say the offset odd prime mask algorithm is roughly O(√n).

Based on this measurement, here are the projected times for finding the largest primes up to the given limits (again, running on my MacMini):

Times to Calculate the Top Primes to Each Limit
Limit Time to Calc Status
1.0e13 1.4s  timed
1.0e14 4.3s  timed (=3.07 x 1.0e13)
1.0e15 13.5s  timed (=3.14 x 1.0e14)
1.0e16 42.8s  timed (=3.17 x 1.0e15)
1.0e17 2m 16s  timed (=3.18 x 1.0e16)
1.0e18 7m 29s  timed (=3.30 x 1.0e17)
1.0e19* 24m 38s  timed (=3.29 x 1.0e18)
1.0e20 1h 21m  projected
1.0e21 4h 28m  projected
1.0e22 14h 45m  projected
1.0e23 2d 41m  projected
1.0e24 6d 16h  projected
1.0e25 22d 2h  projected
1.0e26 10.5 weeks  projected

*Exceeds sys.maxsize (= 9.223e18)

Of course, a high-performance, number-crunching server in a cloud farm could do this in a fraction of the time, or calculate much more massive primes in the same time. After all, all it’s doing is masking a small binary array, over and over again.

As with previous mathematical postings, I’m sure this has been done before. If not, my agent can be contacted regarding my Fields Medal acceptance speech via the usual channels on the contact page.

OK, gotta go.  Off to see if SETI have found any six-fingered aliens.

 

 

Notes

(1) You heard it here first. Via the eavesdropping app I have on your phone, I shall be charging royalties for every time you use the phrase “Offset Prime Sieve”.

Leave a Reply

Your email address will not be published. Required fields are marked *