Faster Image Transforms With Cython

Graphical depiction of going from Python to Cython

This is the second post on how to accelerate Python with Cython. The previous post used Cython to declare static C data types in Python, to speed up the runtime of a prime number generator. In this post we shall be modifying a more complex program, one that performs an image transform on a map. This will allow me to demonstrate some more advanced Cython techniques, such as importing C functions from the C math library, using memory views of Numpy arrays and turning off the Python global interpreter lock (GIL).

As with the previous post, we shall be making a series of Cython modifications to Python code, noting the speed improvement with each step. As we go, we’ll be using the Cython compiler’s annotation feature to see which lines are converted into C at each stage, and which lines are still using Python objects and functions. And as we tune the code to run at increasingly higher speeds, we shall be profiling it to see what’s still holding us up, and where to refocus our attention.

Although I will be using Python 3 on a Mac, the instructions I give will mostly be platform agnostic:  I will assume you have installed Cython on your system (on Windows, Linux or OS/X) and have followed and understood the installation and testing steps in my previous post. This will be essential if you are to follow the steps I outline below. As stated in the previous post, Cython is not for Python beginners.

Continue reading “Faster Image Transforms With Cython”

Parallel Python – 1: Prime Numbers

Graphical depiction of Python running in parallel on 8 CPUs

With the impending demise of Moore’s Law, multiple cores are a common manufacturers’ workaround for improving hardware performance, whether or not your installed apps can use the parallel architecture.

And with each new release of Python, parallel programming gets even easier. But the degree to which your code can use your multiple cores will depend on the kind of problem you are trying to solve, on the implementation of Python you are running and, as it turns out, how truly parallel the underlying architecture of your system actually is.

The goal of this series of posts is to see how adaptable some of my existing code is to take advantage of multi-core hardware, to see what changes need be made to scale it, and to measure the performance improvements from the exercise. Continue reading “Parallel Python – 1: Prime Numbers”