Creating Custom Slugs in Django

Linotype Slug

Often it’s convenient to begin a Django project with a simple URL scheme that uses the primary model’s database key to refer to each item, which makes it easy to retrieve an item and display it in a web page. Later, a decision is usually made to change the URL scheme to a more secure form, one that hides the database primary keys from users. To do this, you have a choice: generating URL ‘slugs’ from the names of the items you are showing, or replacing the keys with opaque identifiers such as UUIDs or hashIDs.

Slugs

The term slug comes from the late 19th century, and was the name given to a whole line of text, cast in hot metal by Linotype machines (‘type-casting’), which were then assembled into a page of slugs for high volume printing.(1) 

Django, which was first developed by a newspaper, has inherited the term to describe a way of simplifying names and phrases to use in web page URLs. There are now several slug implementations around: some systems drop the articles ‘a’ and ‘the’, while others like Django leave them in. The effect is typically to remove punctuation, reduce accented letters to their plain ASCII equivalents, convert capitals to lowercase, and replace the spaces with hyphens so that, for example, a book title such as Finnegan’s Wake becomes simply /finnegans-wake/.
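The transformation can be sketched with the standard library alone. The function below, simple_slugify, is a hypothetical stand-in that approximates what Django's slugify() does with allow_unicode=False. It is not Django's own code, and edge cases may differ.

```python
import re
import unicodedata


def simple_slugify(value):
    """Approximate an ASCII-only slug (a stand-in, not Django's slugify)."""
    # Split accented letters into base letter + accent, then drop non-ASCII.
    value = unicodedata.normalize("NFKD", str(value))
    value = value.encode("ascii", "ignore").decode("ascii")
    # Lowercase, then remove anything that is not a word character,
    # whitespace or a hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(simple_slugify("Finnegan’s Wake"))  # finnegans-wake
print(simple_slugify("Amélie"))           # amelie
```

In a real project you would call django.utils.text.slugify rather than maintain a copy like this; the sketch is only meant to make the individual steps visible.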

The Problem

But what if your database item names alone are not unique? Newspaper articles, reprints of books, remakes of movies and real people often share the same name, which makes slugs of their names unsuitable as unique identifiers.

One solution is to combine item names with their database primary keys. This makes the URLs unique, but leaves the problem that it still indirectly gives away the size of your database. A better option is to use a combination of important information about each item to create unique URLs, which can then be used to retrieve the item from the database. These may be fields that are unique together, or fields that together give the user a short summary of the saved data, such as the type of story, the genre or its creation date. A book or newspaper website might combine the author, genre and date, whereas for movies the title combined with the year of release is usually enough to create a unique identifier.  

This post describes how to convert a pre-existing Django project that uses simple numeric primary keys for the URLs, to a custom slug URL scheme, one that combines multiple model fields to generate unique slug-based URLs. It will cover not only the changes needed in the model, but also your templates, views, URL scheme, forms and unit tests. 

To show all the steps necessary to convert to a slug-based URL system, I will use the example of a movie database project I worked on recently, to itemise the changes necessary for an established Django project. The slug system I will implement will combine the title and year of each movie to create more secure and readable URLs. Although I have used only two fields, the steps are extensible to any number of fields, and applicable to any project where you want to create a custom, multi-field, slug-based URL system. 

Note that once the changes are made, the primary database model will still have the same system of primary keys. The goal is to create a parallel, non-numeric system for uniquely identifying, retrieving and displaying database items using only text slugs.

Using Primary Keys in URLs

An outline of my main model, before I introduced slugs, was as follows:

from django.db import models
from django.urls import reverse

class Movie(models.Model):
    Title = models.CharField(max_length=MAX_TITLE)
    Year = models.PositiveSmallIntegerField()
    Director = models.CharField(max_length=MAX_DIRECTOR)
    Actors = models.TextField(max_length=MAX_ACTORS)
    #etc
    ...

    class Meta:
        ordering = ['Title']

    def __str__(self):
        return "{} ({})".format(self.Title, self.Year)

    def get_absolute_url(self):
        return reverse('main_app:movie_detail', args=[str(self.id)])

    def save(self, *args, new_title=True, **kwargs):
        if new_title:
            if Movie.objects.filter( Title=self.Title, 
                                     Year=self.Year).exists():
                return False
        super(Movie, self).save(*args, **kwargs)
        return True

Note that the override for the save() method ensured that the title and year combination was not already in the database.

The main view for my movie data was the function-based movie detail page in views.py, whose job was to retrieve and display a valid movie record using the primary key:

from django.http import Http404
from django.shortcuts import render

from .models import Movie

def MovieDetailPage(request, pk):
    try:
        movie = Movie.objects.get(pk=pk)
    except Movie.DoesNotExist:
        raise Http404("Sorry, that movie is not in the database.")

    template_name = 'main_app/movie_detail.html'
    context = {
        'movie': movie,
        'directors': movie.Director.split(', '),
        'actors': movie.Actors.split(', '),
        'genres': movie.Genre.split(', '),
        # Assumes Writer holds a comma-separated list, like Genre.
        'writers': movie.Writer.split(', '),
    }
    return render(request, template_name, context)

This was called via a primary key URL in urls.py:

from django.urls import path

from .views import MovieDetailPage

app_name = 'main_app'
urlpatterns = [
    ...
    path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),
    ...
]

The core of the changes I will now describe will be to introduce a unique slug field into the Movie model, to provide an alternative, textual method for retrieving and displaying movie items from the database.

The Changes to Make

1. First, write your slug-maker and test it on the fields you want to use, checking that it’s doing what you want it to do. The best place to do this is in the Django shell:

>>> from main_app.models import Movie
>>> from django.utils.text import slugify
>>> movie = Movie()
>>> movie.Title = "The Good, the Bad and the Ugly"
>>> movie.Year = 1966
>>> slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
'the-good-the-bad-and-the-ugly-1966'

If I’d wanted to, I could have added the director or any other field to the slug. As it stands, though, it is sufficient as a unique identifier for a film.
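If more fields are needed, the join step generalises. Here is a minimal sketch, assuming a hypothetical helper called build_slug and a simplified stand-in for slugify() so that the snippet runs without Django. In the project itself you would pass django.utils.text.slugify instead.

```python
import re


def ascii_slug(text):
    """Simplified stand-in for Django's slugify(); assumes ASCII input."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[-\s]+", "-", text).strip("-")


def build_slug(*fields, slugify=ascii_slug):
    """Join any number of field values with hyphens and slugify the result."""
    # Skip empty values so that None is not rendered as the text 'None'.
    parts = [str(field) for field in fields if field not in (None, "")]
    return slugify("-".join(parts))


print(build_slug("The Good, the Bad and the Ugly", 1966))
# the-good-the-bad-and-the-ugly-1966
print(build_slug("The Good, the Bad and the Ugly", 1966, "Sergio Leone"))
# the-good-the-bad-and-the-ugly-1966-sergio-leone
```

Keeping the field list in one helper means the shell experiment, the model's save() method and any later back-fill utility can all share the same slug recipe.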

 

2. Next, create the new slug field in your Django model, remembering to modify your get_absolute_url() and save() override methods:

from django.utils.text import slugify

class Movie(models.Model):
    Title = models.CharField(max_length=MAX_TITLE)
    Year = models.PositiveSmallIntegerField()
    Director = models.CharField(max_length=MAX_DIRECTOR)
    Actors = models.TextField(max_length=MAX_ACTORS)
    ...
    slug = models.SlugField(
        default='',
        editable=False,
        # Leave room for the hyphen and up to five year digits.
        max_length=MAX_TITLE + 6,
    )

    def get_absolute_url(self):
        return reverse('main_app:movie_detail',
                       args=[self.slug])

    def save(self, *args, new_title=True, **kwargs):
        if new_title:
            if Movie.objects.filter(Title=self.Title,
                                    Year=self.Year).exists():
                return False
            # The slug is only generated when the item is first created.
            self.slug = slugify('-'.join([self.Title, str(self.Year)]),
                                allow_unicode=False)
        super(Movie, self).save(*args, **kwargs)
        return True

Note that a slug is only generated when a movie item is first created. Pre-existing movie items in the database will therefore have their slug attributes set to the empty string “”, and you will need to write a utility to set them. More on this later.

Some of you may be wondering why not just create a property method, which would generate the correct slug from the title and year each time it was called? This would avoid having to write a utility to create all the slugs for movies already in the database. This would only need the @property decorator to wrap the slug method, so as not to break any code that might call on Movie.slug:

class Movie(models.Model):
    ...
    @property
    def slug(self):
        ...

That’s the slug creation part taken care of, but what about retrieving objects? Unfortunately, although calling movie.slug would now work, any line of Django ORM code that currently retrieves a model object by saying:

movie = Movie.objects.get(pk=pk)

would now have to say:

movie = Movie.objects.get(slug=slug)

and this will not work, because the Django ORM cannot issue SQL commands to the database using a field that doesn’t exist. The slug would exist only in Python, not in the SQL database, which wouldn’t know anything about it.

The simplest solution is to create the slug as another model field, and to write a short utility to create the slugs for existing movies (see later).

 

3. Remember to run your migration commands once you have made these changes:

$ ./manage.py makemigrations
$ ./manage.py migrate

 

4. Next, change all your URLs that use /<int:pk>/ to say /<str:slug>/. For example, in urls.py:

path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),

now becomes:

path('movie/<str:slug>/', MovieDetailPage, name='movie_detail'),
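Django also ships a dedicated slug path converter, so <slug:slug> is an alternative to <str:slug>. The sketch below does not use Django. It mirrors, as I understand them, the patterns behind the two built-in converters (any run of non-slash characters for str; ASCII letters, digits, hyphens and underscores for slug) to show which URL segments each would accept. The helper name accepts is hypothetical.

```python
import re

# Patterns mirroring Django's built-in 'str' and 'slug' path converters
# (reproduced here as an assumption, not imported from Django).
STR_PATTERN = re.compile(r"[^/]+")
SLUG_PATTERN = re.compile(r"[-a-zA-Z0-9_]+")


def accepts(pattern, segment):
    """Return True if the whole URL segment matches the converter pattern."""
    return pattern.fullmatch(segment) is not None


for segment in ["2001-a-space-odyssey-1968", "2001: a space odyssey"]:
    print(segment, accepts(STR_PATTERN, segment), accepts(SLUG_PATTERN, segment))
```

The first segment is accepted by both patterns, while the second is accepted only by the str pattern. With the stricter converter, a request for a malformed slug would fail at URL resolution and never reach the view.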

 

5. You will then need to change all your corresponding view functions that require a primary key to expect a slug, so that, for example:

def MovieDetailPage(request, pk):

would become:

def MovieDetailPage(request, slug):

Inside the function, the lookup changes from Movie.objects.get(pk=pk) to Movie.objects.get(slug=slug). All calls to this function now also need to pass the slug, rather than the primary key:

return MovieDetailPage(request, movie_obj.pk)

becomes:

return MovieDetailPage(request, movie_obj.slug)

 

6. Next, you will have to change all your templates that link to your detail page via your primary key to use the new slug-based URL. For example, if you are using URL shortcuts in the Django template language, these now need to pass the slug, not the primary key:

<a href="{% url 'main_app:movie_detail' film.pk %}">

becomes:

<a href="{% url 'main_app:movie_detail' film.slug %}"> 

 

7. If you are using forms to check newly entered model data, and using the ModelForm in particular to leverage your model fields, you might want to check that you’re not telling your form to display __all__ fields. Best practice is to name explicitly only the model fields you want to see in your form, so that you’re not exposing all the fields to the user (this can be an embarrassing mistake). But if you’re not using __all__, there should be no changes to make in your form:

from django.forms import ModelForm

class MovieForm(ModelForm):
    class Meta:
        model = Movie
        # fields = '__all__'  # DO NOT USE THIS LINE
        fields = ['Title', 'Year', 'Director', 'Actors',
                  'Genre', 'Plot', 'Writer']

    def clean_Title(self):
        #etc
        ...

 

8.  As explained earlier, if you have pre-existing items in your database, their slug fields will initially only be empty strings. How to set them? The easiest way is to create your new slugs by running a simple utility on all the objects already in your database. As it turns out, regardless of how many fields you use to generate your slug, it only takes a few lines of Python in the Django shell to do this:

>>> from main_app.models import Movie
>>> from django.utils.text import slugify
>>> for movie in Movie.objects.all():
...     # INSERT YOUR OWN SLUG GENERATOR HERE:
...     movie.slug = slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
...     movie.save(new_title=False)
...
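Before running the loop on live data, it may be worth checking that the generated slugs are themselves unique. Because slugifying discards punctuation and case, two records with distinct title and year pairs can still map to the same slug. Here is a minimal sketch on made-up sample rows, using a simplified stand-in for slugify() so that it runs without Django. The helper find_slug_collisions is hypothetical.

```python
import re
from collections import defaultdict


def ascii_slug(text):
    """Simplified stand-in for Django's slugify(); assumes ASCII input."""
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[-\s]+", "-", text).strip("-")


def find_slug_collisions(rows):
    """Map each slug shared by more than one (title, year) row to those rows."""
    by_slug = defaultdict(list)
    for title, year in rows:
        by_slug[ascii_slug(f"{title}-{year}")].append((title, year))
    return {slug: items for slug, items in by_slug.items() if len(items) > 1}


# Made-up sample rows, purely for illustration.
sample = [("M*A*S*H", 1970), ("MASH", 1970), ("Solaris", 1972), ("Solaris", 2002)]
print(find_slug_collisions(sample))
# {'mash-1970': [('M*A*S*H', 1970), ('MASH', 1970)]}
```

In the Django shell, the rows could come from Movie.objects.values_list('Title', 'Year'), with the real slugify() swapped in for the stand-in. An empty result suggests the back-fill is safe to run.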

 

9. You should now be ready to go, but as a responsible engineer, you have learned always to test samples of real data too. The good news is that, now that you have created all the slugs in your live database, you can run your complex integration tests on samples of your real data. But since your main database model has changed, your existing test fixtures no longer match it, and the tests that load them will no longer work. To create a new sample of your real database, a test fixture, you will have to re-run the commands you used to create the old ones. For my testing, I created a new sample of eight non-sequential instances from my main model, plus all the data from my other models:

$ ./manage.py dumpdata main_app.Movie --pks 1,2,3,4,6,7,9,10  > main_app/fixtures/movie_db.json

$ ./manage.py dumpdata --exclude main_app.Movie  > main_app/fixtures/other_tables.json

All your complex integration tests should now all work just fine on your test fixtures with the new slug-based URLs:

from django.test import TestCase
from django.urls import reverse

class ComplexIntegrationTests(TestCase):
    fixtures = ['movie_db.json', 'other_tables.json']

    def test_deleting_a_movie(self):
        response = self.client.get(
                    reverse('main_app:delete_movie', 
                        args=['2001-a-space-odyssey-1968']))
        self.assertEqual(response.status_code, 200)
        self.assertTemplateUsed( ...
        self.assertContains(...

 

Conclusion

And that’s pretty much it. Your database primary keys will now be hidden from prying eyes, as will the size of your database, and your URLs will be far more readable and user-friendly.

 

Software Used

Django 3.0.1
Python 3.7.1
Pipenv 2018.11.26
PostgreSQL 12.1
Django Debug Toolbar 2.2

 

References

(1) Linotype machines were a pretty amazing invention for 1884, and were used right up until the 1970s, an incredible lifespan for any technology. In their day they were state-of-the-art, and the bridge between nineteenth century manual typesetting and the phototypesetters of the late twentieth century. 

Linotype Machine

They were essentially a combination of typewriter, mould-maker and molten metal caster, able to convert line after line of a typesetter’s keystrokes instantly into linear metal moulds, which were immediately used to cast printing slugs from hot metal (see top photo), each ready to print a line of text.

The invention of Linotype machines did not automate printing, just the creation of printing plates. They removed the need for manual typesetting, a job held by a young Mark Twain (Sam Clemens) in Hannibal, Missouri, only a few hours’ drive from where Django began in Lawrence, Kansas, continuing the tradition of American Midwest publishing.

If you’re into the history of communication and publishing technology, of which Django is a part, you should find them fascinating for another reason: it wasn’t long before someone had the great idea to hook one up to a telegraph line and control it remotely, which led to the TeleTypeSetter.

Using iPython for Sys Admin

Python on mainframe terminal

Manipulating files and programs using Unix shell file programs can often be a bit of a pain, even for seasoned programmers. This can be due to how infrequently you use them, or because you are often moving between OS/X, Windows and Linux, and their subtle differences can often trip you up. 

I used to be fairly proficient at them, but nowadays find I use them so rarely that I often have to revise what I used to know, even to achieve the most basic tasks. For many coders, the Unix shell programming language has become like an obscure language you only brush up on when you need to speak to a distant relative at Christmas time.

Fortunately, if you know and love Python, most of what you need to do with the Unix shell for filename searching, for-loops and file permissions can easily be done with iPython, without having to spend hours revising what you first learnt to do in Unix shell years ago. 


Faster Image Transforms With Cython

Python to Cython

This is the second post on how to accelerate Python with Cython. The previous post used Cython to declare static C data types in Python, to speed up the runtime of a prime number generator. In this post we shall be modifying a more complex program, one that performs an image transform on a map. This will allow me to demonstrate some more advanced Cython techniques, such as importing C functions from the C math library, using memory views of Numpy arrays and turning off the Python global interpreter lock (GIL).

As with the previous post, we shall be making a series of Cython modifications to Python code, noting the speed improvement with each step. As we go, we’ll be using the Cython compiler’s annotation feature to see which lines are converted into C at each stage, and which lines are still using Python objects and functions. And as we tune the code to run at increasingly higher speeds, we shall be profiling it to see what’s still holding us up, and where to refocus our attention.

Although I will be using Python 3 on a Mac, the instructions I give will mostly be platform agnostic:  I will assume you have installed Cython on your system (on Windows, Linux or OS/X) and have followed and understood the installation and testing steps in my previous post. This will be essential if you are to follow the steps I outline below. As stated in the previous post, Cython is not for Python beginners.


From Python To Cython

This longer post will show you some of the coding skills you’ll need for turning your existing Python code into the Python-C hybrid we call Cython. In doing so, we’ll be digging into some C static data types, to see how much faster Python code will run, and restructuring some Python code along the way for maximum speed.

With Cython, all the benefits of Python are still yours – easily readable code, fast development cycles, powerful high level commands, maintainability, a suite of web development frameworks, a huge standard library for data science, machine learning, imaging, databases and security, plus easy manipulation of files, documents and strings. You should still use Python for all these things – these are what Python does best. But you should also consider combining them with Cython to speed up the computationally intensive Python functions that need to be fast.

Parallel Python – 1: Prime Numbers

With the impending demise of Moore’s Law, multiple cores are a common manufacturers’ workaround for improving hardware performance, whether or not your installed apps can use the parallel architecture.

And with each new release of Python, parallel programming gets even easier. But the degree to which your code can use your multiple cores will depend on the kind of problem you are trying to solve, on the implementation of Python you are running and, as it turns out, how truly parallel the underlying architecture of your system actually is.

The goal of this series of posts is to see how adaptable some of my existing code is to take advantage of multi-core hardware, to see what changes need to be made to scale it, and to measure the performance improvements from the exercise.

Transverse Mercator With Python

Mercator map

With global warming melting the icecaps and opening up the poles for oil exploration and tourism, I think it’s time for a new standard wall map, one that shifts those distorted map regions away from major land masses, and places the polar regions where we can see them. That way, our cruise ship and oil tanker captains can navigate more easily through the clear, blue Arctic Ocean, unimpeded by any tiresome ice-pack.

I particularly love that oil companies want to use the new Arctic Ocean sea lanes to transport their oil to market faster. Is it irony, or some form of rare, extinction-level stupidity that only comes around once every few thousand years? Hard to tell. But I digress.


The Toolkit – Updated

Spanners

Alas, dear Windows, it was not to be. I’m afraid I’ve been seeing other platforms. Specifically, I’ve been spending time with OS/X behind your back. It was just too painful to be with you. All those arguments, the shouting, the hair-pulling, the throwing things across the room.

Sure, you’re a lot less volatile than you used to be. And you don’t do the tearful breakdown thing any more. Yes, I know I can do almost anything with you that I can with OS/X, but everything just takes longer. OK, you want me to be honest? Fine. I find you excruciatingly frustrating to be with.  Why is it always ME navigating around YOUR moods? I mean, why is it that after 25 years, everything with you is STILL a workaround?


Eratosthenes 2: Swifter, Further, Cooler

Human & alien hands

 

This post describes the process I used to design an algorithm that allows you to implement a modified Sieve of Eratosthenes to bypass the memory limitations of your computer and, in the process, to find big primes well beyond your 64-bit computer’s supposed numerical limit of 2^63 (about 9.223e18). Beyond that, with this algorithm, the only limitation is the speed of your CPU.


A Faster Sieve of Eratosthenes

SievePic

The Sieve of Eratosthenes is a beautifully elegant way of finding all the prime numbers up to any limit. The goal of this post is to implement the algorithm as efficiently as possible in standard Python 3.5, without resorting to importing any modules, or to running it on faster hardware.

Eratosthenes was a Greek scholar who lived in Alexandria (276BC to 194BC) in the so-called Hellenistic period. He was working about a century after Alexander, and about a century before the Romans arrived to impose their cultural desert and call it peace. And then do nothing with the body of knowledge they discovered. Literally. For over 1,600 years, if you count Constantinople. Not a damn thing.

So much for overly religious, centralised, bureaucratic superstates, obsessed with conquest. But I digress.


Using AppleScript To Launch Python

OK, one for the Mac users. Continuing the  theme of user interfaces, here’s a simple but powerful way of using AppleScript to create a user interface for your Python programs and shell scripts and sending the results to just about any application installed on your Mac.

This solution has the advantage over Python’s native Tkinter in that the development time is much faster, and uses the speech synthesis features of OS/X to make your code much easier to use for the non-technical, elderly or visually impaired.


Finding Primes Using Complex Numbers

Pythagoras In Lego

With complex numbers, I always feel as if I’m getting a glimpse of something truly awesome that lies hidden within mathematics. The first time I understood how they worked, I thought it was some form of magic.

I get the same feeling with prime numbers. Like many, I’ve looked at them from all angles – prime gaps, large primes, prime densities, prime sieves – and they continue to fascinate. A few months ago I was thumbing through Henry Warren’s programmer’s cookbook Hacker’s Delight (A) and discovered a whole chapter on the various formulas for (some of) them. Mind-bending stuff.


GUI Template For Python: Part 2

This is the second of two posts on how to quickly create a Tkinter dashboard for your command line Python programs. The Tkinter widgets and programming techniques it introduces are a sequel to the previous post.

So far, you have an interactive graphical way of opening a file to analyse it in some way with your own logic, entering text to use as triggers or search strings, setting your own program flags on/off using check boxes, switching between two or more mutually exclusive program flags using radio boxes, controlling access to widgets and the variables they control, calling your own logic, and saving your results in a new file.

This post will build on these skills by showing how to create a dashboard to accept numerical input, perform different kinds of type- and value-checking, and select multiple input files simultaneously using a Tkinter GUI file selector. The solution will be multi-platform, and is shown running above on (from left) Windows 10, Mac OS/X and Linux Mint. This post will explain how to create the same thing for your own program. Before proceeding, make sure you’ve read and understood the previous post.


GUI Template For Python: Part 1

The next problem I needed to solve was to come up with a simple graphical user interface (GUI) template as a front-end for configuring and launching any Python code or module I may wish to write or run. Initial impetus: I didn’t want to have to write a user interface from scratch every time I wrote some Python code to manipulate text, data or files. Bonus reason: if I made the GUI template generic enough, others might be able to use it to create their own user interfaces.

This would solve a problem that occurs in many technical fields. A university professor may have a post-doc researcher on her team, one who has written a complex command line program performing, e.g. image processing, AI or genetic analysis. At some stage, there may be some highly repetitive tests that can be performed by someone less technical, freeing up the researcher. She wouldn’t want him running these repetitive command-line tests with code only he knows how to run or, worse, sitting around designing complex user interfaces for others to use it. It would be better to get an intern or research assistant (or even a temp) to run the tests using a GUI that the researcher can knock up in a day or two. This would free him up to concentrate on his research. And finish it faster.

Analysis Tool For Literary Texts

The first problem I wanted to solve was to write a short program that would allow me to perform basic textual analysis of any work of literature.

I wanted to be able to study the richness of different authors’ language by looking at how they used neologisms (their own made up words), pseudo-archaisms, invented their own contractions for authentic speech, or used hyphenated compound words, etc. I also wanted to be able to list all the characters and place names (proper nouns) mentioned in a text.


The Toolkit

Spanners

OK, first things first. What tools will I be using?

After talking to a good friend who is an experienced coder, I decided on the following:

The IDE

Spyder, running Python 3. It seems to have everything I need, including a good debugger, a variable explorer, hot-linking to function definitions, auto-completion typing, Matplotlib, QT, plus a choice of either a Python or an iPython console (each with their different strengths). The bundle I went with is Spyder for WinPython-64bit (WinPython-64bit-3.4.4.3Qt5). The QT will be useful later.
