Creating Custom Slugs in Django

Linotype Slug

Often it’s convenient to begin a Django project with a simple URL scheme that uses the primary model’s database key to reference each item, making it easy to retrieve each item and display it in a web page. Later, the decision is often made to switch to a more secure URL scheme, one that hides the database primary keys from users. To do this, you have a choice: generate URL ‘slugs’ from the names of the items you are showing, or replace the keys with opaque identifiers such as UUIDs or Hashids.

Slugs

The term slug dates from the late 19th century, when it was the name given to a whole line of text cast in hot metal by a Linotype machine (‘type-casting’). The slugs were then assembled into pages for high-volume printing.(1)

Django, which was first developed at a newspaper, has inherited the term to describe a way of simplifying names and phrases for use in web page URLs. There are now several slug implementations around: some systems drop the articles ‘a’ and ‘the’, while others, like Django’s, leave them in. The effect is generally to remove punctuation, convert accented letters to their plain ASCII equivalents (dropping characters that have none), lowercase any capitals, and replace spaces with hyphens, so that, for example, a book title such as Finnegan’s Wake becomes simply /finnegans-wake/.
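To see these rules in action without a Django project to hand, here is a stdlib-only sketch that approximates what django.utils.text.slugify does with allow_unicode=False. The helper name approx_slugify is hypothetical, and the steps are modelled on the version of slugify shipped around Django 3.0 (later releases differ slightly); in a real project you would import slugify from django.utils.text rather than copy this.

```python
import re
import unicodedata


def approx_slugify(value):
    """Hypothetical approximation of django.utils.text.slugify(value, allow_unicode=False)."""
    # Decompose accented letters (e.g. 'é' -> 'e' + combining accent),
    # then drop everything that is not plain ASCII.
    value = unicodedata.normalize('NFKD', str(value))
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove punctuation (anything that is not a word character, whitespace
    # or hyphen), trim the ends and lowercase what is left.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse each run of whitespace and/or hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


print(approx_slugify('Finnegan’s Wake'))                  # finnegans-wake
print(approx_slugify('Amélie'))                           # amelie
print(approx_slugify('The Good, the Bad and the Ugly'))   # the-good-the-bad-and-the-ugly
```

Note that the articles ‘the’ and ‘and’ survive, just as they do in Django’s own slugs.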

The Problem

But what if your database item names alone are not unique? Newspaper articles, reprints of books, remakes of movies and real people often share the same name, which makes slugs of their names unsuitable as unique identifiers.

One solution is to combine item names with their database primary keys. This makes the URLs unique, but still indirectly gives away the size of your database. A better option is to combine several important pieces of information about each item to create unique URLs, which can then be used to retrieve the item from the database. These may be fields that are unique together, or fields that together give the user a short summary of the saved data, such as the type of story, its genre or its creation date. A book or newspaper website might combine the author, genre and date, whereas for movies the title combined with the year of release is usually enough to create a unique identifier.
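To make the idea concrete, here is a minimal, stdlib-only sketch of a multi-field slug builder. The helper make_slug is hypothetical and its cleanup step only approximates Django’s slugify; what it shows is that remakes sharing a title collide when the title alone is used, but not once the year is appended.

```python
import re
import unicodedata


def make_slug(*fields):
    """Hypothetical helper: join any number of fields with hyphens and slugify them.

    The cleanup approximates django.utils.text.slugify(allow_unicode=False).
    """
    value = '-'.join(str(field) for field in fields)
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


# Two films, each remade under the same title.
movies = [('Psycho', 1960), ('Psycho', 1998),
          ("Ocean's Eleven", 1960), ("Ocean's Eleven", 2001)]

title_only = {make_slug(title) for title, _ in movies}
title_year = {make_slug(title, year) for title, year in movies}

print(len(movies), len(title_only), len(title_year))  # 4 2 4
print(sorted(title_year))
```

Because make_slug accepts any number of fields, adding the director (or anything else) is just one more argument.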

This post describes how to convert a pre-existing Django project that uses simple numeric primary keys in its URLs to a custom slug URL scheme, one that combines multiple model fields to generate unique slug-based URLs. It covers not only the changes needed in the model, but also those in your templates, views, URL scheme, forms and unit tests.

To show all the steps necessary to convert to a slug-based URL system, I will use the example of a movie database project I worked on recently, itemising the changes an established Django project needs. The slug system I will implement combines the title and year of each movie to create more secure and readable URLs. Although I use only two fields, the steps extend to any number of fields and apply to any project where you want a custom, multi-field, slug-based URL system.

Note that once the changes are made, the primary database model will still have the same system of primary keys. The goal is to create a parallel, non-numeric system for uniquely identifying, retrieving and displaying database items using only text slugs.

Using Primary Keys in URLs

The outline of my main movie model, before I introduced slugs, was as follows:

from django.db import models
from django.urls import reverse

# MAX_TITLE, MAX_DIRECTOR and MAX_ACTORS are length constants defined elsewhere.

class Movie(models.Model):
    Title = models.CharField(max_length=MAX_TITLE)
    Year = models.PositiveSmallIntegerField()
    Director = models.CharField(max_length=MAX_DIRECTOR)
    Actors = models.TextField(max_length=MAX_ACTORS)
    # etc.
    ...

    class Meta:
        ordering = ['Title']

    def __str__(self):
        return "{} ({})".format(self.Title, self.Year)

    def get_absolute_url(self):
        # URLs are built from the numeric primary key, e.g. /movie/42/
        return reverse('main_app:movie_detail', args=[str(self.id)])

    def save(self, *args, new_title=True, **kwargs):
        # Refuse to save a new movie whose Title and Year are already in the database.
        if new_title:
            if Movie.objects.filter(Title=self.Title,
                                    Year=self.Year).exists():
                return False
        super(Movie, self).save(*args, **kwargs)
        return True

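Note that the override for the save() method ensured that the title and year combination was not already in the database before a new movie was saved; passing new_title=False skipped that check when updating an existing record.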

The main view for my movie data was the function-based movie detail page in views.py, whose job was to retrieve and display a valid movie record using the primary key:

from django.http import Http404
from django.shortcuts import render

from .models import Movie

def MovieDetailPage(request, pk):
    try:
        movie = Movie.objects.get(pk=pk)
    except Movie.DoesNotExist:
        raise Http404("Sorry, that movie is not in the database.")

    template_name = 'main_app/movie_detail.html'
    # Assumes Genre and Writer fields, as listed in MovieForm later in this post.
    context = {'movie':     movie,
               'directors': movie.Director.split(', '),
               'actors':    movie.Actors.split(', '),
               'genres':    movie.Genre.split(', '),
               'writers':   movie.Writer.split(', '),
               }
    return render(request, template_name, context)

This was called via a primary key URL in urls.py:

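from django.urls import path

from .views import MovieDetailPage

app_name = 'main_app'
urlpatterns = [
    # ... other URL patterns
    path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),
    # ... other URL patterns
]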

The core of the changes I will now describe will be to introduce a unique slug field into the Movie model, to provide an alternative, textual method for retrieving and displaying movie items from the database.

The Changes to Make

1. First, write your slug-maker and test it on the fields you want to use, checking that it does what you want. The best place to do this is the Django shell (./manage.py shell):

>>> from main_app.models import Movie
>>> from django.utils.text import slugify
>>> movie = Movie()
>>> movie.Title = "The Good, the Bad and the Ugly"
>>> movie.Year = 1966
>>> slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
'the-good-the-bad-and-the-ugly-1966'

If I’d wanted to, I could have added the director or any other field to the slug. As it stands, though, title and year together are enough to identify a film uniquely.

 

2. Next, create the new slug field in your Django model, remembering to modify your get_absolute_url() and save() override methods:

from django.utils.text import slugify

class Movie(models.Model):
    Title = models.CharField(max_length=MAX_TITLE)
    Year = models.PositiveSmallIntegerField()
    Director = models.CharField(max_length=MAX_DIRECTOR)
    Actors = models.TextField(max_length=MAX_ACTORS)
    ...
    slug = models.SlugField(
        default='',
        editable=False,
        max_length=MAX_TITLE + 5,  # room for the '-' plus a four-digit year
    )

    def get_absolute_url(self):
        # URLs are now built from the slug, e.g. /movie/the-good-the-bad-and-the-ugly-1966/
        return reverse('main_app:movie_detail',
                       args=[self.slug])

    def save(self, *args, new_title=True, **kwargs):
        if new_title:
            if Movie.objects.filter(Title=self.Title,
                                    Year=self.Year).exists():
                return False
            # Generate the slug once, when a new movie is first saved.
            self.slug = slugify('-'.join([self.Title, str(self.Year)]),
                                allow_unicode=False)
        super(Movie, self).save(*args, **kwargs)
        return True

Note that slugs are only created when movies are first saved. This means that movies already in the database will have their slug fields set to the empty string '', so you will need to write a utility to set them. More on this later.

Some of you may be wondering why I didn’t just create a property method that generates the correct slug from the title and year each time it is called. This would avoid having to write a utility to create slugs for all the movies already in the database, and would need only the @property decorator on a slug method, so as not to break any code that accesses movie.slug:

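class Movie(models.Model):
    ...
    @property
    def slug(self):
        # Computed in Python on every access; there is no slug column in the database.
        return slugify('-'.join([self.Title, str(self.Year)]),
                       allow_unicode=False)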

That takes care of creating slugs, but what about retrieving objects? Unfortunately, although movie.slug would now work, any Django ORM code that retrieves model objects with:

movie = Movie.objects.get(pk=pk)

would have to be changed to:

movie = Movie.objects.get(slug=slug)

and that will not work, because the Django ORM cannot issue SQL queries against a field that doesn’t exist; it raises a FieldError instead. The slug would exist only in Python, and the SQL database would know nothing about it.

The simplest solution is to create the slug as another model field, and to write a short utility to create the slugs for existing movies (see later).

 

3. Remember to run your migration commands once you have made these changes:

$ ./manage.py makemigrations
$ ./manage.py migrate

 

4. Next, change all your URLs that use /<int:pk>/ to say /<str:slug>/. For example, in urls.py:

path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),

now becomes:

path('movie/<str:slug>/', MovieDetailPage, name='movie_detail'),
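If you prefer, Django also ships a built-in slug path converter, so the pattern could instead be written as path('movie/<slug:slug>/', MovieDetailPage, name='movie_detail'). The difference lies in the regular expressions the two converters match. The sketch below copies them, as I understand them from django.urls.converters ('[^/]+' for str, '[-a-zA-Z0-9_]+' for slug), and tests a few candidate URL segments with plain re; it does not import Django.

```python
import re

# Patterns used by Django's built-in 'str' and 'slug' path converters.
STR_PATTERN = r'[^/]+'
SLUG_PATTERN = r'[-a-zA-Z0-9_]+'


def matches(pattern, segment):
    """Return True if the whole URL segment matches the converter's pattern."""
    return re.fullmatch(pattern, segment) is not None


for segment in ['the-good-the-bad-and-the-ugly-1966', 'Amélie 2001', 'a.b']:
    print(repr(segment),
          'str:', matches(STR_PATTERN, segment),
          'slug:', matches(SLUG_PATTERN, segment))
```

With <slug:slug>, a malformed segment such as /movie/a.b/ fails to match the pattern, so Django returns a 404 without calling the view; with <str:slug> it is passed through to the view’s own lookup, which fails there instead.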

 

5. You will then need to change all your corresponding view functions that require a primary key to expect a slug, so that, for example:

def MovieDetailPage(request, pk):

would become:

def MovieDetailPage(request, slug):

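The lookup inside the view changes in the same way, from Movie.objects.get(pk=pk) to Movie.objects.get(slug=slug). It also means that any calls to this function must now pass the slug rather than the primary key: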

return MovieDetailPage(request, movie_obj.pk)

becomes:

return MovieDetailPage(request, movie_obj.slug)

 

6. Next, you will have to change all the templates that link to your detail page via the primary key so that they use the new slug-based URL. For example, if you are using the {% url %} tag in the Django template language, it now needs to be passed the slug, not the primary key:

<a href="{% url 'main_app:movie_detail' film.pk %}">

becomes:

<a href="{% url 'main_app:movie_detail' film.slug %}"> 

 

7. If you are using forms to check newly entered model data, and in particular a ModelForm built from your model fields, check that you’re not telling your form to display '__all__' fields. Best practice is to name explicitly only the model fields you want in your form, so that you don’t expose every field to the user (this can be an embarrassing mistake). If you’re not using '__all__', there should be no changes to make to your form. (Because the slug field is declared with editable=False, a ModelForm will leave it out in either case.)

from django.forms import ModelForm

from .models import Movie

class MovieForm(ModelForm):
    class Meta:
        model = Movie
        # fields = '__all__'  # DO NOT USE THIS LINE
        fields = ['Title', 'Year', 'Director', 'Actors',
                  'Genre', 'Plot', 'Writer']

    def clean_Title(self):
        # etc.
        ...

 

8. As explained earlier, any items already in your database will start with empty strings in their slug fields. How do you set them? The easiest way is to run a simple utility over all the objects already in your database. Regardless of how many fields you use to generate your slug, it takes only a few lines of Python in the Django shell:

>>> from main_app.models import Movie
>>> from django.utils.text import slugify
>>> for movie in Movie.objects.all():
...     # INSERT YOUR OWN SLUG GENERATOR HERE:
...     movie.slug = slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
...     movie.save(new_title=False)  # skip the duplicate check for existing rows
...
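Before running that loop, it may be worth checking for collisions: save() only rejected exact Title and Year duplicates, but two titles that differ only in punctuation or capitalisation collapse to the same slug. Here is a minimal, stdlib-only sketch of a pre-flight check, assuming you first pull the (title, year) pairs out of the database, for example with list(Movie.objects.values_list('Title', 'Year')). The helper names and the sample rows are hypothetical, and the slug step only approximates Django’s slugify.

```python
import re
import unicodedata
from collections import defaultdict


def title_year_slug(title, year):
    """Hypothetical helper: build a title-year slug, approximating Django's slugify."""
    value = '-'.join([title, str(year)])
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    return re.sub(r'[-\s]+', '-', value)


def find_slug_collisions(pairs):
    """Map each slug shared by more than one (title, year) pair to those pairs."""
    groups = defaultdict(list)
    for title, year in pairs:
        groups[title_year_slug(title, year)].append((title, year))
    return {slug: rows for slug, rows in groups.items() if len(rows) > 1}


# Hypothetical sample: the last two rows pass the exact Title/Year check
# in save(), yet both produce the slug 'tora-tora-tora-1970'.
pairs = [
    ('2001: A Space Odyssey', 1968),
    ('Tora! Tora! Tora!', 1970),
    ('Tora, Tora, Tora', 1970),
]
print(find_slug_collisions(pairs))
```

An empty result means every existing movie will get a distinct slug; anything else needs resolving (for example, by correcting a mistyped duplicate entry) before you generate the slugs.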

 

9. You should now be ready to go, but as a responsible engineer you will have learned always to test on samples of real data too. There is a catch: any test fixtures you dumped before changing your main database model contain no slugs, so complex integration tests that load them and request slug-based URLs will no longer work. The good news is that, now that every movie in your live database has a slug, you can create a new sample of your real data (a test fixture) by re-running the commands you used to create the old ones. For my testing, I created a new sample of eight non-sequential instances from my main model, plus all the data from my other models:

$ ./manage.py dumpdata main_app.Movie --pks 1,2,3,4,6,7,9,10  > main_app/fixtures/movie_db.json

$ ./manage.py dumpdata --exclude main_app.Movie  > main_app/fixtures/other_tables.json

All your complex integration tests should now work on your test fixtures with the new slug-based URLs:

from django.test import TestCase
from django.urls import reverse

class ComplexIntegrationTests(TestCase):
    fixtures = ['movie_db.json', 'other_tables.json']

    def test_deleting_a_movie(self):
        # URLs in tests are now built from slugs rather than primary keys.
        response = self.client.get(
            reverse('main_app:delete_movie',
                    args=['2001-a-space-odyssey-1968']))
        self.assertEqual(response.status_code, 200)
        # self.assertTemplateUsed(...), self.assertContains(...), etc.

 

Conclusion

And that’s pretty much it. Your database primary keys, and with them the size of your database, will now be hidden from prying eyes, and your URLs will be far more readable and user-friendly.

 

Software Used

Django 3.0.1
Python 3.7.1
Pipenv 2018.11.26
PostgreSQL 12.1
Django Debug Toolbar 2.2

 

References

(1) Linotype machines were a pretty amazing invention for 1884, and were used right up until the 1970s, an incredible lifespan for any technology. In their day they were state-of-the-art, and the bridge between nineteenth century manual typesetting and the phototypesetters of the late twentieth century. 

Linotype Machine

They were essentially a combination of typewriter, mould-maker and molten-metal caster, able to convert a typesetter’s keystrokes, line after line, into linear metal moulds, which were immediately used to cast printing slugs from hot metal (see top photo), each ready to print a line of text.

The invention of Linotype machines did not automate printing itself, only typesetting. They removed the need for manual typesetting, a job once held by a young Mark Twain (Sam Clemens) in Hannibal, Missouri, only a few hours’ drive from Lawrence, Kansas, where Django began: one more link in a long tradition of American Midwest publishing.

If you’re into the history of communication and publishing technology, of which Django is a part, you should find them fascinating for another reason: it wasn’t long before someone had the great idea to hook one up to a telegraph line and control it remotely, which led to the TeleTypeSetter.

     

Using iPython for Sys Admin

Python appearing to run on a mainframe terminal

Manipulating files and programs with Unix shell commands can often be a bit of a pain, even for seasoned programmers. This can be because you use them so infrequently, or because you often move between OS/X, Windows and Linux, whose subtle differences can trip you up.

I used to be fairly proficient at them, but nowadays find I use them so rarely that I often have to revise what I used to know, even to achieve the most basic tasks. For many coders, the Unix shell programming language has become like an obscure language you only brush up on when you need to speak to a distant relative at Christmas time.

Fortunately, if you know and love Python, most of what you need the Unix shell for, such as filename searching, for-loops and file permissions, can easily be done with iPython, without spending hours revising what you first learnt in the Unix shell years ago.


Faster Image Transforms With Cython

Graphical depiction of going from Python to Cython

This is the second post on how to accelerate Python with Cython. The previous post used Cython to declare static C data types in Python, to speed up the runtime of a prime number generator. In this post we shall be modifying a more complex program, one that performs an image transform on a map. This will allow me to demonstrate some more advanced Cython techniques, such as importing C functions from the C math library, using memory views of Numpy arrays and turning off the Python global interpreter lock (GIL).

As with the previous post, we shall be making a series of Cython modifications to Python code, noting the speed improvement with each step. As we go, we’ll be using the Cython compiler’s annotation feature to see which lines are converted into C at each stage, and which lines are still using Python objects and functions. And as we tune the code to run at increasingly higher speeds, we shall be profiling it to see what’s still holding us up, and where to refocus our attention.

Although I will be using Python 3 on a Mac, the instructions I give will mostly be platform agnostic:  I will assume you have installed Cython on your system (on Windows, Linux or OS/X) and have followed and understood the installation and testing steps in my previous post. This will be essential if you are to follow the steps I outline below. As stated in the previous post, Cython is not for Python beginners.


From Python To Cython

Graphical depiction of going from Python to Cython

This longer post will show you some of the coding skills you’ll need for turning your existing Python code into the Python-C hybrid we call Cython. In doing so, we’ll be digging into some C static data types, to see how much faster Python code will run, and restructuring some Python code along the way for maximum speed.

With Cython, all the benefits of Python are still yours: easily readable code, fast development cycles, powerful high-level commands, maintainability, a suite of web development frameworks, a huge standard library for data science, machine learning, imaging, databases and security, plus easy manipulation of files, documents and strings. You should still use Python for all these things, since they are what Python does best. But you should also consider combining them with Cython to speed up the computationally intensive Python functions that need to be fast.