Often it’s convenient to begin a Django project with a simple URL scheme, using the primary model’s database key as a reference to each item, to make it easy to retrieve each item and display it in a web page. Later, a decision is usually made to change the URL scheme to one that hides the database primary keys from users. To do this, you have a choice: generating URL ‘slugs’ from the names of the items you are showing, or replacing the keys with opaque identifiers such as UUIDs or hashids.
Slugs
The term slug comes from the late 19th century, when it was the name given to a whole line of text cast in hot metal by a Linotype machine (‘type-casting’). The slugs were then assembled into pages for high-volume printing.(1)
Django, which was first developed at a newspaper, has inherited the term to describe a way of simplifying names and phrases for use in web page URLs. There are now several slug implementations around: some systems drop the articles ‘a’ and ‘the’, while others, like Django, leave them in. In Django’s default ASCII mode, the effect is to remove punctuation, strip accents from letters (dropping any characters that have no ASCII equivalent), convert capitals to lowercase, and replace spaces with hyphens so that, for example, a book title such as Finnegan’s Wake becomes simply /finnegans-wake/.
The Problem
But what if your database item names alone are not unique? Newspaper articles, reprints of books, remakes of movies and real people often share the same name, which makes slugs of their names alone unsuitable as unique identifiers.
One solution is to combine item names with their database primary keys. This makes the URLs unique, but leaves the problem that it still indirectly gives away the size of your database. A better option is to use a combination of important information about each item to create unique URLs, which can then be used to retrieve the item from the database. These may be fields that are unique together, or fields that together give the user a short summary of the saved data, such as the type of story, the genre or its creation date. A book or newspaper website might combine the author, genre and date, whereas for movies the title combined with the year of release is usually enough to create a unique identifier.
This post describes how to convert a pre-existing Django project that uses simple numeric primary keys for the URLs, to a custom slug URL scheme, one that combines multiple model fields to generate unique slug-based URLs. It will cover not only the changes needed in the model, but also your templates, views, URL scheme, forms and unit tests.
To show all the steps necessary to convert to a slug-based URL system, I will use the example of a movie database project I worked on recently, to itemise the changes necessary for an established Django project. The slug system I will implement will combine the title and year of each movie to create more secure and readable URLs. Although I have used only two fields, the steps are extensible to any number of fields, and applicable to any project where you want to create a custom, multi-field, slug-based URL system.
Note that once the changes are made, the primary database model will still have the same system of primary keys. The goal is to create a parallel, non-numeric system for uniquely identifying, retrieving and displaying database items using only text slugs.
Using Primary Keys in URLs
The outline of my main movie model, before I introduced slugs, was as follows:
    class Movie(models.Model):
        Title = models.CharField(max_length=MAX_TITLE)
        Year = models.PositiveSmallIntegerField()
        Director = models.CharField(max_length=MAX_DIRECTOR)
        Actors = models.TextField(max_length=MAX_ACTORS)
        #etc ...

        class Meta:
            ordering = ['Title']

        def __str__(self):
            return "{} ({})".format(self.Title, self.Year)

        def get_absolute_url(self):
            return reverse('main_app:movie_detail', args=[str(self.id)])

        def save(self, *args, new_title=True, **kwargs):
            if new_title:
                if Movie.objects.filter(
                        Title=self.Title, Year=self.Year).exists():
                    return False
            super(Movie, self).save(*args, **kwargs)
            return True
Note that the override for the save() method ensured that the title and year combination was not already in the database.
The main view for my movie data was the function-based movie detail page in views.py, whose job was to retrieve and display a valid movie record using the primary key:
    def MovieDetailPage(request, pk):
        try:
            movie = Movie.objects.get(pk=pk)
        except Movie.DoesNotExist:
            raise Http404("Sorry, that movie is not in the database.")
        template_name = 'main_app/movie_detail.html'
        context = {
            'movie' : movie,
            'directors' : movie.Director.split(', '),
            'actors' : movie.Actors.split(', '),
            'genres' : movie.Genre.split(', '),
            'writers' : writers,  # assumed to be built earlier in the view (not shown)
        }
        return render(request, template_name, context)
This was called via a primary key URL in urls.py:
    app_name = 'main_app'
    urlpatterns = [
        ...
        path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),
        ...
    ]
The core of the changes I will now describe will be to introduce a unique slug field into the Movie model, to provide an alternative, textual method for retrieving and displaying movie items from the database.
The Changes to Make
1. First, write your slug-maker and test it on the fields you want to use, checking that it’s doing what you want it to do. The best place to do this is in the Django shell:
    >>> from main_app.models import Movie
    >>> from django.utils.text import slugify
    >>> movie = Movie()
    >>> movie.Title = "The Good, the Bad and the Ugly"
    >>> movie.Year = 1966
    >>> slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
    'the-good-the-bad-and-the-ugly-1966'
If I’d wanted to, I could have added the director or any other field to the slug. As it stands, though, it is sufficient as a unique identifier for a film.
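The same pattern extends to any number of fields. As a minimal sketch, the hypothetical helper make_slug() below joins whatever fields it is given. approx_slugify() is my own rough standard-library stand-in for Django's slugify() in its ASCII mode, included only so the example runs outside a Django project; in real code you would call django.utils.text.slugify instead.

```python
import re
import unicodedata


def approx_slugify(value):
    """Rough stand-in for django.utils.text.slugify(value, allow_unicode=False)."""
    # Strip accents, then drop anything that still has no ASCII equivalent.
    text = unicodedata.normalize("NFKD", str(value))
    text = text.encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove everything except word characters, spaces and hyphens.
    text = re.sub(r"[^\w\s-]", "", text.lower())
    # Collapse runs of spaces and hyphens into single hyphens.
    return re.sub(r"[-\s]+", "-", text).strip("-_")


def make_slug(*fields):
    """Hypothetical helper: build one slug from any number of field values."""
    return approx_slugify("-".join(str(field) for field in fields))


# Two films with the same title collide on a title-only slug...
title_only = {make_slug("Dracula"), make_slug("Dracula")}
# ...but adding the year keeps them apart.
title_and_year = {make_slug("Dracula", 1931), make_slug("Dracula", 1958)}

# A third field can be added in exactly the same way.
three_fields = make_slug("Amélie", 2001, "Jean-Pierre Jeunet")
```

Here title_only contains a single slug, while title_and_year contains two, which is the whole point of combining fields.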
2. Next, create the new slug field in your Django model, remembering to modify your get_absolute_url() and save() override methods:
    from django.utils.text import slugify

    class Movie(models.Model):
        Title = models.CharField(max_length=MAX_TITLE)
        Year = models.PositiveSmallIntegerField()
        Director = models.CharField(max_length=MAX_DIRECTOR)
        Actors = models.TextField(max_length=MAX_ACTORS)
        ...
        slug = models.SlugField(
            default='',
            editable=False,
            max_length=MAX_TITLE + 6,  # room for '-' plus a year of up to five digits
        )

        def get_absolute_url(self):
            return reverse('main_app:movie_detail', args=[self.slug])

        def save(self, *args, new_title=True, **kwargs):
            if new_title:
                if Movie.objects.filter(Title=self.Title, Year=self.Year).exists():
                    return False
                self.slug = slugify('-'.join([self.Title, str(self.Year)]),
                                    allow_unicode=False)
            super(Movie, self).save(*args, **kwargs)
            return True
Note that the slugs are only created when the movie items are first saved as new titles. This means that pre-existing movie items in the database will have their slug attributes set to the default empty string, ''. You will need to write a utility to set them. More on this later.
Some of you may be wondering why not just create a property method, which would generate the correct slug from the title and year each time it was called? This would avoid having to write a utility to create all the slugs for movies already in the database. This would only need the @property decorator to wrap the slug method, so as not to break any code that might call on Movie.slug:
    class Movie(models.Model):
        ...
        @property
        def slug(self):
            ...
That’s the slug creation part taken care of, but what about retrieving objects? Unfortunately, calling movie.slug may now work, but any lines of Django ORM code that might try to retrieve model objects by saying:
movie = Movie.objects.get(pk=pk)
will now have to say:
movie = Movie.objects.get(slug=slug)
and will not work, because the Django ORM cannot issue SQL commands to the database using a field that doesn’t exist; it raises a FieldError instead. The slug would exist only in Python, not in the SQL database, which wouldn’t know anything about it.
The simplest solution is to create the slug as another model field, and to write a short utility to create the slugs for existing movies (see later).
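To see why the database needs a real column, here is a small sketch that uses Python's built-in sqlite3 module rather than the Django ORM. The table and column names are illustrative only (they are not the names Django would generate), and find_id_by_slug() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("INSERT INTO movie (title, year) VALUES (?, ?)", ("Alien", 1979))


def find_id_by_slug(connection, slug):
    """Return the (id,) row whose slug column matches, or None."""
    return connection.execute(
        "SELECT id FROM movie WHERE slug = ?", (slug,)
    ).fetchone()


# Without a slug column, the database cannot run the query at all.
try:
    find_id_by_slug(conn, "alien-1979")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Once the column exists and has been filled in, the same query succeeds.
conn.execute("ALTER TABLE movie ADD COLUMN slug TEXT NOT NULL DEFAULT ''")
conn.execute(
    "UPDATE movie SET slug = ? WHERE title = ? AND year = ?",
    ("alien-1979", "Alien", 1979),
)
row = find_id_by_slug(conn, "alien-1979")
```

The first lookup fails because the column is missing, however the slug might be computed in Python; the second succeeds only after the column has been added and back-filled, which mirrors the migration and utility steps in this post.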
3. Remember to run your migration commands once you have made these changes:
    $ ./manage.py makemigrations
    $ ./manage.py migrate
4. Next, change all your URLs that use /<int:pk>/ to say /<str:slug>/. For example, in urls.py:
path('movie/<int:pk>/', MovieDetailPage, name='movie_detail'),
now becomes:
path('movie/<str:slug>/', MovieDetailPage, name='movie_detail'),
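Django also ships a slug path converter, so <slug:slug> is an alternative to <str:slug>. The sketch below copies the regular expressions behind those two built-in converters, as I understand them, into plain Python so the difference can be tried without a project. The matches() helper is hypothetical and is not part of Django.

```python
import re

# Patterns behind Django's built-in 'str' and 'slug' path converters,
# reproduced here as plain regular expressions for illustration.
STR_PATTERN = r"[^/]+"
SLUG_PATTERN = r"[-a-zA-Z0-9_]+"


def matches(pattern, segment):
    """Return True if the whole URL segment matches the converter pattern."""
    return re.fullmatch(pattern, segment) is not None


ascii_slug = "2001-a-space-odyssey-1968"
accented = "amélie-2001"

results = {
    "str accepts ASCII slug": matches(STR_PATTERN, ascii_slug),
    "slug accepts ASCII slug": matches(SLUG_PATTERN, ascii_slug),
    "str accepts accented text": matches(STR_PATTERN, accented),
    "slug accepts accented text": matches(SLUG_PATTERN, accented),
    "str accepts a slash": matches(STR_PATTERN, "a/b"),
}
```

Slugs generated with allow_unicode=False pass both converters. The slug converter is simply stricter, rejecting URL segments that could never be one of your slugs before the view is called.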
5. You will then need to change all your corresponding view functions that require a primary key to expect a slug, so that, for example:
def MovieDetailPage(request, pk):
would become:
def MovieDetailPage(request, slug):
Which means all calls to this function now need to use the slug, rather than the primary key:
return MovieDetailPage(request, movie_obj.pk)
becomes:
return MovieDetailPage(request, movie_obj.slug)
6. Next, you will have to change all your templates that link to your detail page via the primary key to use the new slug-based URL. For example, if you are using URL shortcuts in the Django template language, these now need to pass the slug, not the primary key:
<a href="{% url 'main_app:movie_detail' film.pk %}">
becomes:
<a href="{% url 'main_app:movie_detail' film.slug %}">
7. If you are using forms to check newly entered model data, and using the ModelForm in particular to leverage your model fields, you might want to check that you’re not telling your form to display __all__ fields. Best practice is to name explicitly only the model fields you want to see in your form, so that you’re not exposing all the fields to the user (this can be an embarrassing mistake). But if you’re not using __all__, there should be no changes to make in your form:
    class MovieForm(ModelForm):
        class Meta:
            model = Movie
            # fields = '__all__'  # DO NOT USE THIS LINE
            fields = ['Title', 'Year', 'Director', 'Actors', 'Genre',
                      'Plot', 'Writer']

        def clean_Title(self):
            #etc ...
8. As explained earlier, if you have pre-existing items in your database, their slug fields will initially only be empty strings. How to set them? The easiest way is to create your new slugs by running a simple utility on all the objects already in your database. As it turns out, regardless of how many fields you use to generate your slug, it only takes a few lines of Python in the Django shell to do this:
    >>> from main_app.models import Movie
    >>> from django.utils.text import slugify
    >>> for movie in Movie.objects.all():
    ...     # INSERT YOUR OWN SLUG GENERATOR HERE:
    ...     movie.slug = slugify('-'.join([movie.Title, str(movie.Year)]), allow_unicode=False)
    ...     movie.save(new_title=False)
    ...
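Because the slug is now the only thing identifying a movie in the URL, it may be worth checking that the back-fill produced no duplicates: Movie.objects.get(slug=slug) raises MultipleObjectsReturned if two rows share a slug. A minimal sketch of such a check follows. The function name find_duplicate_slugs() and the sample data are hypothetical; in the Django shell the pairs could come from Movie.objects.values_list('pk', 'slug').

```python
from collections import defaultdict


def find_duplicate_slugs(pk_slug_pairs):
    """Map each slug that occurs more than once to the primary keys using it."""
    pks_by_slug = defaultdict(list)
    for pk, slug in pk_slug_pairs:
        pks_by_slug[slug].append(pk)
    return {slug: pks for slug, pks in pks_by_slug.items() if len(pks) > 1}


# Illustrative (pk, slug) pairs: rows 3 and 7 ended up with the same slug,
# as could happen when two titles differ only in punctuation.
sample_pairs = [
    (1, "alien-1979"),
    (3, "mash-1970"),
    (7, "mash-1970"),
    (9, "dracula-1931"),
]

duplicates = find_duplicate_slugs(sample_pairs)
```

An empty result means every slug is unique. Anything else lists the rows that need an extra field in their slug, or some other manual fix.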
9. You should now be ready to go, but as a responsible engineer, you have learned always to test on samples of real data too. The good news is that, now you have created all the slugs in your live database, you can run your complex integration tests on samples of your real data. The bad news is that, since your main database model has changed, your existing test fixtures will no longer work. To create a new sample of your real database as a test fixture, you will have to re-run the commands you used to create the originals. For my testing, I created a new sample of eight non-sequential instances from my main model, plus all the data from my other models:
    $ ./manage.py dumpdata main_app.Movie --pks 1,2,3,4,6,7,9,10 > main_app/fixtures/movie_db.json
    $ ./manage.py dumpdata --exclude main_app.Movie > main_app/fixtures/other_tables.json
Your complex integration tests should now work on your test fixtures with the new slug-based URLs:
    class ComplexIntegrationTests(TestCase):
        fixtures = ['movie_db.json', 'other_tables.json']

        def test_deleting_a_movie(self):
            response = self.client.get(
                reverse('main_app:delete_movie',
                        args=['2001-a-space-odyssey-1968']))
            self.assertEqual(response.status_code, 200)
            self.assertTemplateUsed( ...
            self.assertContains(...
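Before relying on a regenerated fixture, a quick sanity check can confirm that every movie record in it actually carries a slug. The sketch below reads the JSON that dumpdata writes (a list of objects with model, pk and fields keys). The function name pks_missing_slug() and the two sample records are hypothetical.

```python
import json


def pks_missing_slug(fixture_text, model_label="main_app.movie"):
    """Return the pks of records for model_label whose slug is empty or absent."""
    records = json.loads(fixture_text)
    return [
        record["pk"]
        for record in records
        if record["model"] == model_label and not record["fields"].get("slug")
    ]


# A tiny stand-in for the contents of movie_db.json.
sample_fixture = json.dumps([
    {"model": "main_app.movie", "pk": 1,
     "fields": {"Title": "2001: A Space Odyssey", "Year": 1968,
                "slug": "2001-a-space-odyssey-1968"}},
    {"model": "main_app.movie", "pk": 2,
     "fields": {"Title": "Alien", "Year": 1979, "slug": ""}},
])

missing = pks_missing_slug(sample_fixture)
```

A non-empty list suggests the fixture was dumped before the slugs were back-filled, in which case the dumpdata commands need running again.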
Conclusion
And that’s pretty much it. Your database primary keys will now be hidden from prying eyes, as will the size of your database, and your URLs will be far more readable and user-friendly.
Software Used
Django 3.0.1
Python 3.7.1
Pipenv 2018.11.26
PostgreSQL 12.1
Django Debug Toolbar 2.2
References
(1) Linotype machines were a pretty amazing invention for 1884, and were used right up until the 1970s, an incredible lifespan for any technology. In their day they were state-of-the-art, and the bridge between nineteenth century manual typesetting and the phototypesetters of the late twentieth century.
They were essentially a combination of typewriter, mould-maker and molten metal caster, able to convert line after line of a typesetter’s keystrokes instantly into linear metal moulds, which were immediately used to cast printing slugs from hot metal, each ready to print a line of text.
The invention of Linotype machines did not automate printing, just the creation of printing plates. They removed the need for manual typesetting, a job held by a young Mark Twain (Sam Clemens) in Hannibal, Missouri, only a few hours’ drive from where Django began in Lawrence, Kansas, continuing the tradition of American Midwest publishing.
If you’re into the history of communication and publishing technology, of which Django is a part, you should find them fascinating for another reason: it wasn’t long before someone had the great idea to hook one up to a telegraph line and control it remotely, which led to the TeleTypeSetter.