7 min read

Mistakes happen, and that's part of a learning process. In a large project like Django, it can be hard to spot a mistake. Thanks to it being open source, anyone can see the code and fix the mistakes they see. In this post, I'll explain how I found a vulnerability in contrib.postgres.fields.JSONField that allowed SQL injections to be performed.

If you're familiar with Python's DB-API, you may have used the .execute() method. It's a method that can be used to prepare and execute a database operation (query or command). Parameters can be passed into the method and will be bound to the variables in the operation.

In general, the parameters should be separated from the operation string. That means instead of formatting the parameters directly into the string, we should use a placeholder specified by the paramstyle. This is to avoid SQL injection attacks that exploit the quotation marks around values. The Django docs give a quick explanation regarding this.

So, instead of doing something like the following:

>>> query = 'SELECT * FROM myapp_dog WHERE name = %s' % name
>>> connection.cursor.execute(query)

or the following:

>>> query = "SELECT * FROM myapp_dog WHERE name = '%s'"

We should do:

>>> query = 'SELECT * FROM myapp_dog WHERE name = %s'
>>> connection.cursor.execute(query, [name])

On Django, this process is very crucial in the implementation of lookups and transforms. In fact, Django breaks it down into process_lhs and process_rhs methods to process the query string and the parameters of the left-hand-side (LHS) and right-hand-side (RHS) of a query's WHERE clause. There's a section on Django docs that explains this. It doesn't really go in depth, but it should give you the gist of what's happening under the hood. That's where I spent most of my time working on this project, but I don't think I can explain it any clearer than the docs. You just need to read the docs and get your hands dirty with some hands-on if you really want to understand.

While working on the KeyTransform, I noticed something off in the as_sql() method. As you can see there, if there isn't more than one key transform applied and the string raises ValueError when casted into int, the code incorrectly quotes the key_name and formats the lookup directly into the string. Basically, it does exactly the two things I explained above that shouldn't be done. So, I tried to find a way to exploit this behavior.

As explained before, we need to exploit the quotation marks. Normally, the following lookup:

>>> query = Dog.objects.filter(data__breed='labrador')

gets translated into the following SQL query:

SELECT * FROM myapp_dog WHERE (data -> 'breed') = 'labrador'

We need to come up with a special string for the KeyTransform on the data JSONField. So, the string that follows data__ should be special such that it meets the following conditions:

  • It raises a ValueError when casted into int. This is easy, just make sure it's not an index lookup (data__1, data__42, etc. don't meet this condition).
  • It only counts as one KeyTransform: it can't contain double underscore (__) that's followed by a string that's not registered as a lookup. So, something like data__breed, data__breed__startswith, or data__breed__contains meet the condition, but data__owner__name doesn't.
  • It should escape the quotation mark and contain another condition that evaluates to true. So, it should contain something like OR 1=1. This should be... "impossible".

Keyword arguments must be valid identifiers, therefore we can't include a quote nor a space. We can't do something like:

>>> query = Dog.objecs.filter(data__breed' something OR 1=1 something'='labrador')

as that would obviously raise a SyntaxError.


We can pass kwargs using a dictionary. Any string is a valid key in a dictionary.

So, it's possible to do this:

>>> kwargs = {"data__breed' = 'a') OR 1 = 1 OR ('d": 'x'}
>>> query = Dog.objects.filter(**kwargs)

Which will be translated into the following SQL query:

SELECT * FROM myapp_dog WHERE (data -> 'breed' = 'a') OR 1 = 1 OR ('d') = '"x"'

As you expect, the query will return all Dog objects because the condition 1 = 1 is always true.

In order for all this to happen in real life, someone had to use contrib.postgres.JSONField and provide a way for users to construct their own key-value pair that will be made into a dictionary and passed as kwargs in the query. It is unlikely, but it's possible. We can never be too careful when it comes to security. Besides, it's best practice to pass values using parameters when we're using DB-API anyway.

I contacted the Django security team ([email protected]) and they were pretty quick to respond. One and a half week later, security releases were issued which included a fix for the vulnerability. I'm very proud to see my name on it. Yay!

However, it turns out that the fix wasn't perfect. It spawned a bug that's split into two tickets (one because it's a regression, the other's not). The regression happened because the fix was based on the code that handles nested KeyTransform and that code already had the bug. The usage of KeyTransform that's affected by this bug is actually undocumented, though.

The fixes (single and nested) were quite simple. The parameters were in the wrong order, so the fixes were just to flip them. Funnily, I already did so when I implemented KeyTransform for other database backends. It didn't mean that the undocumented usage was already working out of the box, though.

While fixing the implementation so that the new tests pass, I had to change my approach of using JSON_EXTRACT and JSON_UNQUOTE on MySQL and MariaDB for the KeyTransform implementation. In the end, it's very simple and cleaner than my previous implementation.

For MariaDB, I always use JSON_UNQUOTE(JSON_EXTRACT(...)), while it's just JSON_EXTRACT on MySQL unless it's chained with a lookup expecting a text LHS. Previously on MariaDB, it's only wrapped with JSON_UNQUOTE if the RHS is a string. However, doing JSON_UNQUOTE on MariaDB doesn't do any harm if the RHS isn't a string because every JSON value is a string anyway (unlike MySQL, it doesn't have a native JSON type). So, I decided to always use JSON_UNQUOTE. This eliminates the need to check whether the RHS is a string. It also unifies the code for the exact lookup between MySQL and MariaDB.

I think that's the last major revision I did before GSoC finishes today. There may be some other ones, but that depends on whether other Django folks find mistakes in my PR or not. For now, my GSoC project is pretty much completed and I'll probably only have one more post coming to recap the whole journey. The final announcement is on September 4 (or maybe 3 in where you live), so let's see if I'm going to pass the final evaluation.

Until then, see you!