Andy McKay

Dec 31, 2010

Django and Bleach


Bleach is an HTML whitelisting and sanitizing library written by James Socol. At Mozilla we use it on the add-ons and support sites. Chances are you'll need it on pretty much any site that accepts user input, to ensure that the HTML you output is safe.

Under the hood, bleach uses html5lib. As an aside, I've been running html5lib sanitisation through a homegrown library on App Engine for a while now, and it's been great.

Installing bleach is as simple as:

pip install bleach

The place to validate user input is a form, so let's take a simple model and form combination:

# todo/models.py
from django.db import models

class Todo(models.Model):
    text = models.CharField(max_length=255)

# todo/forms.py
from django.forms import ModelForm
from bleach import Bleach
from todo.models import Todo

# One sanitizer instance shared by every form in this module.
bleach = Bleach()

class TodoForm(ModelForm):
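    def clean_text(self):
        # Django calls clean_<fieldname>() during validation; whatever it
        # returns replaces cleaned_data['text'].
        return bleach.clean(self.cleaned_data.get('text', ''))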

    class Meta:
        model = Todo

And that's it: any text saved through TodoForm is now sanitised (assuming you use a Django form to validate all user input, which you should).

So here's a quick test:

import unittest
from todo.forms import TodoForm

class TestBleach(unittest.TestCase):
    def test_todo(self):
        data = "<b>bold</b> <script>alert('hello')</script> "
        expect = "<b>bold</b> &lt;script&gt;alert('hello')&lt;/script&gt;"
        form = TodoForm({'text': data})
        self.assertTrue(form.is_valid())
        self.assertEqual(form.cleaned_data['text'], expect)

The harmless bold tag passes through untouched, while the dangerous script tag is escaped. Along the way you also get tag balancing:

>>> bleach.clean('bar</a>')
u'bar'
>>> bleach.clean('<a>bar')
u'<a>bar</a>'
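
To make the two behaviours above concrete (escaping anything off the whitelist, and balancing tags), here is a rough sketch built on the standard library's html.parser. It is not how bleach does it (bleach goes through html5lib), and the names ALLOWED_TAGS, WhitelistSanitizer and sanitize are made up for this illustration. It also throws away all attributes, so treat it as a toy for understanding the idea rather than something to deploy.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch; not bleach's default list.
ALLOWED_TAGS = frozenset({"a", "b", "em", "i", "strong"})


class WhitelistSanitizer(HTMLParser):
    """Toy sanitizer: keep whitelisted tags, escape the rest, balance tags."""

    def __init__(self, allowed=ALLOWED_TAGS):
        # Keep entity references as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.allowed = allowed
        self.out = []
        self.open_tags = []  # stack of whitelisted tags still open

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            # Attributes are dropped entirely to keep the sketch small.
            self.out.append("<%s>" % tag)
            self.open_tags.append(tag)
        else:
            # Not whitelisted: emit the raw tag as harmless text.
            self.out.append(escape(self.get_starttag_text(), quote=False))

    def handle_endtag(self, tag):
        if tag not in self.allowed:
            self.out.append(escape("</%s>" % tag, quote=False))
        elif tag in self.open_tags:
            # Close this tag plus anything opened after it.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append("</%s>" % top)
                if top == tag:
                    break
        # A closing tag with no matching open tag is silently dropped.

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def result(self):
        self.close()
        # Close whatever is still open, innermost first.
        while self.open_tags:
            self.out.append("</%s>" % self.open_tags.pop())
        return "".join(self.out)


def sanitize(text, allowed=ALLOWED_TAGS):
    parser = WhitelistSanitizer(allowed)
    parser.feed(text)
    return parser.result()


print(sanitize("bar</a>"))  # bar
print(sanitize("<a>bar"))   # <a>bar</a>
```

Comments and doctype declarations simply vanish here, because the parser's default handlers ignore them. A real sanitizer also has to vet attributes and URL schemes, which is exactly the sort of thing you want a well-tested library for.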

For more information, including notes on the link creation function, see the readme. Probably my favourite part of bleach is the tests.

Check out bleach on GitHub and give it a spin for your next Django project.