Andy McKay

Jun 08, 2010

Taskqueues in App Engine


App Engine has task queues as an experimental feature. There's nothing fancy here, it's just a way of queueing something up to be run later.

We've probably all written a few queues in our time, the task queue is just a continuation of that. Add a URL to be executed a certain point and a scheduler will come along and try all the urls in the queue. For Arecibo we use this to do the post processing on an error. When an error comes into Arecibo we write it to the datastore quickly, we then add in one task queue that will: figure out the grouping, figure out any notifications, figure out the user agent (surprisingly a bit expensive). That post processing will be done async via the task.

First thing that I noticed about the queue is that the development server doesn't run the task queue automatically meaning that all my unit tests failed. So here's a way to do it immediately if you are not in production:

if os.environ.get('SERVER_SOFTWARE', '').startswith('Dev'):
    # send the signal, otherwise we have to clicking buttons
    # to process the queue
    send_signal(None, self.id)
else:
    # enqueue the send notification
    # if development
    taskqueue.add(url=reverse("error-created", args=[self.id,]))

In Django we've got a URL "error-created" that maps to send_signal. That's a method that first figures out if this object has had send_signal run before. We tell this by the use of a Boolean property on the object. It's a bit annoying to have a property for that, but it could be sent by the task queue multiple times. There might be a cleaner way to cope with that.

def send_signal(request, pk):
    error = Error.get(pk)        
    if not error.create_signal_sent:
        error.create_signal_sent = True
        error.save()
        error_created.send(sender=error.__class__, instance=error)
        return HttpResponse("Signal sent")
    return HttpResponse("Signal not sent")

It then sends off the custom Django signal, when then triggers all the signal processing we've assigned in Django to that error. Because the view is only going to be executed by the App Engine scheduler, we can just return a simple string.

The end result is that the unit tests work and it works in production and keeps the load on that initial write nice and low.