Django File Upload Handling Examples
I have been working on a multi-user blogging and publishing platform using Django 1.0 lately. Naturally this requires the backend to handle file uploads. A lot has changed since Django 0.96, which I am still using on some legacy code. One of those changes is the way Django 1.0 handles file uploads. Most of these changes were made to allow Django apps to handle large files without soaking up too much memory.
So what has changed? The most visible change is that there are now at least two separate APIs that you have to work with: the File API and the Storage API. The File API, which exposes the File class, provides a thin wrapper around Python file objects. The Storage API, on the other hand, exposes a base class Storage that you can use to implement custom storage facilities. There is a third API that provides FileUploadHandler, which lets you customize the way Django handles uploaded files in their “raw” form. For most purposes, the File and Storage APIs will suffice.
This post is meant to supplement the information found in the “Handling Uploaded Files” section of the Django File Uploads documentation. You will still need to refer to the documentation.
The examples I will be using here implement basic file upload handling in Django 1.0. The examples provided in the official Django documentation, as of 18 Feb 2009, do not give you an idea of how the File and Storage APIs can be used together. Please note that these examples will not work with Django versions before 1.0.
The Model
Here is the model that we will be using for this example:
from django.db import models
class Attachment(models.Model):
    attached_file = models.FileField(upload_to='attachments')
    mimetype = models.CharField(max_length=64, editable=False)
    created = models.DateTimeField(auto_now_add=True, editable=False)
    # auto_now alone already sets the timestamp on every save, including the first.
    updated = models.DateTimeField(auto_now=True, editable=False)
We have marked the last three fields as non-editable (editable=False) so that Django’s automatic admin and django.forms.ModelForm do not create form fields for them.
The Form
Normally you would just use django.forms.ModelForm to create the form, like so:
from django.forms import ModelForm
class AttachmentForm(ModelForm):
    class Meta:
        model = Attachment
This will take care of most use-cases where you have a simple app with one form per model. Taking it a bit further, you could just make use of Django 1.0’s excellent “automatic admin interface”. This is what you should normally do when you are just starting out with Django. However, there are times when you just want a bit more control over the way things are done. In this context, we will create a form class manually to illustrate how you would use Django’s File and Storage API.
import re
from django import forms
# Attachment is the model defined above.
class AttachmentForm(forms.Form):
    attached_file = forms.FileField()
    def __init__(self, bound_object=None, *args, **kwargs):
        super(AttachmentForm, self).__init__(*args, **kwargs)
        # bound_object is an existing Attachment when updating, otherwise None.
        self.bound_object = bound_object
        self.is_updating = bound_object is not None
    def save(self):
        if not self.is_updating:
            self.bound_object = Attachment()
        # Retrieve the UploadedFile object for the attached_file field.
        uploaded_file = self.cleaned_data['attached_file']
        # Clean up the filename before storing it.
        stored_name = re.sub(r'[^a-zA-Z0-9._]+', '-', uploaded_file.name)
        # Set the metadata first: the field's save() also saves the model
        # instance by default, so a value assigned afterwards would not be
        # written to the database.
        self.bound_object.mimetype = uploaded_file.content_type
        # Save the file and, with it, the model instance.
        self.bound_object.attached_file.save(stored_name, uploaded_file)
        return self.bound_object
In the example above, we used the save() method of the Attachment object’s attached_file field. The save() method’s second argument takes any object that implements the File class’ methods. All file uploads handled by Django’s default upload handlers are UploadedFile objects. Since UploadedFile is a subclass of File we can use it directly in the save() method of our file field.
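The filename clean-up step in save() can be tried on its own, outside Django. This is only a small sketch using the same regular expression as above; the helper name clean_filename and the sample filenames are made up for illustration.

```python
import re

# Same pattern as in save(): every run of characters other than
# letters, digits, '.', and '_' collapses into a single '-'.
_UNSAFE_CHARS = re.compile(r'[^a-zA-Z0-9._]+')


def clean_filename(name):
    """Return name with each run of unsafe characters replaced by '-'."""
    return _UNSAFE_CHARS.sub('-', name)


print(clean_filename('my photo (1).jpg'))   # my-photo-1-.jpg
print(clean_filename('../../etc/passwd'))   # ..-..-etc-passwd
```

Note that path separators are not in the allowed set, so a name containing slashes ends up as a single flat filename.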
By default the uploaded file is saved on the file system under settings.MEDIA_ROOT plus the field’s upload_to. So if you have settings.MEDIA_ROOT set to /webapp/media_root, then with the upload_to setting from the model above, the file for attached_file will be saved under the directory /webapp/media_root/attachments. Normally this directory will also be mapped to the appropriate public URL in your web server’s configuration.
There are times, however, when you want to change the default storage behavior. This is where the Storage API comes in. For example, suppose we want to save files under a different root directory. The problem is that you can only specify one settings.MEDIA_ROOT for your entire project. To get around this, we pass a storage parameter to our attached_file field.
Here is our model again, this time with a custom storage location other than MEDIA_ROOT.
from django.core.files.storage import FileSystemStorage
from django.db import models
# base_url ends with a slash so that file names are appended to it
# rather than replacing its last path segment.
attachment_file_storage = FileSystemStorage(location='/webapp/attachments_root', base_url='/attachments/')
class Attachment(models.Model):
    attached_file = models.FileField(upload_to='attachments', storage=attachment_file_storage)
    mimetype = models.CharField(max_length=64, editable=False)
    created = models.DateTimeField(auto_now_add=True, editable=False)
    updated = models.DateTimeField(auto_now=True, editable=False)
With our modified model, calling the save() method of the attached_file field will save the file under /webapp/attachments_root/attachments.
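As a rough illustration of how the target directory is put together in both cases, here is a sketch that only mimics the path arithmetic. It does not call Django; stored_path is a hypothetical helper and report.pdf is just an example filename.

```python
import posixpath


def stored_path(storage_root, upload_to, filename):
    """Join a storage root, an upload_to prefix, and a filename."""
    return posixpath.join(storage_root, upload_to, filename)


# Default storage: rooted at settings.MEDIA_ROOT.
print(stored_path('/webapp/media_root', 'attachments', 'report.pdf'))

# Custom FileSystemStorage: rooted at its own location instead.
print(stored_path('/webapp/attachments_root', 'attachments', 'report.pdf'))
```

Only the root changes between the two cases; the upload_to prefix is applied the same way.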
It does not end with customizing storage locations on the filesystem, however. With the Storage API you can customize how files are actually saved by creating a subclass of Storage that overrides the appropriate API methods, as described in the file storage sections of the Django documentation.
Django FileUploadHandler API
By default Django 1.0 handles file uploads using either django.core.files.uploadhandler.MemoryFileUploadHandler or django.core.files.uploadhandler.TemporaryFileUploadHandler. TemporaryFileUploadHandler takes over for uploads larger than settings.FILE_UPLOAD_MAX_MEMORY_SIZE. This handler streams the uploaded file to a temporary file saved under settings.FILE_UPLOAD_TEMP_DIR and then wraps this file with a TemporaryUploadedFile object. For smaller files, Django uses MemoryFileUploadHandler, which keeps the data in memory in a StringIO object wrapped by an InMemoryUploadedFile. Either way, you get an object with a file-like interface. The only difference is where the file’s data is actually stored.
The FileUploadHandler API allows you to customize this default behavior by writing subclasses of FileUploadHandler and adding them to the settings.FILE_UPLOAD_HANDLERS list. This is documented in the “Upload Handlers” section of the Django manual.
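The memory-versus-disk split can be sketched without Django. The code below is not Django's implementation, only an illustration of the idea: uploads at or below a threshold go to an in-memory buffer, larger ones to a temporary file, and the caller gets a file-like object either way. The names MAX_MEMORY_SIZE, open_upload_buffer, and receive_upload are hypothetical, and the threshold is deliberately tiny.

```python
import io
import tempfile

# Hypothetical threshold for the sketch; Django reads its own value
# from settings.FILE_UPLOAD_MAX_MEMORY_SIZE.
MAX_MEMORY_SIZE = 1024


def open_upload_buffer(content_length, max_memory_size=MAX_MEMORY_SIZE):
    """Return an in-memory buffer for small uploads, a temp file otherwise."""
    if content_length <= max_memory_size:
        return io.BytesIO()
    return tempfile.TemporaryFile()


def receive_upload(chunks, content_length):
    """Write the chunks into the chosen buffer and rewind it for reading."""
    buf = open_upload_buffer(content_length)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)
    return buf


small = receive_upload([b'hello ', b'world'], content_length=11)
large = receive_upload([b'x' * 1024, b'y' * 1024], content_length=2048)

# Both objects are read the same way, wherever the data lives.
print(isinstance(small, io.BytesIO), isinstance(large, io.BytesIO))
large.close()
```

The calling code never needs to know which of the two it received, which is the same property the two Django handlers give you.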
Performing Additional Processing
There are cases when you need to do more than just save an uploaded file. For instance, in a photo album application, you might want to use PIL to resize uploaded images or create thumbnails before you save them. In the examples above, we did not do any post-processing of the uploaded files and just saved them right away. How you go about it really depends on what you intend to do with the uploaded file. The most common way is to just read the whole file into a variable using the read() method. You must be careful when you do this, however, since a very large upload can exhaust your server’s memory. Use the size property of the File object to test whether a file is too large for your system to handle. You would normally do this in your form’s clean() method.
A more efficient way of applying post-processing is to stream the file and process it lazily. PIL’s Image.open() is a good example: it identifies the file but does not read the image data until it is needed. Here is how you would apply post-processing to uploaded images with PIL. This example also shows how to use ContentFile.
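A minimal sketch of such a size check, written without Django so it can be run on its own. In a real form you would raise forms.ValidationError from clean() instead; UploadTooLarge, MAX_UPLOAD_BYTES, check_upload_size, and FakeUpload are all hypothetical names, and the 5 MB limit is an arbitrary assumption.

```python
class UploadTooLarge(ValueError):
    """Hypothetical stand-in for the validation error a Django form would raise."""


MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed limit of 5 MB


def check_upload_size(uploaded_file, max_bytes=MAX_UPLOAD_BYTES):
    """Reject the upload before any of its data is read into memory."""
    if uploaded_file.size > max_bytes:
        raise UploadTooLarge(
            'File is %d bytes; the limit is %d bytes.'
            % (uploaded_file.size, max_bytes))
    return uploaded_file


class FakeUpload(object):
    """Minimal stand-in exposing only the size attribute used above."""

    def __init__(self, size):
        self.size = size


check_upload_size(FakeUpload(1024))  # passes silently
try:
    check_upload_size(FakeUpload(MAX_UPLOAD_BYTES + 1))
except UploadTooLarge as exc:
    print('rejected: %s' % exc)
```

The point is that only the size attribute is consulted, so the check costs nothing no matter how big the upload is.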
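from PIL import Image
def postprocess_image(uploaded_file):
    img = Image.open(uploaded_file)
    # ... do image post-processing and manipulation ...
    # Note: PIL sets format to None on images that were not read from a
    # file (for example the result of resize()), so record img.format
    # before manipulating the image if you need it afterwards.
    return img
Elsewhere, in one of your form’s methods (implemented with self.bound_object as in the example above)…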
# At the top of the module:
import re
from StringIO import StringIO
from django.core.files.base import ContentFile
# Inside the form class:
def save_uploaded_photo(self):
    uploaded_image = self.cleaned_data['uploaded_image']
    # We assume that uploaded_image.size has been checked in
    # the form's clean() method and that we are good to go.
    img = postprocess_image(uploaded_image)
    if img.format == 'JPEG':
        processed_image = ContentFile(img.tostring('jpeg', img.mode))
    elif img.format == 'PNG':
        # PIL Image.tostring() does not support PNG encoding for some reason,
        # so encode into an in-memory buffer instead.
        imgstr = StringIO()
        img.save(imgstr, 'PNG')
        processed_image = ContentFile(imgstr.getvalue())
    elif img.format == 'GIF':
        processed_image = ContentFile(img.tostring('gif', img.mode))
    else:
        # Without this branch processed_image would be undefined below.
        raise ValueError('Unsupported image format: %s' % img.format)
    # Clean up the filename before storing it.
    stored_name = re.sub(r'[^a-zA-Z0-9._]+', '-', uploaded_image.name)
    self.bound_object.image.save(stored_name, processed_image)
The above code is overly simplified and does not perform any additional checking of the file data or any exception handling. Do note that in the versions of PIL I am using, Image.tostring() does not support PNG encoding. This is why some additional code was needed.
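For processing that does not go through PIL, the streaming idea can also be applied by hand: read the upload in fixed-size blocks rather than all at once. The sketch below is framework-independent and computes a SHA-1 digest this way; iter_chunks and sha1_of_upload are hypothetical helpers and the 64 KB block size is an arbitrary choice. Django's UploadedFile offers a chunks() method for the same purpose.

```python
import hashlib
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive blocks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def sha1_of_upload(fileobj, chunk_size=64 * 1024):
    """Compute a SHA-1 digest while holding only one chunk in memory."""
    digest = hashlib.sha1()
    for chunk in iter_chunks(fileobj, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


# A BytesIO object stands in for the uploaded file here.
data = b'example upload data ' * 10000
print(sha1_of_upload(io.BytesIO(data)) == hashlib.sha1(data).hexdigest())
```

Memory use stays bounded by the block size regardless of how large the upload is.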
At first glance, it may be a bit overwhelming. But the new API allows for greater flexibility in handling file uploads. It allows you to implement anything from custom storage to a CDN to upload progress meters. So learning how to use the new API is well worth the effort.
Please post any corrections in the comments below. Thanks.
