Managing Django Version Migration

Wed 18 June 2014

CEDA was a fairly early adopter of Django, introducing it as the technology for the CMIP5 Questionnaire. Back then we were on Django 1.2, and there have been three major releases since. We have now replaced most of our internal systems with Django apps, which has been a huge benefit to the group, but inevitably the way we build our Django projects has evolved with the Django versions. Recently I started working on a replacement for our catalogue application, and at the same time I inherited the PIMMS Django application, an evolution of the original CMIP5 Questionnaire codebase. Trying to reconcile how these two projects were designed, and to push them through our automated deployment environment, has left me baffled at times, particularly by how Django projects should be laid out on the filesystem and how this structure maps to Python packages.

Between Django 1.3 and Django 1.4 the structure created by django-admin.py changed so that the directory containing manage.py was no longer a Python package, in the sense of containing an __init__.py file. Below are two examples of a clean project created with django-admin.py before and after the change. Notice that in Django 1.3.7 mysite is a Python package (it contains __init__.py) which has the apps inside it as sub-packages. In Django 1.4 there is a mysite project directory and a mysite Python package; the app packages are not sub-packages of the mysite package.

dj_1.3.7                    dj_1.4/
└── mysite                  └── mysite
    ├── __init__.py             ├── manage.py
    ├── manage.py               ├── myapp1
    ├── myapp1                  │   ├── __init__.py
    │   ├── __init__.py         │   ├── models.py
    │   ├── models.py           │   ├── tests.py
    │   ├── tests.py            │   └── views.py
    │   └── views.py            ├── myapp2
    ├── myapp2                  │   ├── __init__.py
    │   ├── __init__.py         │   ├── models.py
    │   ├── models.py           │   ├── tests.py
    │   ├── tests.py            │   └── views.py
    │   └── views.py            └── mysite
    ├── settings.py                 ├── __init__.py
    └── urls.py                     ├── settings.py
                                    ├── urls.py
                                    └── wsgi.py
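As a rough way to tell the two layouts apart in an existing checkout, the sketch below checks whether manage.py sits next to an __init__.py. The function name classify_layout and its return labels are my own invention for illustration, not anything Django provides, and the check is only a heuristic: patched-up hybrids can fool it.

```python
import os


def classify_layout(project_dir):
    """Guess which django-admin.py layout the directory holding manage.py follows.

    Hypothetical helper: a heuristic based only on the presence of
    manage.py and __init__.py in the same directory.
    """
    if not os.path.isfile(os.path.join(project_dir, "manage.py")):
        return "unknown"  # not a project directory at all
    if os.path.isfile(os.path.join(project_dir, "__init__.py")):
        # manage.py lives inside the project package, as in Django 1.3
        return "1.3-style"
    # manage.py lives in a plain directory beside the packages, as in Django 1.4
    return "1.4-style"
```

Running it over each deployed project directory gives a quick first inventory before deciding which ones need attention.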

Informal documentation on the web, and deployed applications in the wild, often use a combination of the 1.3.7 and 1.4 structures, where the apps are sub-packages of a mysite package but are imported as top-level packages.

In the 1.3 structure the package hierarchy is ambiguous. Each app appears to be a sub-package of the project package. However, the apps are also importable as top-level packages: the interpreter can import the myapp1 package either as mysite.myapp1 or as myapp1. The assumption is that the mysite package directory has been inserted manually onto sys.path, either implicitly by running python within that directory or explicitly within the wsgi.py script. In my view this breaks a fundamental assumption of the Python import system: that there is only one package hierarchy.

To demonstrate its dangers, let's create some random state inside one of our apps in the Django 1.3.7 structure above.

# dj_1.3.7/mysite/myapp1/__init__.py
import random
module_state = random.random()

In Django 1.3 deployments this module is importable under two separate package paths, myapp1 and mysite.myapp1. We are not talking about relative imports here (which are a whole extra layer of complexity); Python thinks of these as different packages. What happens when we try to import both?

# From dj_1.3.7/mysite/
$ python manage.py shell
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import mysite.myapp1
>>> import myapp1
>>> myapp1.module_state
0.8994767988891682
>>> mysite.myapp1.module_state
0.9579010259583026
>>>

Oh dear! We have two copies of the same module. If your application mixes the two styles of importing myapp1, different parts of the app will hold different objects, and any state that is meant to be shared will not be shared everywhere. Chaos could ensue. I'm therefore very relieved that Django sorted things out for version 1.4.
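The same effect can be reproduced without Django at all. The sketch below is my own minimal reconstruction, assuming nothing beyond the standard library: it builds a throwaway mysite/myapp1 tree in a temporary directory and puts both the parent directory and the project directory on sys.path, which is what the 1.3-era setups effectively did.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway copy of the 1.3-style layout: mysite/ is a package
# and myapp1/ is a sub-package holding some module-level state.
root = tempfile.mkdtemp()
project_dir = os.path.join(root, "mysite")
app_dir = os.path.join(project_dir, "myapp1")
os.makedirs(app_dir)
with open(os.path.join(project_dir, "__init__.py"), "w"):
    pass  # an empty file is enough to mark the package
with open(os.path.join(app_dir, "__init__.py"), "w") as f:
    f.write("import random\nmodule_state = random.random()\n")

# Put both the parent directory and the project directory on sys.path.
# This is what makes both import paths resolve.
sys.path[:0] = [root, project_dir]
importlib.invalidate_caches()

nested = importlib.import_module("mysite.myapp1")
top_level = importlib.import_module("myapp1")

same_file = os.path.samefile(nested.__file__, top_level.__file__)
print("same source file:", same_file)
print("same module object:", nested is top_level)
print("states:", nested.module_state, top_level.module_state)
```

Both names resolve to the same __init__.py on disk, yet the interpreter executes it twice and keeps two independent module objects in sys.modules, each with its own module_state.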

As well as resolving this confusion, the Django developers had another motivation to change the project structure: reusable apps. Django 1.5 recommends eventually distributing apps entirely separately from projects. This involves creating a separate setup.py for each app and for the project package, thus creating multiple distributions. Although this is optional, for simplicity I will call this the 1.5 structure. The 1.5 structure would look something like what is shown below, with three Python distributions side by side, each with its own setup.py. We would then deploy the project using three tarballs and a requirements.txt in the project referencing each app.

dj_1.5/
├── myapp1
│   ├── myapp1
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   └── setup.py
├── myapp2
│   ├── myapp2
│   │   ├── __init__.py
│   │   ├── models.py
│   │   ├── tests.py
│   │   └── views.py
│   └── setup.py
└── mysite
    ├── manage.py
    ├── mysite
    │   ├── __init__.py
    │   ├── settings.py
    │   ├── urls.py
    │   └── wsgi.py
    ├── requirements.txt        # references myapp1 and myapp2 distributions
    └── setup.py

The reality on the ground is a messy mixture of all these approaches. At CEDA we have 1.3, 1.4 and 1.5 flavours, but we also have 1.3 flavours that have been patched up to look a bit like 1.4, and hybrids that aren't quite like any of them because, when we started, it wasn't clear what the correct structure was.

Therefore I'm recommending this strategy for cleaning up the mess of confusing package structures:

  • If a site is running without problems and doesn't need any maintenance for other reasons, leave it alone. We can't afford to make work for ourselves.
  • When working on the code, move to the 1.4 or 1.5 structure if possible, i.e. app directories outside the project package. A mixture of 1.4 and 1.5 is fine, where some apps are separate distributions and some are inside the project distribution. Consider how likely it is that we will want to use an app in multiple projects before deciding where to put it. If it's reusable, use the 1.5 structure.
  • Otherwise, if you keep the 1.3 structure, make sure all apps are imported using their full package name, i.e. as sub-packages of the project package. Be aware of the duplicate module problem.
  • Try to name your projects and apps in a way that emphasises what they are, e.g. catalogsite and catalogapp. This helps prevent confusion when viewing the directory structure.
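For projects that stay on the 1.3 structure, one way to keep an eye on the duplicate module problem is to scan sys.modules for source files that have been loaded as more than one module object. The helper below is a sketch of that idea; find_duplicate_modules is a hypothetical name of my own, not part of Django or the standard library.

```python
import os
import sys
from collections import defaultdict


def find_duplicate_modules(modules=None):
    """Return {source_path: [names]} for files loaded as several module objects.

    Hypothetical helper. Defaults to inspecting sys.modules.
    """
    modules = sys.modules if modules is None else modules
    loaded = defaultdict(dict)  # source path -> {id(module): [names]}
    # Copy the items first: sys.modules can change while we iterate.
    for name, module in list(modules.items()):
        path = getattr(module, "__file__", None)
        if not isinstance(path, str):
            continue  # built-in modules and the like have no source file
        # Normalise so compiled and source variants of one file compare equal.
        base, ext = os.path.splitext(os.path.realpath(path))
        key = base + (".py" if ext in (".pyc", ".pyo") else ext)
        loaded[key].setdefault(id(module), []).append(name)
    # Plain aliases (one object under two names) are harmless, so only
    # report files that back more than one distinct module object.
    return {
        path: sorted(name for names in by_object.values() for name in names)
        for path, by_object in loaded.items()
        if len(by_object) > 1
    }
```

Calling find_duplicate_modules() from a manage.py shell session, or logging its result once the site has handled a few requests, should list any app that has been imported both as myapp1 and as mysite.myapp1.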

Category: Software Tagged: python django ceda
