Friends don't let friends not use Django

It's difficult for most of us to evaluate the multitude of web frameworks and microframeworks available in Python and choose the ideal framework for our project. Naturally, proponents of each microframework will pitch in with comments about how much simpler it is than the alternatives, how much less boilerplate there is, or how much faster it runs at scale. Likewise, seasoned users of each microframework will have their favoured approaches for caching, storage, deployment, and the myriad other components needed for web development. They appreciate having the flexibility to choose and integrate those components to their own liking.

The numerous arguments in favour of one framework or another muddy the waters somewhat; in most cases, starting your project with Django will be a safe decision that you will probably not regret.

In bullet points:

  • Newbies should choose Django. It will keep you secure and it will teach you a lot.
  • If you aren't completely sure in what direction your project will develop, use Django. You'll soon hit use cases where you'll want some of the extras that are built into Django.
  • If you know what you're doing with Python web libraries, and you have a fairly comprehensive vision of what your app is doing - maybe you know you want high performance in particular areas, or more simplicity for a subset of use cases - then choose your framework on the basis of that knowledge.
  • Do experiment with and learn the other frameworks.

Django ships with good, solid "batteries" included - libraries that fulfil all sorts of requirements that come up time and again in web programming. Moreover, the Django community has produced thousands of packages (5413 at the time of writing) that fill other gaps. That's many thousands more packages than for any other Python framework. In my projects I invariably find myself wanting some of these components at some point down the line, when requirements arise that we'd never foreseen (for example, i18n requirements sometimes come up later in a project's life, when you want to roll out to new regions). Certainly, neither Django's included batteries nor the community packages will be suitable in every use case, but they get you up and running quickly.

One argument made in favour of microframeworks is that they offer the flexibility to choose alternative components. I don't think it's particularly difficult to swap out components in Django for alternatives more suited to a specific need - I've done so many times. Django is explicitly coupled, unlike, say, Ruby on Rails, so you can simply call alternative functions to use different storages, template engines and so on.

Note however, that Django's integrated-by-default components will also be secure-by-default; any home-rolled integrations may not be. For example, Django applications using Django>=1.2 are protected from CSRF attacks by default. Any microframework that doesn't pre-integrate form generation and request dispatcher components won't be able to say that. This is true whether you're integrating things with a microframework or using non-standard components in Django.

There are a couple of other arguments that I've heard:

  • "Django is slower than x" - maybe, but don't optimise prematurely.
  • "Django doesn't scale as well as x" - scale is a much more complicated problem than "use this tool and you'll be alright". Approaches to scaling Django will be comparable to approaches to scaling any other framework.
  • "Django isn't well-suited to client-side HTML5 apps" - this is true, but it isn't particularly bad at them either. Also don't underestimate the number of additional pages and components needed to productise your core app, even if it's a rich HTML5 app made of static assets and XHR.

I hope this unmuddies the waters a little, especially for beginners. Of course, I'm not advocating anything other than "use the right tool for the job", but until you're sure exactly what the job entails, it doesn't hurt to have a comprehensive toolbox at your disposal.

Design your organisation for Conway's Law

Conway's Law states that:

"Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations."

In other words, the structure of a program or system is likely to evolve to mirror the management structure of the organisation. Even with a couple of teams working on a small project you may end up with various layers of shims and wrappers to make code written by team A interface with team B's preferred way of doing things.

The schism between Dev and Ops teams that is regularly cited in the DevOps movement is another example of Conway's Law in action. The message there is simple: get developers and operations to collaborate on common business goals (eg. frequent, reliable deployments) or else their competing priorities, poorly communicated, will cause friction that risks the business' ability to deliver on any goals. The excellent The Phoenix Project describes several potential communication gaps other than between Dev and Ops, such as between compliance and developers, and information security and operations, and tells a parable about how close cross-team collaboration avoids a series of potential disasters.

There are various solutions to the problem. In the original magazine article in which Melvin Conway introduced the idea he went on to propose a solution:

"This criterion creates problems because the need to communicate at any time depends on the system concept in effect at that time. Because the design which occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design."

Valve's Employee Handbook describes how they have fully embraced the flexible teams approach:

"Why does your desk have wheels? Think of those wheels as a symbolic reminder that you should always be considering where you could move yourself to be more valuable. But also think of those wheels as literal wheels, because that’s what they are, and you’ll be able to actually move your desk with them."

A slightly less radical approach is to attempt to create strong but fixed communication pathways between teams. Spotify, for example, has described having chapters and guilds that encourage collaboration across team boundaries on specific issues, skills or disciplines.

You can apparently also beat Conway's Law not by improving cross-team communication but by ensuring your teams are set up to match the architecture of the technology products you want to produce. A leaked memo from a former Amazon employee that contrasts Amazon's structure with Google's mentions that Jeff Bezos mandated that:

All teams will henceforth expose their data and functionality through service interfaces. [...] Teams must communicate with each other through these interfaces.

Bezos is relying on Conway's Law to ensure the technology is structured well rather than neglecting Conway's Law and letting it create an unexpected architecture. This solution doesn't attempt to address Melvin Conway's observation that "the design which occurs first is almost never the best possible", but if you have an established or proven architecture, perhaps something that offers maintainability or security benefits, you may be able to ensure it is more closely followed by removing the flexibility to interact across the architecture boundaries you want to draw.

Learning Rust

The past weekend I've been writing my first programs in Rust, which seems to be in vogue at the moment. Mentions of Rust keep coming up on /r/programming and it's obvious why: Rust is a very exciting community right now, with very active development and new libraries landing all the time.

Perhaps most persuasive was Jared Forsyth's seemingly balanced discussion of Go vs Rust, which points out several interesting features in Rust that are absent in Go.

The syntax itself is reminiscent of Ruby (but with braces). As a Python programmer, I've never found Ruby that interesting a prospect. I've learned the Ruby language enough to write Puppet providers, but Ruby as a language occupies very much the same space as Python and I've never seen the need to take it further.

Rust offers some of the same syntactic ideas as Ruby, but what is on offer is a highly performant, natively-compiled language with static but inferred types. Rust's pointer ownership model allows the compiler to automatically allocate and free objects in most cases without needing reference counting or garbage collection (though both of these are available too). You could perhaps describe it as a low-level language with high-level syntax. So this is a very different proposition to Python and Ruby, and rather different to C and C++ too.

What I've learned of Rust so far comes largely from the Rust tutorial and Rust by Example.

My first Rust program was an implementation of a simple, insecure monoalphabetic substitution cipher (inspired, perhaps, because I've already written a genetic algorithm in Python to crack them). I'm pleased that the code ends up clean and easy to read. For example, a snippet that encodes a single character with ROT13 might be:

// A function that takes a char c, and returns its rot13 substitution
fn rot13_char(c: char) -> char {
    let base = match c {
        'a'..='z' => 'a' as u8,  // If c is in [a-z], set base to 97
        'A'..='Z' => 'A' as u8,  // If c is in [A-Z], set base to 65
        _ => return c,  // For all other characters, return c
    };

    let ord = c as u8 - base;  // ord is in the range 0-25
    let rot = (ord + 13) % 26;  // rot13
    (rot + base) as char  // convert back to an ASCII character. Note no
                          // semicolon - this is an implicit return
}

I also spent some time working out how to call Rust code from Python, which would allow me to use both languages to their strength in the same project. It turns out it isn't hard to do this, by compiling a .so in Rust with the #[no_mangle] annotation on the exported methods, and some simple ctypes magic on the Python side. One downside is that so far I've only worked out how to pass strings as c_char_p which is not optimal either for Rust or Python. Sample code is on Bitbucket.

I could see myself using Rust in some projects in the future, though I'm not likely to stop using Python for the vast majority of applications. The Rust language itself is changing week to week and is probably unsuitable for any production development at this time, but for early adopters it's well worth a look.

Code quality flows from good tools

Delivering high quality code stands on two pillars: the developer's wisdom to write code well, and tools to inform and guide the developer towards better practice. Developers are clever, and will make poor tools work, but the benefits of great tools go beyond making the developers' lives easier, and actively promote higher quality code.

Here are my picks for sharp tools that improve not developer productivity but code quality.

Version Control Hosting

Going beyond just the benefits of keeping code in version control, tools like Rhodecode or Gitorious (both self-hostable) or Github or Bitbucket (SaaS) allow developers to create new repositories so that unwieldy projects can be split, or new tools and supporting apps can be kept disentangled from the existing code.

You really don't want developers to be bound by the architectural decisions made long ago and codified in pre-created repositories that are hard to get changed.

Code Review

The best code review tools let you show uncommitted changes to other developers, provide high-quality diffs that make it easy to read and understand the impact of a change, and let the other developers give detailed feedback on multiple sections of code. With this feedback developers can rapidly turn patches around and resubmit until they are perfect. Pre-commit review means that the committed history doesn't contain unnecessary clutter; each commit will do exactly one thing in as good code as can be achieved.

Code review can catch issues such as potential error cases, security weaknesses, duplicated code or missing tests or documentation. However the benefits of code review go far beyond the direct ability to catch problem code. Once working with code review, developers learn that to get their code through review they should adapt their coding style to be clearer and more legible, and pre-empt the criticisms that will be levelled by the reviewers. Code review also facilitates mutual learning: more people pay more attention to the new features that go into the codebase, and so understand the codebase better; also inexperienced developers get guidance from the more experienced developers about how their code could be improved.

Some hosted version control systems (eg. Github) have code review built in, or there are self-hosted tools such as ReviewBoard or SaaS tools like Atlassian Crucible.

Linters/Code Style checkers

The earliest time you can get feedback about code quality to developers is when the code is being edited. (If you're not a Pythonista, you'll have to translate this to your own language of choice.)

Linters like Pyflakes can be run in the editor to highlight potential problems, while style checkers like pep8 highlight coding style violations. Many IDEs will ship with something like this, but if yours doesn't then plugins are usually available.

Pyflakes is good at spotting undeclared and unused variables, and produces relatively few false positives; on the occasions I've tried PyLint I found it pedantic and plain wrong whenever anything vaguely magical happens. You can tailor it back with some configuration but in my opinion it's not worth it.

pep8 is valuable and worth adopting, even if your own coding style is different (though don't change if your project already has a consistent style). The style promoted by pep8 is pleasantly spaced and comfortable to read, and offers a common standard across the Python community. I've found even the controversial 80-column line length limit useful - long lines are less readable, both when coding and when viewing side-by-side diffs in code review or three-way diff tools.

You might also consider docstring coverage checkers (though I've not seen one integrated with an editor yet). I find docstrings invaluable for commenting the intention that the developer had when they wrote the code, so that if you're debugging some strange issue later you can identify bits of code that don't do what the developer thought they did.

With Python's ast module it isn't hard to write a checker for the kind of bad practice that comes up in your own project.
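
As a minimal sketch of what such a checker could look like: the two rules below (bare except clauses and mutable default arguments) are purely illustrative, and the class and function names are hypothetical rather than taken from any real project.

```python
import ast


class BadPracticeChecker(ast.NodeVisitor):
    """Collects (line number, message) pairs for two illustrative rules."""

    def __init__(self):
        self.problems = []

    def visit_ExceptHandler(self, node):
        # A handler with no exception type is a bare "except:".
        if node.type is None:
            self.problems.append((node.lineno, "bare except"))
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # kw_defaults may contain None entries; isinstance() skips them.
        for default in node.args.defaults + node.args.kw_defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                self.problems.append((default.lineno, "mutable default argument"))
        self.generic_visit(node)


def check_source(source):
    """Parse source code and return the problems found, sorted by line."""
    checker = BadPracticeChecker()
    checker.visit(ast.parse(source))
    return sorted(checker.problems)
```

Calling check_source() on the text of a module returns a list you could print from a pre-commit hook or an editor plugin; rules specific to your own project would be further visit_* methods.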

Test Fixture Injection

Test code has a tendency to sprawl, with some developers happy to copy-and-paste code into dozens of test cases, suites and modules. Big test code becomes slow, unloved and hard to maintain. Of course, you can criticise these practices in code review, but it's an uphill challenge unless you can provide really good alternatives.

The kind of test fixtures your application will need will of course depend on your problem domain, but regardless of your requirements it's worth considering how developers can create the data their tests will depend on easily and concisely - without code duplication.

There are off-the-shelf fixture creation frameworks like factory_boy, which focuses on populating ORM fixtures, and integrated frameworks like Django have test fixture management tools.

However where these are not appropriate, it can be valuable to write the tools you need to make succinct, easily maintained tests. In our project we populate our object database using test objects loaded from YAML. You could also do this for in-memory objects if the code required to create them is more complicated or slower than just describing the state they will have when created.

Another approach also in use in our project is to create a DSL that allows custom objects to be created succinctly. A core type in our project is an in-memory tabular structure. Creating and populating these requires a few lines of code, but for tests where tables are created statically rather than procedurally we construct them by parsing a triple-quoted string of the form:

| user | (int) karma | description |
| dave | 5           | hoopy frood |
| bob  | 0           | None        |

This kind of approach has not only simplified our tests but has made them faster, more comprehensive, and more stable.
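
A parser for a table format like the one above can be quite small. This is only a sketch under assumptions: it returns a plain list of dicts rather than our real tabular type, it treats an "(int)" prefix in a header cell as a type annotation, and it reads a cell containing None as Python's None. The function name parse_table is hypothetical.

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of row dicts."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    rows = [[cell.strip() for cell in line.strip("|").split("|")] for line in lines]
    header, body = rows[0], rows[1:]

    # Work out each column's name and converter from the header row.
    columns = []
    for cell in header:
        if cell.startswith("(int)"):
            columns.append((cell[len("(int)"):].strip(), int))
        else:
            columns.append((cell, str))

    def convert(value, type_):
        # Assumption: the literal text "None" stands for a missing value.
        return None if value == "None" else type_(value)

    return [
        {name: convert(value, type_) for (name, type_), value in zip(columns, row)}
        for row in body
    ]
```

A test can then describe its input as a triple-quoted string and pass it straight to parse_table(), keeping the data readable next to the assertions.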

What tools are most effective at promoting code quality in your projects? Leave a comment below.

Pyweek 18 announced

Pyweek 18 was announced last week, to run from the 11th May to 18th May 2014, midnight to midnight (UTC).

Pyweek is a bi-annual games programming contest in which teams or individuals compete to develop a game, in Python, from scratch, in exactly one week, on a theme that is selected by vote and announced at the moment the contest starts.

The contest offers the opportunity to program alongside other Python programmers on a level playing field, with teams diarising their progress via the site, as well as chatting on IRC (#pyweek on Freenode).

Games are scored by other entrants, on criteria of fun, production and innovation, and it's a hectic juggling act to achieve all three in the limited time available.

It's open to all, and very beginner friendly. You don't need a team, you don't need finely honed artistic ability, and you don't need to set aside the whole week - winning games have been created in less than a day. I'd encourage you to take part: it's a great opportunity to explore your creative potential and learn something new.

Browse (and play) the previous entries at the site.

Pyweek 18 kicks off with the theme voting starting at 2014-05-04 00:00 UTC.

Python imports

Though I've been using Python for 10 years I still occasionally trip over the magic of the import statement. Or rather the fact that it is completely unmagical.

The statement

import lemon.sherbet

does a few simple things, effectively:

  1. Unless it's already imported, creates a module object for lemon and evaluates the lemon package's source in the namespace of the module object.
  2. Unless it's already imported, creates a module object for sherbet, evaluates the sherbet module's source (under lemon/) in the namespace of the module object, and assigns the sherbet module to the name sherbet in lemon.
  3. Assigns the lemon module to the name lemon in __main__.

(Obviously, I'm omitting a lot of the details, such as path resolution, sys.modules or import hooks).

This basic mechanism has some strange quirks. Suppose the full source tree contains:

├── lemon
│   ├──
│   ├──
│   ├──
│   └──

And the curd_machine module contains

import lemon.curd

At first glance, I find it odd that this code works:

import curd_machine
import lemon.sherbet

  1. I can access lemon, but I didn't explicitly import it. Of course, this happens because the import lemon.sherbet line ultimately puts the lemon module into my current namespace.
  2. I can also access lemon.curd without explicitly importing it. This is simply because the module structure is stateful. Something else assigned the lemon.curd module to the name curd in the lemon module. I've imported lemon, so I can access lemon.curd.

I'm inclined to the view that relying on either of these quirks would be relatively bad practice, resulting in more fragile code, so it's useful to be aware of them.
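
Both quirks can be reproduced with a throwaway source tree. The sketch below builds one in a temporary directory; the file names (lemon/__init__.py, lemon/sherbet.py, lemon/curd.py and curd_machine.py) are assumptions chosen to match the module names used above, not necessarily the original layout.

```python
import importlib
import os
import sys
import tempfile


def build_demo_tree(root):
    """Create an assumed layout: a lemon package plus a curd_machine module."""
    package = os.path.join(root, "lemon")
    os.mkdir(package)
    for name in ("__init__.py", "sherbet.py", "curd.py"):
        open(os.path.join(package, name), "w").close()
    with open(os.path.join(root, "curd_machine.py"), "w") as f:
        f.write("import lemon.curd\n")


def run_demo():
    """Import the modules and report which attributes of lemon are reachable."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            import curd_machine  # side effect: sets the attribute lemon.curd
            import lemon.sherbet  # binds the name lemon, not lemon.sherbet
            return {
                "curd": hasattr(lemon, "curd"),
                "sherbet": hasattr(lemon, "sherbet"),
                "soda": hasattr(lemon, "soda"),
            }
        finally:
            sys.path.remove(root)
```

Here lemon.curd is reachable only because curd_machine happened to import it first, while an attribute that nothing has imported or defined is simply absent from the module object.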

The former of these quirks also affects Pyflakes. Pyflakes highlights in my IDE variables that I haven't declared. But it fails to spot obvious mistakes like this:

import lemon.sherbet

which when run will produce an error:

AttributeError: 'module' object has no attribute 'soda'

There's still nothing mysterious about this; Pyflakes only sees that lemon is defined, and has no idea whether lemon.soda is a thing.

I think the reason that this breaks in my mind is due to a problem of leaky abstraction in my working mental models. I tend to think of the source tree as a static tree of declarative code, parts of which I can map into the current namespace to use. It isn't this though; it is an in-memory structure being built lazily. And it isn't mapped into a namespace; the namespace just gets the top level names and my code traverses through the structure.

Maybe I formed my mental models long ago when I used to program more Java, where the import statement does work rather more like I've described. I wonder if people with no experience of Java are less inclined to think of it like I do?

CRC Cards

A lot of the software I've written has never been through any formal design process. Especially with Python, because of the power of the language to let me quickly adapt and evolve a program, I have often simply jumped in to writing code without thinking holistically about the architecture of what I'm writing. My expectation is that a good architecture will emerge, at least for the parts where it matters.

This approach may work well if you are programming alone, but is hampered if you are practising (unit) test-driven development, or are working in a team. Unit tests disincentivise refactoring components, or at least slow the process down. I would point out that if unit tests are resolutely hard to write then your code may be badly architected.

Working as a team reduces your ability to have perfect knowledge of all components of the system, which would be required to spot useful refactorings.

In practice I've found that if we don't do any up-front design, we won't ever end up writing great software: some bits will be good, other bits will be driven by expedience and stink, and won't get refactored, and will be a blight on the project for longer than anyone expected.

Class-responsibility-collaboration (CRC) Cards are a lightweight technique for collaboratively designing a software system, which I've used a few times over the past couple of years and which seems to produce good results.

The technique is simple: get the team in a room, write down suggested classes in a system on index cards on a table, then iterate and adapt the cards until the design looks "good". Each card is titled with the name of the class, a list of the responsibilities of the class, and a list of the other classes with which the class will collaborate. The cards can be laid out so as to convey structure, and perhaps differently coloured cards might have different semantics.

One of the original CRC cards drawn by Ward Cunningham.

CRC cards are founded on object-oriented principles, and I don't want our code to be unnecessarily objecty, so I'm quick to point out that not every card will correspond to a Python class. A card may also correspond to a function, a module, or an implied schema for some Python datastructure (eg. a contract on what keys will be present in a dict). I think of them as Component-responsibility-collaboration cards. The rules are deliberately loose. For example, there's no definition of what is "good" or how to run the session.

Running a CRC design session is perhaps the key art, and one that I can't claim to have mastered. Alistair Cockburn suggests considering specific scenarios to evaluate a design. In CRC sessions I've done I've tried to get the existing domain knowledge written down at the start of the session. If there's an existing architecture, write that down first. That's an excellent way to start, because then you just need to refactor and extend it. You could also write down fixed points that you can't or don't want to change right now, perhaps on differently coloured cards.

It does seem to be difficult to get everyone involved in working on the cards. Your first CRC session might see people struggling to understand the "rules", much less contribute. Though it harks back to the kind of textbook OO design problems that you encounter in early university courses, even experienced developers may be rusty at formal software design. However, once you get people contributing, CRC allows the more experienced software engineers to mentor the less experienced team members by sharing the kind of rationale they are implicitly using when they write software a certain way.

I think you probably need to be methodical about working through the design, and open about your gut reactions to certain bits of design. Software architecture involves a lot of mental pattern matching as you compare the design on the table to things that have worked well (or not) in the past, so it can be difficult to justify why you think a particular design smells. So speak your mind and suggest alternatives that somehow seem cleaner.

The outcome of a CRC design session is a stack of index cards that represent things to build. With the design fixed, the building of these components seems to be easier. Everyone involved in the session is clear on what the design is, and a summary of the spec is on the card so less refactoring is needed.

I've also found the components are easier to test, because indirection/abstraction gets added in the CRC session that you might not add if you were directly programming your way towards a solution. For example, during design someone might say "We could make this feature a new class, and allow for alternative implementations". These suggestions are added for the elegance and extensibility of their design, but this naturally offers easier mock dependency injection (superior to mock.patch() calls any day).
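
To illustrate the point about dependency injection, here is a small sketch. ReportSender and its mailer collaborator are hypothetical names invented for the example; the only real library used is unittest.mock from the standard library.

```python
from unittest import mock


class ReportSender:
    """Hypothetical component: formats a report and hands it to a mailer."""

    def __init__(self, mailer):
        # The collaborator is passed in, so a test can supply a stand-in.
        self.mailer = mailer

    def send(self, recipient, lines):
        body = "\n".join(lines)
        self.mailer.deliver(recipient, body)
        return len(lines)


def check_report_sender():
    """Exercise ReportSender with an injected mock instead of mock.patch()."""
    fake_mailer = mock.Mock()
    sender = ReportSender(fake_mailer)
    count = sender.send("dave@example.com", ["karma: 5"])
    fake_mailer.deliver.assert_called_once_with("dave@example.com", "karma: 5")
    return count
```

Because the mailer arrives through the constructor, the test never needs to know the import path of the real implementation, which is what makes this less brittle than patching.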

CRC cards seem to be a cheap way to improve the quality of our software. Several weeks' work might be covered in an hour's session. We've not used CRC as often as we could have, but where we have I'm pleased with the results: our software is cleaner, and working on cleaner software makes me a happier programmer.

2013 In Review

I'd like to close 2013 with a retrospective of the year and some thoughts on what I'd like to achieve in 2014.


In March 2013 I decided to leave my contract at luxury phone manufacturer Vertu and take up a contract at Bank of America Merrill Lynch. The two years I spent at Vertu spanned the period where they separated from Nokia and were sold. As part of this separation I was involved in putting in place contemporary devops practices, datacentres, development tools and CI, and leading a team to build exclusive web apps and web services. We got to play with cool new technologies and turn them to our advantage, to deliver, fast.

For example, I spent January and February developing a new version of Vertu's lifestyle magazine Vertu Life using Django. Using ElasticSearch instead of Django's ORM was a great choice: I was not only able to build strong search features but get more value out of the content by adding "More like this" suggestions in many pages. Though Vertu Life is just a magazine, the site allows some personalisation. All such writes went to Redis, so the site was blazingly fast.

Bank of America Merrill Lynch

Joining Bank of America meant moving from Reading to London, and I handed over duties as the convenor of the Reading Python Dojo to Mark East (who has since also joined Bank of America, coincidentally).

Bank of America's big Python project Quartz is a Platform-as-a-Service for writing desktop banking apps and server-side batch jobs, and I joined a team maintaining some of the Quartz reconciliation technology components. Quartz is a complex platform with a lot of proprietary components, and it all seems very alien to software developers until you start to understand the philosophy behind it better.

This was an interesting project to join because it was a somewhat established application with reams of what everyone likes to call "legacy code". Coming into this, I had to learn a lot about how the code works and how Quartz works before being able to spot ways to improve this.

Banking is also a very technical industry and this also presents challenges around communication between bankers and software engineers like me. Agile adoption is in its infancy at Bank of America, but has buy in at the senior management level, which is exciting and challenging.

Quartz is not only a project; it's a large internal community (2000+ developers), so the challenges we face are not just technical but social and political. I've learned that collaboration in a project the size of Quartz requires putting more effort into communication than smaller projects do. The natural tendency is towards siloisation and fragmentation. We have got better about doing things in a way that they could be more easily re-used, then talking and blogging about them.


There were Devopsdays conferences in London in March and November, and I look forward to more in 2014. As well as talks covering technical approaches to improving software development and operations, and talks on how to improve cross-business collaboration, Devopsdays offers plenty of opportunities to network, to discuss problems you are tackling and share experiences about approaches that have worked and have not.

In March I gave this talk. I also wrote a blogpost about DevopsDays in November.


Though I'm excited about going to Berlin in 2014, I'm very sorry that Europython 2013 was the last in Florence. Florence is full of beautiful art and architecture but is also a place to relax in the sunshine with great food and great company, and talk about interesting things (not least, Python, of course).

After two years of lurking at Europython, this year I was organised enough to offer a talk on Programming physics games with Python and OpenGL. People have told me this was well received, though I think I could do with practice at giving talks :)

After Europython, I took a week driving around Tuscany with my girlfriend. Tuscany is beautiful, both the Sienese hill towns and the Mediterranean beach resorts, and the food and wine is excellent. I recommend it. Though perhaps I wouldn't drive my own car down from London again. Italy is a long way to drive.

Pycon UK

At Pycon UK I gave a talk on "Cooking up high quality software", in full chef's whites and in my best dodgy French accent. Hopefully my audience found this humorous and perhaps a little bit insightful. I was talking exclusively in metaphors - well, puns - but I hope some people took away some messages.

I think if I had to sum up those messages, I was encouraging developers to think not just about the skills involved in cooking a dish, but about the broader picture of how the kitchen is organised and indeed, everything else that goes on in the restaurant.

Several of the questions were about my assertion that the "perfect dish" requires choosing exactly the right ingredients - and may involve leaving some ingredients out. I was asked if I meant that we should really leave features out. Certainly I do; I think the key to scalable software development is in mitigating complexity and that requires a whole slew of techniques, including leaving features out.

Pycon UK was also notable for the strong education track, which we at Bank of America sponsored, and which invited children and teachers to come in and work alongside developers for mutual education.


PyWeek is a week-long Python games programming contest that I have been entering regularly for the last few years.

This year I entered both the April and the September PyWeek with Arnav Khare, who was a colleague at Vertu.

Our entry in PyWeek 16 (in April) was Warlocks, a simple 2D game with a home-rolled 3D engine and lighting effects. I was pleased with achieving a fully 3D game with contemporary shaders in the week, but we spent too much time on graphical effects and the actual game was very shallow indeed, a simple button-mashing affair where two wizards face each other before hurling a small list of particle-based spells at each other.

I was much happier with our PyWeek 17 entry, Moonbase Apollo, which was a deliberately less ambitious idea. We wanted to add a campaign element to a game that was a cross between Asteroids and Gravitar. A simple space game is easy to write and doesn't require very much artwork. It was a strategy that allowed us to have the bulk of the game mechanics written on day 1, so we had the rest of the week to improve production values and add missions.

We were relatively happy with the scores we got for these but neither was a podium finish :-(


So what will I get up to in 2014?

I'm keen to do more Python 3. Alex Gaynor has blogged about lack of Python 3 adoption and I regret that I haven't done much to move towards using Python 3 in my day-to-day coding this year. Bank of America is stuck on Python 2.6. I still feel that Python 3 is the way forward, perhaps now more than ever, now that Django runs under Python 3, but I tend to pick Python 2 by default. I did consider opting for Python 3 as our core technology when the decision arose at Vertu, but at that time some of the libraries we really needed were not available on Python 3. So I made the safe choice. I think today, I might choose differently.

Load Balancer Testing with a Honeypot Daemon

This is a write up of a talk I originally gave at DevopsDays London in March 2013. I had a lot of positive comments about it, and people have asked me repeatedly to write it up.


At a previous contract, my client had over the course of a few years outsourced quite a handful of services under many different domains. Our task was to move the previously outsourced services into our own datacentre as both a cost saving exercise and to recover flexibility that had been lost.

In moving all these services around, there evolved a load balancer configuration that consisted of

  • Some hardware load balancers managed by the datacentre provider that mapped ports and also unwrapped SSL for a number of the domains. These were inflexible and couldn't cope with the number of domains and certificates we needed to manage.
  • Puppet-managed software load balancers running
    • Stunnel to unwrap SSL
    • HAProxy as a primary load balancer
    • nginx as a temporary measure for service migration, for example, dark launch

As you can imagine there were a lot of moving parts in this system, and something inevitably broke.

In our case, an innocuous-looking change passed through code review that broke transmission of accurate X-Forwarded-For headers. The access control to some of our services was relaxed for certain IP ranges as transmitted with X-Forwarded-For headers. Only a couple of days after the change went in we found the Googlebot had spidered some of our internal wiki pages! Not good! The lesson is obvious and important: you must write tests of your infrastructure.

Unit testing a load balancer

A load balancer is a network service that forwards incoming requests to one of a number of backend services.


A pattern for unit testing is to substitute mock implementations for all components of a system except the unit under test. We can then verify the outputs for a range of given inputs.


To be able to unit test the Puppet recipes for the load balancers, we need to be able to create "mock" network services on arbitrary IPs and ports that the load balancer will communicate with, and which can respond with enough information for the test to check that the load balancer has forwarded each incoming request to the right host with the right headers included.
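To make the idea of a "mock" backend concrete, here is a minimal sketch in Python of a web service that simply reports back what it received, so a test can inspect which backend answered and which headers arrived. It is an illustration only, not the service used in the original tests; the names `EchoHandler`, `start_mock_backend` and `fetch_echo`, and the shape of the JSON report, are assumptions made for this example.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Reply to any GET with a JSON description of the request received."""

    def do_GET(self):
        backend_ip, backend_port = self.server.server_address[:2]
        report = {
            "backend_ip": backend_ip,          # address this mock is bound to
            "backend_port": backend_port,
            "client_ip": self.client_address[0],
            "path": self.path,
            "headers": {k.lower(): v for k, v in self.headers.items()},
        }
        body = json.dumps(report).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock_backend(host="127.0.0.1", port=0):
    """Serve in a background thread; port 0 asks the OS for a free port."""
    server = HTTPServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_echo(host, port, path, headers=None):
    """Make a GET request and decode the mock backend's JSON report."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers=headers or {})
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()
```

In a real test the request would go to the load balancer rather than straight to the mock, and the report would show what the load balancer actually forwarded.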


The first incarnation of tests was clumsy. It would spin up dozens of network interface aliases with various IPs, put a webservice behind those, then run the tests against the mock webservice. The most serious problem with this approach was that it required slight load balancer configuration changes so that the new services could come up cleanly. It also required tests to run as root to create the interface aliases and bind the low port numbers required. It was also slow. It only mocked the happy path, so tests could hit real services if there were problems with the load balancer configuration.

I spent some time researching whether it would be possible to run these mock network services without significantly altering the network stack of the machine under test. Was there any tooling around using promiscuous mode interfaces, perhaps? I soon discovered libdnet and from there Honeyd, and realised this would do exactly what I needed.

Mock services with honeyd

Honeyd is a service intended to create virtual servers on a network, which can respond to TCP requests etc., for network intrusion detection. It does all this by using promiscuous mode networking and raw sockets, so that it doesn't require changes to the host's real application-level network stack at all. The honeyd literature also pointed me in the direction of combining honeyd with farpd so that the mock servers can respond to ARP requests.

More complicated was that I needed to create scripts to create mock TCP services. I needed my mock services to send back HTTP headers, IPs, ports and SSL details so that the test could verify these were as expected. To create a service, Honeyd requires you to write programs that communicate on stdin and stdout as if these were the network socket (this is similar to inetd). While it is easy to write this for HTTP and a generic TCP socket, it's harder for HTTPS, as the SSL libraries will only wrap a single bi-directional file descriptor. I couldn't find a way of treating stdin and stdout as a single file descriptor. I eventually solved this by wrapping one end of a pipe with SSL and proxying the other end of the pipe to stdin and stdout. If anyone knows of a better solution for this, please let me know.
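As a rough sketch of the plain HTTP case, an inetd-style script might look like the following. This is not the original script: the function names and the JSON report format are assumptions made for this example, and only the command-line flags (`--REMOTE_HOST`, `--REMOTE_PORT`, `--PORT`, `--IP`, `--SSL`) are taken from the honeyd configuration in this post. The HTTPS pipe trick is not shown.

```python
import argparse
import json
import sys

FLAGS = ("REMOTE_HOST", "REMOTE_PORT", "PORT", "IP", "SSL")


def parse_args(argv):
    """Parse the connection details honeyd passes on the command line."""
    parser = argparse.ArgumentParser()
    for name in FLAGS:
        parser.add_argument("--" + name, required=True)
    return parser.parse_args(argv)


def read_head(stream):
    """Read the request line and header lines, stopping at the blank line."""
    lines = []
    while True:
        line = stream.readline().decode("latin-1").rstrip("\r\n")
        if not line:  # blank line or end of input
            return lines
        lines.append(line)


def build_response(lines, args):
    """Build an HTTP response whose body reports what the backend saw."""
    request_line = lines[0] if lines else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    report = dict(vars(args), request_line=request_line, headers=headers)
    body = json.dumps(report, sort_keys=True).encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n\r\n" % len(body)
    )
    return head.encode("latin-1") + body


def main():
    """Entry point: stdin and stdout stand in for the network socket."""
    args = parse_args(sys.argv[1:])
    sys.stdout.buffer.write(build_response(read_head(sys.stdin.buffer), args))
    sys.stdout.buffer.flush()
```

A script installed for honeyd would call `main()` at the bottom of the file. Keeping the parsing and response building in separate functions means they can be tested without a network at all.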

With these in place, I was able to create a honeyd configuration that reflected our real network:

# All machines we create will have these properties
create base
set base personality "Linux 2.4.18"
set base uptime 1728650
set base maxfds 35
set base default tcp action reset

# Create a standard webserver template
clone webserver base
add webserver tcp port 80 "/usr/share/honeyd/scripts/ --REMOTE_HOST $ipsrc --REMOTE_PORT $sport --PORT $dport --IP $ipdst --SSL no"

# Network definition
bind webserver
bind webserver

This was all coupled to an interface created with the Linux veth network driver (after trying a few other mock networking devices that didn't work). With Debian's ifup hooks, I was able to arrange it so that bringing up this network interface would start honeyd and farpd and configure routes so that the honeynet would be seen in preference to the real network. There is a little subtlety in this, because we needed the real DNS servers to be visible, as the load balancer requires DNS to work. Running ifdown would restore everything to normal.

Writing the tests

The tests were then fairly simple BDD tests against the mocked load balancer, for example:

Feature: Load balance production services

    Scenario Outline: Path-based HTTP backend selection
        Given the load balancer is listening on port 8001
        When I make a request for http://<domain><path> to the loadbalancer
        Then the backend host is <ip>
        And the path requested is <path>
        And the backend request contained a header Host: <domain>
        And the backend request contained a valid X-Forwarded-For header

        Examples:
        | domain             | path      | ip             |
        |      | /         | 42.323.167.197 |
        | | /api/cfe/ |      |
        | | /api/ccg/ |       |

The honeyd backend is flexible and fast. Of course it was all Puppetised as a single Puppet module that added all the test support; the load balancer recipe was applied unmodified. While I set it up as a virtual network device for use on development load balancer VMs, you could also deploy it on a real network, for example for continuous integration tests or for testing hardware network devices.

As I mentioned in my previous post, having written BDD tests like this it's easier to reason about the system, so the tests don't just catch errors (protecting against losing our vital X-Forwarded-For headers) but give an overview of the load balancer's functions that makes it easier to understand and adapt in a test-first way as services migrate. We were able to make changes faster and more confidently and ultimately complete the migration project swiftly and successfully.
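As an illustration of what a step such as "the backend request contained a valid X-Forwarded-For header" might check, here is a minimal Python sketch. What counted as "valid" in the original tests is not stated, so the rule used here is an assumption: every entry must parse as an IP address, and the last entry must be the address of the client that connected to the load balancer. The function names and the lower-cased header dictionary are also hypothetical.

```python
import ipaddress


def parse_forwarded_for(value):
    """Split an X-Forwarded-For value into IP address objects.

    Raises ValueError if any entry is not a valid IPv4 or IPv6 address.
    """
    return [ipaddress.ip_address(part.strip()) for part in value.split(",")]


def assert_valid_forwarded_for(headers, client_ip):
    """Check the header the mock backend saw (header names lower-cased).

    Assumption: the test client connects directly to the load balancer,
    which appends the client's address as the last entry.
    """
    value = headers.get("x-forwarded-for")
    assert value is not None, "backend request had no X-Forwarded-For header"
    chain = parse_forwarded_for(value)
    expected = ipaddress.ip_address(client_ip)
    assert chain[-1] == expected, (
        "expected last X-Forwarded-For entry %s, got %s" % (expected, chain[-1])
    )
```

Checking the actual address, rather than just the presence of the header, is what would have caught the regression described at the start of this post.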

Experiences with BDD

What is BDD?

Behaviour Driven Development (BDD) is a practice where developers collaborate with business stakeholders to develop executable specifications for pieces of development that they are about to start. Test Driven Development (TDD) says that tests should be written before development, but it doesn't say how tests should be written.

BDD builds on TDD by proposing that the first tests should be functional/acceptance tests written in business-oriented language. Using a business-oriented language rather than code allows stakeholders to be involved in verifying that a feature satisfies the business's requirements before work on that feature even commences. You might then do TDD at the unit test level around individual components as you develop.


The tools for BDD have generally come to revolve around Gherkin, a simple structure for natural language specifications.

My favourite description of Gherkin-based tools is given by the Ruby Cucumber website:

  1. Describe behaviour in plain text
  2. Write a step definition
  3. Run Cucumber and watch it fail
  4. Write code to make the step pass
  5. Run Cucumber again and see the step pass

To summarise, a feature might be described in syntax like:

Feature: Fight or flight
    In order to increase the ninja survival rate,
    As a ninja commander
    I want my ninjas to decide whether to take on an
    opponent based on their skill levels

    Scenario: Weaker opponent
        Given the ninja has a third level black-belt
        When attacked by a samurai
        Then the ninja should engage the opponent

You then write code that binds this specification language to a test implementation. Thus the natural language becomes a functional test.

This results in three tiers:

  1. The specification language
  2. The specification language code bindings
  3. The system(s) under test
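The three tiers can be sketched in a few lines of Python. This is a toy, home-made step registry written for illustration, not the API of lettuce or any other Gherkin tool; the `step` decorator, the `Ninja` class and its fighting rule are all hypothetical.

```python
import re

# Tier 1: the specification language (the steps of one scenario).
SCENARIO = """
    Given the ninja has a third level black-belt
    When attacked by a samurai
    Then the ninja should engage the opponent
"""

# Tier 3: the system under test (a hypothetical toy model).
STRENGTH = {"samurai": 3}
LEVELS = {"first": 1, "second": 2, "third": 3}


class Ninja:
    def __init__(self, belt_level):
        self.belt_level = belt_level
        self.engaged = None

    def attacked_by(self, opponent):
        # Engage only opponents that are not stronger than the ninja
        self.engaged = self.belt_level >= STRENGTH[opponent]


# Tier 2: the bindings from specification language to code.
STEPS = []


def step(pattern):
    """Register a function as the implementation of matching step text."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"the ninja has a (\w+) level black-belt")
def given_belt(world, level):
    world["ninja"] = Ninja(LEVELS[level])


@step(r"attacked by an? (.+)")
def when_attacked(world, opponent):
    world["ninja"].attacked_by(opponent)


@step(r"the ninja should engage the opponent")
def then_engages(world):
    assert world["ninja"].engaged, "ninja did not engage"


def run_scenario(text):
    """Run each step line against the first binding that matches it."""
    world = {}
    for line in text.strip().splitlines():
        keyword, _, rest = line.strip().partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(rest)
            if match:
                func(world, *match.groups())
                break
        else:
            raise LookupError("no step definition for: " + rest)
    return world
```

Calling `run_scenario(SCENARIO)` executes the three steps in order. Note that the `Ninja` class can change freely as long as the bindings are kept in step with it; the scenario text does not need to change.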

Python Tools

In Python there are a few tools that implement Gherkin:

Of these I've only had experience with lettuce (we hacked up an internal fork of lettuce with HTML and modified xUnit test output), but outwardly they are similar.

Experiences of implementing BDD

A complaint I've heard a couple of times about BDD as a methodology is that it remains difficult to get the business stakeholders to collaborate in writing or reviewing BDD tests. This was my experience too, though there is a slightly weaker proposition of Specification by Example where the stakeholders are asked just to provide example cases for the developers to turn into tests. This doesn't imply the same bi-directionality and collaboration as BDD.

If you don't get collaboration with your stakeholders there are still benefits to be had from BDD techniques if you put yourself in the shoes of the stakeholder and develop the BDD tests you would want to see. It gives you the ability later to step back and see the software at a higher level than as a collection of tested components. You may find this level is easier to reason at, especially for new starters and new team members.

Another complaint is that it seems like it's more work, with the two-step process - first write natural language, then work out how to implement those tests - but in fact, I found it makes it much easier to write tests in the first place. Where in TDD you have to start by thinking what the API looks like, in BDD you start with a simple definition of what you want to see happening. You soon build up a language that completely covers your application domain and the programming work required in creating new tests continues to drop.

Another positive observation is that the three tiers protect your tests from inadvertent change as the project develops. While your code might change, and the corresponding specification language code bindings might change, well-written Gherkin features will not need to change. Without using BDD I have encountered situations where functionality was broken because the tests that would have caught it were changed or removed at the same time that the implementation was changed. BDD protects against that.

The natural language syntax is helpful at ensuring that tests are written at a functional level. Writing tests in natural language makes it much more visible when you're getting too much into implementation detail, as you start to require weirdly detailed language and language that the business users would not understand.


There are a couple of pitfalls that I encountered. One is just that the business stakeholders won't be good at writing tests, and so the challenge of collaborating to develop the BDD tests is hard to solve. Just writing something in natural language isn't enough; you need to get on the path of writing tests that take advantage of existing code bindings and that are eminently testable scenarios.

Another pitfall was that you need to ensure that the lines of natural language really are implemented in the code binding by a piece of code that does what it says. Occasionally I saw code that tested not the desired function, but some proxy: assuming that if x, then y, let's test x, because it's easier to test. You really really need to test y, or the BDD tests will erroneously pass when that assumption breaks.
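To illustrate the difference, here is a small hypothetical sketch in Python of two bindings for a step like "the backend host is <ip>". The response dictionary and both function names are invented for this example. The first binding tests a proxy (the request succeeded), while the second tests the thing the step actually claims.

```python
def check_backend_by_proxy(response, expected_ip):
    """Weak binding: assumes a 200 response means the right backend answered."""
    assert response["status"] == 200


def check_backend_directly(response, expected_ip):
    """Stronger binding: inspects the backend address that was reported."""
    assert response["backend_ip"] == expected_ip, (
        "expected backend %s, got %s" % (expected_ip, response["backend_ip"])
    )


def passes(check, response, expected_ip):
    """Return True if the given check raises no AssertionError."""
    try:
        check(response, expected_ip)
    except AssertionError:
        return False
    return True


# A misrouted request: it succeeded, but the wrong backend served it.
MISROUTED = {"status": 200, "backend_ip": "192.0.2.20"}
EXPECTED_IP = "192.0.2.10"
```

With this data, `passes(check_backend_by_proxy, MISROUTED, EXPECTED_IP)` is true even though the request went to the wrong host, while `passes(check_backend_directly, MISROUTED, EXPECTED_IP)` is false.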