Code quality flows from good tools

Delivering high quality code stands on two pillars: the developer's wisdom to write code well, and tools to inform and guide the developer towards better practice. Developers are clever, and will make poor tools work, but the benefits of great tools go beyond making the developers' lives easier, and actively promote higher quality code.

Here are my picks for sharp tools that improve not just developer productivity but code quality.

Version Control Hosting

Going beyond just the benefits of keeping code in version control, tools like Rhodecode or Gitorious (both self-hostable) or Github or Bitbucket (SaaS) allow developers to create new repositories so that unwieldy projects can be split, or new tools and supporting apps can be kept disentangled from the existing code.

You really don't want developers to be bound by the architectural decisions made long ago and codified in pre-created repositories that are hard to get changed.

Code Review

The best code review tools let you show uncommitted changes to other developers, provide high-quality diffs that make it easy to read and understand the impact of a change, and let the other developers give detailed feedback on multiple sections of code. With this feedback developers can rapidly turn patches around and resubmit until they are perfect. Pre-commit review means that the committed history doesn't contain unnecessary clutter; each commit will do exactly one thing in as good code as can be achieved.

Code review can catch issues such as potential error cases, security weaknesses, duplicated code, or missing tests or documentation. However, the benefits of code review go far beyond the direct ability to catch problem code. Once working with code review, developers learn that to get their code through review they should adapt their coding style to be clearer and more legible, and pre-empt the criticisms that will be levelled by the reviewers. Code review also facilitates mutual learning: more people pay more attention to the new features that go into the codebase, and so understand the codebase better; inexperienced developers also get guidance from the more experienced developers about how their code could be improved.

Some hosted version control systems (eg. Github) have code review built in, or there are self-hosted tools such as ReviewBoard or SaaS tools like Atlassian Crucible.

Linters/Code Style checkers

The earliest time you can get feedback about code quality to developers is when the code is being edited. (If you're not a Pythonista, you'll have to translate this to your own language of choice.)

Linters like Pyflakes can be run in the editor to highlight potential problems, while style checkers like pep8 highlight coding style violations. Many IDEs will ship with something like this, but if yours doesn't then plugins are usually available.

Pyflakes is good at spotting undeclared and unused variables, and produces relatively few false positives; on the occasions I've tried PyLint I found it pedantic and plain wrong whenever anything vaguely magical happens. You can tailor it back with some configuration but in my opinion it's not worth it. pep8 is valuable and worth adopting, even if your own coding style is different (though don't change if your project already has a consistent style). The style promoted by pep8 is pleasantly spaced and comfortable to read, and offers a common standard across the Python community. I've found even the controversial 80-column line length limit useful - long lines are less readable, both when coding and when viewing side-by-side diffs in code review or three-way diff tools.

You might also consider docstring coverage checkers (though I've not seen one integrated with an editor yet). I find docstrings invaluable for commenting the intention that the developer had when they wrote the code, so that if you're debugging some strange issue later you can identify bits of code that don't do what the developer thought they did.

With Python's ast module it isn't hard to write a checker for the kind of bad practice that comes up in your own project.
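For example, here's a minimal ast-based checker that flags bare `except:` clauses. This is a hypothetical sketch, not a tool from any real project:

```python
import ast


class BareExceptChecker(ast.NodeVisitor):
    """Record the line number of every 'except:' with no exception type."""

    def __init__(self):
        self.problems = []

    def visit_ExceptHandler(self, node):
        if node.type is None:
            self.problems.append(node.lineno)
        self.generic_visit(node)


def check(source):
    """Return the line numbers of bare except clauses in the source."""
    checker = BareExceptChecker()
    checker.visit(ast.parse(source))
    return checker.problems
```

The same pattern works for any project-specific rule: subclass `ast.NodeVisitor`, override the `visit_*` method for the node type you care about, and collect line numbers.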

Test Fixture Injection

Test code has a tendency to sprawl, with some developers happy to copy-and-paste code into dozens of test cases, suites and modules. A big test codebase becomes slow, unloved and hard to maintain. Of course, you can criticise these practices in code review, but it's an uphill challenge unless you can provide really good alternatives.

The kind of test fixtures your application will need will of course depend on your problem domain, but regardless of your requirements it's worth considering how developers can create the data their tests will depend on easily and concisely - without code duplication.
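As a sketch of the idea, a fixture factory can give each record sensible defaults that individual tests override, so each test states only what it actually cares about. The field names here are purely illustrative:

```python
def make_user(**overrides):
    """Build a user record with sensible defaults, overridable per test.

    (Hypothetical helper; the field names are illustrative only.)
    """
    user = {
        'name': 'alice',
        'karma': 0,
        'active': True,
    }
    user.update(overrides)
    return user


# Each test then says only what matters to it:
admin = make_user(name='root', karma=100)
newbie = make_user()
```

Because the defaults live in one place, adding a field to the schema means changing one function rather than dozens of copy-pasted test setups.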

There are off-the-shelf fixture creation frameworks like factory_boy, which focuses on populating ORM fixtures, and integrated frameworks like Django have test fixture management tools.

However where these are not appropriate, it can be valuable to write the tools you need to make succinct, easily maintained tests. In our project we populate our object database using test objects loaded from YAML. You could also do this for in-memory objects if the code required to create them is more complicated or slower than just describing the state they will have when created.

Another approach also in use in our project is to create a DSL that allows custom objects to be created succinctly. A core type in our project is an in-memory tabular structure. Creating and populating these requires a few lines of code, but for tests where tables are created statically rather than procedurally we construct them by parsing a triple-quoted string of the form:

| user | (int) karma | description |
| dave | 5           | hoopy frood |
| bob  | 0           | None        |
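A parser for this kind of table DSL needn't be complicated. The sketch below is hypothetical (our real implementation differs); it converts such a triple-quoted string into a list of dicts, honouring the `(int)` type annotation and the `None` literal:

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of dicts.

    Column headers may carry a type annotation like '(int) karma';
    the literal cell value 'None' becomes Python None.
    (Hypothetical sketch of the DSL described above.)
    """
    converters = {'int': int, 'float': float}
    lines = text.strip().splitlines()
    rows = [[cell.strip() for cell in line.strip().strip('|').split('|')]
            for line in lines]
    header, data = rows[0], rows[1:]

    # Resolve each column to a (name, converter) pair.
    columns = []
    for name in header:
        if name.startswith('('):
            type_name, name = name[1:].split(') ')
            columns.append((name, converters[type_name]))
        else:
            columns.append((name, str))

    result = []
    for row in data:
        record = {}
        for (name, convert), cell in zip(columns, row):
            record[name] = None if cell == 'None' else convert(cell)
        result.append(record)
    return result
```

A test can then build a table in three readable lines instead of a dozen constructor calls.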

This kind of approach has not only simplified our tests but has made them faster, more comprehensive, and more stable.

What tools are most effective at promoting code quality in your projects? Leave a comment below.


Pyweek 18 announced

Pyweek 18 was announced last week, to run from the 11th May to 18th May 2014, midnight to midnight (UTC).

Pyweek is a bi-annual games programming contest in which teams or individuals compete to develop a game, in Python, from scratch, in exactly one week, on a theme that is selected by vote and announced at the moment the contest starts.

The contest offers the opportunity to program alongside other Python programmers on a level playing field, with teams diarising their progress via the site, as well as chatting on IRC (#pyweek on Freenode).

Games are scored by other entrants, on criteria of fun, production and innovation, and it's a hectic juggling act to achieve all three in the limited time available.

It's open to all, and very beginner friendly. You don't need a team, you don't need finely honed artistic ability, and you don't need to set aside the whole week - winning games have been created in less than a day. I'd encourage you to take part: it's a great opportunity to explore your creative potential and learn something new.

Browse (and play) the previous entries at the site.

Pyweek 18 kicks off with the theme voting starting at 2014-05-04 00:00 UTC.


Python imports

Though I've been using Python for 10 years I still occasionally trip over the magic of the import statement. Or rather the fact that it is completely unmagical.

The statement

import lemon.sherbet

does a few simple things, effectively:

  1. Unless it's already imported, creates a module object for lemon and evaluates lemon/ in the namespace of that module object.
  2. Unless it's already imported, creates a module object for sherbet, evaluates lemon/ in the namespace of that module object, and assigns the sherbet module to the name sherbet in lemon.
  3. Assigns the lemon module to the name lemon in __main__.

(Obviously, I'm omitting a lot of the details, such as path resolution, sys.modules or import hooks).
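The fact that the statement binds the top-level name, while the machinery returns the leaf module, can be seen with the standard library's json package:

```python
import importlib
import sys

# importlib.import_module returns the leaf module...
decoder = importlib.import_module('json.decoder')
assert decoder is sys.modules['json.decoder']

# ...but the *statement* 'import json.decoder' binds the top-level
# package name in the current namespace, not the leaf:
import json.decoder
assert json is sys.modules['json']

# The leaf module was assigned as an attribute of its parent package:
assert json.decoder is decoder
```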

This basic mechanism has some strange quirks. Suppose the full source tree contains:

├── curd_machine.py
└── lemon
    ├── __init__.py
    ├── curd.py
    └── sherbet.py

And curd_machine.py contains

import lemon.curd

At first glance, I find it odd that this code works:

import curd_machine
import lemon.sherbet

  1. I can access lemon, but I didn't explicitly import it. Of course, this happens because the import lemon.sherbet line ultimately puts the lemon module into my current namespace.
  2. I can also access lemon.curd without explicitly importing it. This is simply because the module structure is stateful. Something else assigned the lemon.curd module to the name curd in the lemon module. I've imported lemon, so I can access lemon.curd.
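The second quirk is easy to demonstrate with the standard library's logging package:

```python
import sys
import logging

# Immediately after 'import logging', the submodule may not be loaded,
# so attribute access can fail (unless something else imported it first):
loaded_before = hasattr(logging, 'handlers')

import logging.handlers

# Now the attribute exists, because importing the submodule assigned it
# to the name 'handlers' on the logging package object:
assert logging.handlers is sys.modules['logging.handlers']

# Any other module that merely did 'import logging' now sees
# logging.handlers too - the package object is shared, stateful data.
assert sys.modules['logging'].handlers is logging.handlers
```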

I'm inclined to the view that relying on either of these quirks would be relatively bad practice, resulting in more fragile code, so it's useful to be aware of them.

The former of these quirks also affects Pyflakes. Pyflakes highlights in my IDE variables that I haven't declared. But it fails to spot obvious mistakes like this:

import lemon.sherbet
print(lemon.soda)

which when run will produce an error:

AttributeError: 'module' object has no attribute 'soda'

There's still nothing mysterious about this; Pyflakes only sees that lemon is defined, and has no idea whether lemon.soda is a thing.

I think the reason that this breaks in my mind is a leaky abstraction in my working mental models. I tend to think of the source tree as a static tree of declarative code, parts of which I can map into the current namespace to use. It isn't that, though; it is an in-memory structure being built lazily. And it isn't mapped into a namespace: the namespace just gets the top-level names, and my code traverses through the structure.

Maybe I formed my mental models long ago when I used to program more Java, where the import statement does work rather more like I've described. I wonder if people with no experience of Java are less inclined to think of it like I do?


CRC Cards

A lot of the software I've written has never been through any formal design process. Especially with Python, because of the power of the language to let me quickly adapt and evolve a program, I have often simply jumped in to writing code without thinking holistically about the architecture of what I'm writing. My expectation is that a good architecture will emerge, at least for the parts where it matters.

This approach may work well if you are programming alone, but it is hampered if you are practising (unit) test-driven development, or are working in a team. Unit tests disincentivise refactoring components, or at least slow the process down. I would point out that if unit tests are resolutely hard to write then your code may be badly architected.

Working as a team reduces your ability to have perfect knowledge of all components of the system, which would be required to spot useful refactorings.

In practice I've found that if we don't do any up-front design, we won't ever end up writing great software: some bits will be good, other bits will be driven by expedience and stink, and won't get refactored, and will be a blight on the project for longer than anyone expected.

Class-responsibility-collaboration (CRC) Cards are a lightweight technique for collaboratively designing a software system, which I've used a few times over the past couple of years and which seems to produce good results.

The technique is simple: get the team in a room, write down suggested classes in a system on index cards on a table, then iterate and adapt the cards until the design looks "good". Each card is titled with the name of the class, a list of the responsibilities of the class, and a list of the other classes with which the class will collaborate. The cards can be laid out so as to convey structure, and perhaps differently coloured cards might have different semantics.

One of the original CRC cards drawn by Ward Cunningham.

CRC cards are founded on object-oriented principles, and I don't want our code to be unnecessarily objecty, so I'm quick to point out that not every card will correspond to a Python class. A card may also correspond to a function, a module, or an implied schema for some Python datastructure (eg. a contract on what keys will be present in a dict). I think of them as Component-responsibility-collaboration cards. The rules are deliberately loose. For example, there's no definition of what is "good" or how to run the session.

Running a CRC design session is perhaps the key art, and one that I can't claim to have mastered. Alistair Cockburn suggests considering specific scenarios to evaluate a design. In CRC sessions I've done, I've tried to get the existing domain knowledge written down at the start of the session. If there's an existing architecture, write that down first: that's an excellent way to start, because then you just need to refactor and extend it. You could also write down fixed points that you can't or don't want to change right now, perhaps on differently coloured cards.

It does seem to be difficult to get everyone involved in working on the cards. Your first CRC session might see people struggling to understand the "rules", much less contribute. Though it harks back to the kind of textbook OO design problems that you encounter in early university courses, even experienced developers may be rusty at formal software design. However, once you get people contributing, CRC allows the more experienced software engineers to mentor the less experienced team members by sharing the kind of rationale they are implicitly using when they write software a certain way.

I think you probably need to be methodical about working through the design, and open about your gut reactions to certain bits of design. Software architecture involves a lot of mental pattern matching as you compare the design on the table to things that have worked well (or not) in the past, so it can be difficult to justify why you think a particular design smells. So speak your mind and suggest alternatives that somehow seem cleaner.

The outcome of a CRC design session is a stack of index cards that represent things to build. With the design fixed, the building of these components seems to be easier. Everyone involved in the session is clear on what the design is, and a summary of the spec is on the card so less refactoring is needed.

I've also found the components are easier to test, because indirection/abstraction gets added in the CRC session that you might not add if you were directly programming your way towards a solution. For example, during design someone might say "We could make this feature a new class, and allow for alternative implementations". These suggestions are added for the elegance and extensibility of the design, but this naturally offers easier mock dependency injection (superior to mock.patch() calls any day).

CRC cards seem to be a cheap way to improve the quality of our software. Several weeks' work might be covered in an hour's session. We've not used CRC as often as we could have, but where we have I'm pleased with the results: our software is cleaner, and working on cleaner software makes me a happier programmer.


2013 In Review

I'd like to close 2013 with a retrospective of the year and some thoughts on what I'd like to achieve in 2014.


Vertu

In March 2013 I decided to leave my contract at luxury phone manufacturer Vertu and take up a contract at Bank of America Merrill Lynch. The two years I spent at Vertu spanned the period where they separated from Nokia and were sold. As part of this separation I was involved in putting in place contemporary devops practices, datacentres, development tools and CI, and leading a team to build exclusive web apps and web services. We got to play with cool new technologies and turn them to our advantage, to deliver, fast.

For example, I spent January and February developing a new version of Vertu's lifestyle magazine Vertu Life using Django. Using ElasticSearch instead of Django's ORM was a great choice: I was not only able to build strong search features but get more value out of the content by adding "More like this" suggestions in many pages. Though Vertu Life is just a magazine, the site allows some personalisation. All such writes went to Redis, so the site was blazingly fast.

Bank of America Merrill Lynch

Joining Bank of America meant moving from Reading to London, and I handed over duties as the convenor of the Reading Python Dojo to Mark East (who has since also joined Bank of America, coincidentally).

Bank of America's big Python project Quartz is a Platform-as-a-Service for writing desktop banking apps and server-side batch jobs, and I joined a team maintaining some of the Quartz reconciliation technology components. Quartz is a complex platform with a lot of proprietary components, and it all seems very alien to software developers until you start to understand the philosophy behind it better.

This was an interesting project to join because it was a somewhat established application with reams of what everyone likes to call "legacy code". Coming into this, I had to learn a lot about how the code works and how Quartz works before being able to spot ways to improve this.

Banking is also a very technical industry, and this presents challenges around communication between bankers and software engineers like me. Agile adoption is in its infancy at Bank of America, but it has buy-in at the senior management level, which is exciting and challenging.

Quartz is not only a project; it's a large internal community (2000+ developers), so the challenges we face are not just technical but social and political. I've learned that collaboration in a project the size of Quartz requires putting more effort into communication than smaller projects do. The natural tendency is towards siloisation and fragmentation. We have got better about doing things in a way that they could be more easily re-used, then talking and blogging about them.


Devopsdays

There were Devopsdays conferences in London in March and November, and I look forward to more in 2014. As well as talks covering technical approaches to improving software development and operations, and talks on how to improve cross-business collaboration, Devopsdays offers plenty of opportunities to network, to discuss problems you are tackling and share experiences about approaches that have worked and have not.

In March I gave this talk. I also wrote a blogpost about DevopsDays in November.


Europython

Though I'm excited about going to Berlin in 2014, I'm very sorry that Europython 2013 was the last in Florence. Florence is full of beautiful art and architecture, but it is also a place to relax in the sunshine with great food and great company, and talk about interesting things (not least, Python, of course).

After two years of lurking at Europython, this year I was organised enough to offer a talk on Programming physics games with Python and OpenGL. People have told me this was well received, though I think I could do with practice at giving talks :)

After Europython, I took a week driving around Tuscany with my girlfriend. Tuscany is beautiful, both the Sienese hill towns and the Mediterranean beach resorts, and the food and wine is excellent. I recommend it. Though perhaps I wouldn't drive my own car down from London again. Italy is a long way to drive.

Pycon UK

At Pycon UK I gave a talk on "Cooking up high quality software", in full chef's whites and in my best dodgy french accent. Hopefully my audience found this humorous and perhaps a little bit insightful. I was talking exclusively in metaphors - well, puns - but I hope some people took away some messages.

I think if I had to sum up those messages, I was encouraging developers to think beyond just the skills involved in cooking a dish, to the broader picture of how the kitchen is organised and, indeed, everything else that goes on in the restaurant.

Several of the questions were about my assertion that the "perfect dish" requires choosing exactly the right ingredients - and may involve leaving some ingredients out. I was asked if I mean that we should really leave features out. Certainly I do; I think the key to scalable software development is in mitigating complexity and that requires a whole slew of techniques, including leaving features out.

Pycon UK was also notable for the strong education track, which we at Bank of America sponsored, and which invited children and teachers to come in and work alongside developers for mutual education.


PyWeek

PyWeek is a week-long Python games programming contest that I have been entering regularly for the last few years.

This year I entered both the April and the September PyWeek with Arnav Khare, who was a colleague at Vertu.

Our entry in PyWeek 16 (in April) was Warlocks, a simple 2D game with a home-rolled 3D engine and lighting effects. I was pleased with achieving a fully 3D game with contemporary shaders in the week, but we spent too much time on graphical effects and the actual game was very shallow indeed, a simple button-mashing affair where two wizards face each other before hurling a small list of particle-based spells at each other.

I was much happier with our PyWeek 17 entry, Moonbase Apollo, which was a deliberately less ambitious idea. We wanted to add a campaign element to a game that was a cross between Asteroids and Gravitar. A simple space game is easy to write and doesn't require very much artwork. This strategy allowed us to have the bulk of the game mechanics written on day 1, so we had the rest of the week to improve production values and add missions.

We were relatively happy with the scores we got for these but neither was a podium finish :-(


So what will I get up to in 2014?

I'm keen to do more Python 3. Alex Gaynor has blogged about the lack of Python 3 adoption, and I regret that I haven't done much to move towards using Python 3 in my day-to-day coding this year. Bank of America is stuck on Python 2.6. I still feel that Python 3 is the way forward, perhaps now more than ever given that Django runs under Python 3, but I tend to pick Python 2 by default. I did consider opting for Python 3 as our core technology when the decision arose at Vertu, but at that time some of the libraries we really needed were not available on Python 3. So I made the safe choice. I think today, I might choose differently.


Load Balancer Testing with a Honeypot Daemon

This is a write up of a talk I originally gave at DevopsDays London in March 2013. I had a lot of positive comments about it, and people have asked me repeatedly to write it up.


At a previous contract, my client had over the course of a few years outsourced quite a handful of services under many different domains. Our task was to move the previously outsourced services into our own datacentre as both a cost saving exercise and to recover flexibility that had been lost.

In moving all these services around, there evolved a load balancer configuration that consisted of

  • Some hardware load balancers managed by the datacentre provider that mapped ports and also unwrapped SSL for a number of the domains. These were inflexible and couldn't cope with the number of domains and certificates we needed to manage.
  • Puppet-managed software load balancers running
    • Stunnel to unwrap SSL
    • HAProxy as a primary load balancer
    • nginx as a temporary measure for service migration, for example, dark launch

As you can imagine there were a lot of moving parts in this system, and something inevitably broke.

In our case, an innocuous-looking change passed through code review that broke transmission of accurate X-Forwarded-For headers. The access control to some of our services was relaxed for certain IP ranges as transmitted with X-Forwarded-For headers. Only a couple of days after the change went in we found the Googlebot had spidered some of our internal wiki pages! Not good! The lesson is obvious and important: you must write tests of your infrastructure.

Unit testing a load balancer

A load balancer is a network service that forwards incoming requests to one of a number of backend services:


A pattern for unit testing is to substitute mock implementations for all components of a system except the unit under test. We can then verify the outputs for a range of given inputs.


To be able to unit test the Puppet recipes for the load balancers, we need to be able to create "mock" network services on arbitrary IPs and ports that the load balancer will communicate with, and which can respond with enough information for the test to check that the load balancer has forwarded each incoming request to the right host with the right headers included.


The first incarnation of tests was clumsy. It would spin up dozens of network interface aliases with various IPs, put a webservice behind those, then run the tests against the mock webservice. The most serious problem with this approach was that it required slight load balancer configuration changes so that the new services could come up cleanly. It also required tests to run as root to create the interface aliases and bind the low port numbers required. It was also slow. It only mocked the happy path, so tests could hit real services if there were problems with the load balancer configuration.

I spent some time researching whether it would be possible to run these mock network services without significantly altering the network stack of the machine under test. Was there any tooling around using promiscuous mode interfaces, perhaps? I soon discovered libdnet and from there Honeyd, and realised this would do exactly what I needed.

Mock services with honeyd

Honeyd is a service intended to create virtual servers on a network, which can respond to TCP requests etc., for network intrusion detection. It does all this by using promiscuous mode networking and raw sockets, so that it doesn't require changes to the host's real application-level network stack at all. The honeyd literature also pointed me in the direction of combining honeyd with farpd so that the mock servers can respond to ARP requests.

More complicated was that I needed to create scripts to provide the mock TCP services themselves. I needed my mock services to send back HTTP headers, IPs, ports and SSL details so that the test could verify these were as expected. To create a service, Honeyd requires you to write programs that communicate on stdin and stdout as if these were the network socket (this is similar to inetd). While it is easy to write this for HTTP and a generic TCP socket, it's harder for HTTPS, as the SSL libraries will only wrap a single bi-directional file descriptor. I couldn't find a way of treating stdin and stdout as a single file descriptor. I eventually solved this by wrapping one end of a pipe with SSL and proxying the other end of the pipe to stdin and stdout. If anyone knows of a better solution for this, please let me know.
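For illustration, a minimal inetd-style HTTP responder for honeyd might look like the sketch below. This is hypothetical, not the script from the project; the example IP in the comment is illustrative, and the argument names simply mirror the honeyd configuration shown in this post:

```python
#!/usr/bin/env python
"""Minimal inetd-style mock HTTP service for honeyd (sketch).

honeyd runs the script with the connection details as arguments and
wires the TCP socket to stdin/stdout; we echo those details back in
the response body so a test can verify which backend address the
load balancer actually chose.
"""
import sys


def respond(argv, stdin, stdout):
    # honeyd passes pairs such as: --IP --PORT 80
    params = dict(zip(argv[::2], argv[1::2]))
    request_line = stdin.readline().strip()
    stdout.write("HTTP/1.0 200 OK\r\n")
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write("backend=%s:%s\n" % (params.get('--IP'),
                                      params.get('--PORT')))
    stdout.write("request=%s\n" % request_line)


if __name__ == '__main__':
    respond(sys.argv[1:], sys.stdin, sys.stdout)
```

Because the "socket" is just stdin/stdout, the responder is trivially testable in-process with StringIO, no networking required.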

With these in place, I was able to create a honeyd configuration that reflected our real network:

# All machines we create will have these properties
create base
set base personality "Linux 2.4.18"
set base uptime 1728650
set base maxfds 35
set base default tcp action reset

# Create a standard webserver template
clone webserver base
add webserver tcp port 80 "/usr/share/honeyd/scripts/ --REMOTE_HOST $ipsrc --REMOTE_PORT $sport --PORT $dport --IP $ipdst --SSL no"

# Network definition
bind webserver
bind webserver

This was all coupled with an interface created with the Linux veth network driver (after trying a few other mock networking devices that didn't work). With Debian's ifup hooks, I was able to arrange it so that bringing up this network interface would start honeyd and farpd and configure routes so that the honeynet would be seen in preference to the real network. There is a little subtlety in this, because we needed the real DNS servers to be visible, as the load balancer requires DNS to work. Running ifdown would restore everything to normal.

Writing the tests

The tests were then fairly simple BDD tests against the mocked load balancer, for example:

Feature: Load balance production services

    Scenario Outline: Path-based HTTP backend selection
        Given the load balancer is listening on port 8001
        When I make a request for http://<domain><path> to the loadbalancer
        Then the backend host is <ip>
        And the path requested is <path>
        And the backend request contained a header Host: <domain>
        And the backend request contained a valid X-Forwarded-For header

        Examples:
        | domain             | path      | ip             |
        |      | /         | 42.323.167.197 |
        | | /api/cfe/ |      |
        | | /api/ccg/ |       |

The honeyd backend is flexible and fast. Of course it was all Puppetised, as a single Puppet module that added all the test support; the load balancer recipe was applied unmodified. While I set this up as a virtual network device for use on development load balancer VMs, you could also deploy it on a real network, for example for continuous integration tests or for testing hardware network devices.

As I mentioned in my previous post, having written BDD tests like this it's easier to reason about the system, so the tests don't just catch errors (protecting against losing our vital X-Forwarded-For headers) but give an overview of the load balancer's functions that makes it easier to understand and adapt in a test-first way as services migrate. We were able to make changes faster and more confidently and ultimately complete the migration project swiftly and successfully.


Experiences with BDD

What is BDD?

Behaviour Driven Development (BDD) is a practice where developers collaborate with business stakeholders to develop executable specifications for pieces of development that they are about to start. Test Driven Development (TDD) says that tests should be written before development, but it doesn't say how tests should be written.

BDD builds on TDD by proposing that the first tests should be functional/acceptance tests written in business-oriented language. Using a business-oriented language rather than code allows stakeholders to be involved in verifying that a feature satisfies the business's requirements before work on that feature even commences. You might then do TDD at the unit test level around individual components as you develop.


The tools for BDD have generally come to revolve around Gherkin, a simple structure for natural language specifications.

My favourite description of Gherkin-based tools is given by the Ruby Cucumber website:

  1. Describe behaviour in plain text
  2. Write a step definition
  3. Run Cucumber and watch it fail
  4. Write code to make the step pass
  5. Run Cucumber again and see the step pass

To summarise, a feature might be described in syntax like:

Feature: Fight or flight
    In order to increase the ninja survival rate,
    As a ninja commander
    I want my ninjas to decide whether to take on an
    opponent based on their skill levels

    Scenario: Weaker opponent
        Given the ninja has a third level black-belt
        When attacked by a samurai
        Then the ninja should engage the opponent

You then write code that binds this specification language to a test implementation. Thus the natural language becomes a functional test.
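To illustrate the binding, here is a toy step registry. Real tools such as lettuce or behave provide richer @given/@when/@then decorators; this is only a sketch of the mechanism, and the ninja rule it encodes is invented for the example:

```python
import re

STEPS = []


def step(pattern):
    """Register a function as the implementation of a Gherkin step."""
    def decorator(func):
        STEPS.append((re.compile(pattern + '$'), func))
        return func
    return decorator


def run_step(text, context):
    """Find the step definition matching this line and execute it."""
    for regex, func in STEPS:
        match = regex.match(text)
        if match:
            return func(context, *match.groups())
    raise AssertionError("No step definition for: %r" % text)


# Bindings for the ninja feature above:
@step(r'the ninja has a (\w+) level black-belt')
def given_belt(context, level):
    context['level'] = level


@step(r'attacked by a (\w+)')
def when_attacked(context, opponent):
    # Toy rule: a third level black-belt engages any opponent.
    context['engage'] = context['level'] == 'third'


@step(r'the ninja should engage the opponent')
def then_engage(context):
    assert context['engage']
```

The Gherkin lines stay stable while the regexes and functions evolve with the code, which is exactly the three-tier separation described below.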

This results in three tiers:

  1. The specification language
  2. The specification language code bindings
  3. The system(s) under test

Python Tools

In Python there are a few tools that implement Gherkin, such as lettuce and behave.

Of these I've only had experience with lettuce (we hacked up an internal fork of lettuce with HTML and modified xUnit test output), but outwardly they are similar.

Experiences of implementing BDD

A complaint I've heard a couple of times about BDD as a methodology is that it remains difficult to get the business stakeholders to collaborate in writing or reviewing BDD tests. This was my experience too, though there is a slightly weaker proposition of Specification by Example where the stakeholders are asked just to provide example cases for the developers to turn into tests. This doesn't imply the same bi-directionality and collaboration as BDD.

If you don't get collaboration with your stakeholders there are still benefits to be had from BDD techniques if you put yourself in the shoes of the stakeholder and develop the BDD tests you would want to see. It gives you the ability later to step back and see the software at a higher level than as a collection of tested components. You may find this level is easier to reason at, especially for new starters and new team members.

Another complaint is that it seems like more work, with its two-step process - first write the natural language, then work out how to implement those tests - but in fact I found it makes tests much easier to write in the first place. Where in TDD you have to start by thinking what the API looks like, in BDD you start with a simple definition of what you want to see happening. You soon build up a language that completely covers your application domain, and the programming work required to create new tests continues to drop.

Another positive observation is that with the three tiers, your tests are protected from inadvertent change as the project develops. While your code might change, and the corresponding specification language code bindings might change with it, well-written Gherkin features will not need to change. Without using BDD I have encountered situations where functionality was broken because the tests that would have caught it were changed or removed at the same time as the implementation. BDD protects against that.

The natural language syntax is helpful in ensuring that tests are written at a functional level. Writing tests in natural language makes it much more visible when you're getting too deep into implementation detail, as you start to require weirdly specific language that the business users would not understand.


There are a couple of pitfalls that I encountered. One is simply that the business stakeholders won't be good at writing tests, so the challenge of collaborating to develop the BDD tests is hard to solve. Just writing something in natural language isn't enough; you need to get into the habit of writing tests that take advantage of existing code bindings and that describe eminently testable scenarios.

Another pitfall was that you need to ensure that the lines of natural language really are implemented in the code binding by a piece of code that does what it says. Occasionally I saw code that tested not the desired behaviour but some proxy for it: assuming that if x then y, the binding tests x, because x is easier to test. You really need to test y, or the BDD tests will erroneously pass when that assumption breaks.


Battleships AIs    Posted:

Of the projects we have tackled at the Reading Python Dojo one of my favourites was programming AIs for the game Battleships, which came up in January 2013.

The dojo is not usually a competition, but in this case we waived that principle and split into two teams to create an AI that could play off against the other. I set down the basic battleships rules that we would compete under:

  • The grid is 10x10.

  • Each team has the following ships:

    Ship              Length
    Aircraft carrier  5
    Battleship        4
    Submarine         3
    Destroyer         3
    Patrol boat       2

The teams were not tasked with drawing a board or placing the ships. We simply drew the grids up on the whiteboard, manually placed the ships, and then had the computers call the moves. The computers were given feedback on whether each shot had hit, missed, or hit and sunk a ship.

Team A's AI was extremely deterministic, sweeping the grid in a checkerboard pattern from the bottom-right corner to the top-left until it scored a hit, at which point it would strafe along each possible orientation of the ship in turn until it was sunk. It would then resume the sweep from where it had left off.
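This isn't the dojo code, but the sweep order can be sketched in a few lines. Visiting only one colour of the checkerboard is enough, because any ship of length 2 or more must cover squares of both colours:

```python
def checkerboard_sweep(size=10):
    """Yield grid squares from the bottom-right corner towards the
    top-left, visiting only one colour of the checkerboard - enough
    to hit any ship of length 2 or more."""
    for y in range(size - 1, -1, -1):
        for x in range(size - 1, -1, -1):
            if (x + y) % 2 == 0:
                yield (x, y)

squares = list(checkerboard_sweep())  # 50 squares on a 10x10 grid
```

The square (0, 0) comes last, which is why a ship tucked into the far corner takes the full 50 moves to find.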

Team Alpha's AI was more stochastic, choosing grid squares at random until it scored a hit, then working outward like a flood-fill to completely carpet-bomb the ship. If at any point a square was completely surrounded by misses, then it could not contain a ship, and the AI would not pick this square.

On the night, Team A finally won after some astonishingly unlucky misses from Team Alpha. Team Alpha benefitted from Team A's worst-case performance by luckily placing a ship in the top-left corner of the grid, where it would take the maximum 50 moves for Team A's sweep to find. The randomness of Team Alpha's AI injected a tension that at any point it could stumble across Team A's last ship and win, even as Team A's AI swept inexorably towards that final ship.

After the Dojo I began to wonder just how often Team A's AI would beat Team Alpha's. Team Alpha could get lucky and find all of Team A's battleships more rapidly than Team A could sweep the grid. To answer the question I wrote BattleRunner, which runs the unmodified AI scripts as subprocesses over thousands of games, albeit with a simple random ship placement strategy. It was actually my first Twisted program! While I normally use gevent for async I/O, I hit a snag very early on with Python's output buffering and wondered if switching to Twisted would solve it. It didn't; the solution was to call python with -u (or modify the AI scripts, which I was keen not to do).
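BattleRunner itself isn't reproduced here, but the core of driving an unmodified AI script over pipes looks something like this sketch, with a one-liner standing in for the AI script; note the -u flag that solved the buffering snag:

```python
import subprocess
import sys

# A stand-in "AI" that calls one move, reads the referee's feedback,
# and acknowledges it. A real run would launch the team's script file.
child_code = 'print("A1"); feedback = input(); print("ack " + feedback)'

# -u disables Python's output buffering so each move arrives as soon
# as the AI prints it, rather than when the pipe's buffer fills.
ai = subprocess.Popen([sys.executable, "-u", "-c", child_code],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                      text=True)

move = ai.stdout.readline().strip()   # the AI calls its move
ai.stdin.write("miss\n")              # referee replies: hit/miss/sunk
ai.stdin.flush()
reply = ai.stdout.readline().strip()
ai.wait()
```

A referee loop built around this, plus random ship placement, is enough to run thousands of games between two such processes.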

The answer is that Team A beats Team Alpha about 64% of the time; Team Alpha wins 36% of the time.

BattleRunner also let me test improvements to the AIs; I was able to add an optimisation to improve how Team A's AI detects a ship's orientation. The Team A+ AI beats the original AI 54% of the time (and loses 46% of the time) - a small but significant improvement.

Perhaps you can do better than Team A's AI! There are lots of optimisations left to be had. Why not clone the repo and give it a try?


Poetry Generators at the London Python Dojo    Posted:

Last week's London Python Dojo at OneFineStay - Season 5 Episode 3 in case anyone is counting - was on the theme of poetry generators.

The theme was proposed as poetry generators using Markov chains, but as always at the Dojo many of the teams strove to take more "unique" approaches to the problem. Markov chains have been seen many times at Dojos, and produce output that fools only a cursory glance:

Others seem to do not thy
Horse with his prescriptions are
String sweet self almost thence
Glazed with my way for crime
Lay but those so slow they
Sea and is all things turns
Enjoys it was thy hours but
Followed it is in chase thee
Receiving nought but health from
Bent to hear and see again
Boughs which this growing comes
Lets so long but weep to
Slumbers should that word doth
Enjoy'd no defence save in a

(From Team 3's generator)
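Team 3's code isn't reproduced here, but a minimal first-order, word-level Markov chain of the kind the dojo teams usually build takes only a few lines:

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word in the corpus to the list of words that follow it."""
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain, start, length, rng=random):
    """Walk the chain from a start word, picking each successor at
    random; stops early if a word has no recorded followers."""
    word, out = start, [start]
    while len(out) < length:
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)
```

Fed a large corpus the output is locally plausible but globally meaningless, which is exactly the "fools only a cursory glance" effect above.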

My team spent a while at the start of the programming time designing a different approach. I was keen to try to generate rhyming verse and I had an idea for how one might go about it.

I had investigated the possibility of detecting rhymes a few years ago when I had the idea for a gamified chat forum. In this forum users would have RPG-style 'classes' and each class would confer special capabilities when users level up. The 'Bard' class would be rewarded for using rhymes and alliteration. I never got as far as creating the forum, but I did research how I might go about detecting rhymes.

Words rhyme if they share their last vowel sound and trailing consonants. "Both" rhymes with "oath" because they share the ending 'oh-th' sound. The spelling is useless to detect rhymes, as words are not spelled phonetically in English: "both" does not rhyme with "moth". It may be a bit more complicated than this to find really satisfying rhymes, but this approach is good enough to start with.

I eventually discovered the CMU Pronouncing Dictionary, which contains US English pronunciations for 133,000 English words.

If we look up the pronunciation of a word in the CMU data and take the last few phonemes (from the last vowel sound onwards), we get a key that corresponds to a unique rhyme. This key allows us to partition words or phrases into groups that all rhyme. "Both" and "oath" might be part of one group, while "moth" and "sloth" would be in another.
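That lookup can be sketched directly, with a tiny hand-rolled excerpt of ARPAbet pronunciations standing in for the full 133,000-entry dictionary:

```python
# A hand-rolled excerpt of CMU-style (ARPAbet) pronunciations;
# the real dictionary has around 133,000 entries.
PRONUNCIATIONS = {
    "both":  ["B", "OW1", "TH"],
    "oath":  ["OW1", "TH"],
    "moth":  ["M", "AO1", "TH"],
    "sloth": ["S", "L", "AO1", "TH"],
}

def rhyme_key(word):
    """The phonemes from the last vowel sound onwards. In ARPAbet,
    vowel phonemes end in a stress digit (0, 1 or 2), which is how
    we spot the last vowel."""
    phonemes = PRONUNCIATIONS[word.lower()]
    for i in reversed(range(len(phonemes))):
        if phonemes[i][-1].isdigit():
            return tuple(phonemes[i:])
    return tuple(phonemes)
```

So "both" and "oath" share the key ('OW1', 'TH'), while "moth" and "sloth" fall into a separate ('AO1', 'TH') group.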

Another idea that came up in discussion, suggested by Hans Bolang, was to use lines of existing poetry and remix them rather than generating rhyming gibberish. Nicholas Tollervey immediately suggested we source these lines from Palgrave's Golden Treasury, which is available on Project Gutenberg. The Golden Treasury contains thousands of lines of poetry that are a perfect input to the algorithm.

Our poem generator, then, simply classifies all the lines in the Golden Treasury by the rhyme key of their last word, and then picks groups of lines to fit a given rhyme scheme.
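A minimal sketch of that classify-and-pick loop; the last_word_key argument stands in for a CMU-based rhyme key function, and the names are mine rather than our dojo code:

```python
import random
from collections import defaultdict

def build_groups(lines, last_word_key):
    """Bucket lines of verse by the rhyme key of their final word."""
    groups = defaultdict(list)
    for line in lines:
        key = last_word_key(line)
        if key is not None:
            groups[key].append(line)
    return dict(groups)

def generate(scheme, groups, rng=random):
    """Fill a rhyme scheme such as 'AABBA' with mutually rhyming lines."""
    pools, used, poem = {}, set(), []
    for letter in scheme:
        if letter not in pools:
            # Pick an unused rhyme group with enough lines for this letter.
            needed = scheme.count(letter)
            keys = [k for k, g in groups.items()
                    if len(g) >= needed and k not in used]
            key = rng.choice(keys)
            used.add(key)
            pools[letter] = rng.sample(groups[key], needed)
        poem.append(pools[letter].pop())
    return poem
```

Note that this sketch shares the self-rhyme bug discussed below: two lines ending in the same word share a rhyme key, so they can be paired.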

For example, a poem to fit the AABBA rhyme scheme of limericks:

That fillest England with thy triumphs' fame
I long for a repose that ever is the same.
Bosom'd high in tufted trees,
For so to interpose a little ease,
Tell how by love she purchased blame.

Or rhyming couplets (AA BB CC DD):

My Son, if thou be humbled, poor,
The short and simple annals of the poor.

With uncouth rhymes and shapeless sculpture deck'd,
And now I fear that you expect?

But now my oat proceeds,
Lilies that fester smell far worse than weeds?

And strength by limping sway disabled,
When the soundless earth is muffled!

The last example demonstrates a known bug: we rhyme a word with itself. This could easily be fixed.

All in all I'm pleased with our result. The lines of the Treasury all sound profound and sometimes forlorn, and so come together rather well. The lines may have been written by great poets, but here they're brought together in new combinations that sometimes almost seem to tell a story.

Our code is available on Github. Photos of the event are up on Flickr.


Devopsdays London November 2013    Posted:

The second Devopsdays of 2013 in London wrapped up this afternoon after a packed schedule of talks, openspaces and socialising. As at the last event in March, there was plenty of food for thought, although as my current contract is primarily dev-centric the practical takeaways for me are the social aspects of process improvement and dev+ops collaboration rather than any specific technologies.

Drawing just a few threads from the notes that I took:

  • Mark Burgess kicked off the talks by suggesting that rather than reacting to faults, it is better to proactively build fault tolerance into your infrastructure and applications. During the ignite talks someone's slide included a relevant quote: "Failure is the inability to handle failure."
  • There were very varied ideas on how to become more collaborative between silos, including an openspace on how to roll out devops and a talk by Jeffrey Fredrick about the psychology and pitfalls of becoming more collaborative. One new idea I took away was the suggestion of making a business case to begin cross-function collaboration by demonstrating problems that stem from a lack of collaboration alongside business goals that can only be tackled through greater collaboration. Indeed, collaboration doesn't just need to be between dev and ops. We should collaborate with HR and IT departments too.
  • I noted several discussions about the future of configuration management. Mark Burgess' talk mentioned the idea of managing infrastructure systems as a whole rather than acting at the level of individual nodes. The view was expressed that solid orchestration should be the backbone of the next generation of configuration management tools rather than a value-added bonus. However, others commented that the orchestration-based tools (Ansible) are not yet on a par with the more node-centric tools (Puppet and Chef).
  • Some of the openspaces focused on wellbeing. It's easy to forget that technology should be about humans, not just about the cool things we might build. Someone touched on the idea of a "People Leader" role, looking out for the welfare of team members.

To sum up, it was a great conference, and while I'm not currently in a position to contribute new experiences to the technical openspaces, or apply those of others, I always find it very stimulating to be in a group of people who are deeply interested in finding ways to improve their working practices with both technological approaches and by improving "soft skills".

I look forward to the next devopsdays!