Get a Kanban! (or Scrum board)

I continue to be staggered at the effectiveness, as a process management technique, of simply sticking cards representing tasks onto a whiteboard. Whatever your industry or project management methodology, the ability it offers to visualise the flow of work is immensely powerful. It lets us plan the current work and the near future to maximise our productivity.

/2015/kanban-example.png

It's valuable whether you're working on your own or working as a team. When working as a team, it can be used to schedule work among team members. When on your own, it merely helps with clarity of thought (we'll look at why a little later).

Yet this is largely unknown outside of software development. All sorts of industries would benefit from this approach, from farming to law.

Terminology

There's lots of variation in the terminology around kanbans, so let me lay out the terms as I use them.

The idea of a kanban originates in manufacturing in Japan. The word itself means "signboard" and refers to the board itself. Specific processes built around a kanban are called kanban methodologies. Scrum calls the kanban a "Scrum board", and naturally there are all manner of other terms and practices for using a similar approach in other methodologies too.

Onto the kanban we need to stick cards representing tasks - small pieces of work that are easy to pick up and get done. Sometimes tasks will relate to bigger projects. Some call these bigger projects epics, and may use additional cards to represent the relationship of tasks to epics.

A backlog is the totality of the work yet to do (and again, terms differ; some practices may exclude work that is already scheduled).

How to run a kanban

First of all, get yourself a real, physical whiteboard. If you can get a magnetic whiteboard, you can stick task cards to it with magnets, which is nice and clean. But otherwise your tasks can be cards stuck to the board with Blu-Tack, or Post-it notes. I favour index cards of a weighty paper density, about the size of your hand when flat. This lets you write large, clear letters on them, which are easier to see from a distance, and they are somewhat resistant to being scuffed as you stack them into a deck and riffle through it.

Next, you need to come up with your backlog. If you're in the middle of a piece of work, you can start by braindumping the current state. Otherwise, get into a quiet room, with the appropriate people if necessary, and a stack of index cards, and write out cards, or break them down, or tear them up, until you have a set of concrete tasks that will get you to your goal. Make sure everyone agrees the cards are correct.

The cards can include all kinds of extra information that will help you plan the work. For example, you might include deadlines or an estimate (in hours, days or your own unit - I like "ideal hours").

/2015/kanban-task-metadata.png

Sometimes tasks are easy to describe on a card but if you were to pick up the card as something to work on, it wouldn't be immediately obvious where to start. These should be broken down into smaller pieces of work during this planning phase. This allows you to see with better granularity how much of the large piece of work is done. I like tasks that are of an appropriate size for each person to do several of them in a week. However, it's OK to break down the card into smaller tasks later if the task is probably going to be something to tackle further in the future.

Now, divide the whiteboard into columns. You will need at least two: something like backlog, and in progress. But you could have many more. Kanban is about flow. Tasks flow through the columns. The flow represents the phases of working on a task. You might start by quoting for work and finish by billing for it. Or you might start by making sure you have all the raw materials required and finish by taking inventory of materials used.

/2015/kanban-flow.png

None of these practices are set in stone - you can select them and reselect them as your practices evolve. For example, you could focus on longer-range planning:

/2015/kanban-schedule.png

So with your whiteboard drawn, you can put your tasks on the board. Naturally many of your cards may not fit, so you can keep your backlog stack somewhere else. Choosing what to put on the board becomes important.

Now, just move the cards to reflect the current state. When a task is done, you update the board and choose the next most valuable task to move forward. You might put initials by a card to indicate who is working on it.

Visit the kanban regularly, as a team. Stop and replan frequently - anything from a couple of times a week up to a couple of times a day - especially when new information becomes available. This might involve pulling cards from the backlog onto the board, writing new cards, tearing up cards that have become redundant, and rearranging the board to reprioritise. Make sure the right people are present every time if possible.

Less frequently you might make a bigger planning effort: pick up all the cards from your backlog pile or column, and sit down again with the relevant people to replan these and reassess all their priorities. Some of the cards may be redundant and some new work may have been identified.

The value of the kanban will then naturally begin to flow:

  • Higher productivity as you see how what you're working on fits into a whole
  • A greater ability to reschedule - for example, to park work in progress to tackle something urgent
  • Team collaboration around tasks that seem to be problematic
  • Estimates of when something might get done or which deadlines are at risk

Tips

A physical whiteboard seems to be very important. A lot of the practices don't seem to evolve properly if you use some sort of digital version of a kanban. There are lots of reasons for this. One obvious one is that physical whiteboards offer the ability to annotate the kanban with little hints, initials, or whatever. Another is that an online whiteboard doesn't beg to be looked at; a physical whiteboard up in your workplace is something you notice frequently, and it offers a designated place to get away from a screen and plan work.

Naturally, having a physical whiteboard is only possible if your team is not geographically distributed. Geographically distributed teams are challenging for a whole host of reasons, and this is just one. A digital version of a kanban may be a good approach in those cases. Or perhaps frequent photos of a physical whiteboard elsewhere in the world can help to keep things in sync.

Readability from a distance helps get value from your kanban. Write in capital letters because these are more readable from a distance. Use a broad felt pen. Use differently coloured index cards or magnets to convey additional information.

It's somewhat important to ensure that the kanban captures all streams of work. There's a tendency to think "This isn't part of the project we're planning; let's not get distracted by it". But that reduces the value of the kanban in tracking what is actually happening in your workflow. Obviously, different streams of work can be put in a different place on the kanban, or use differently coloured cards.

You can also track obstacles to delivering work on the board. I like to reserve red cards to indicate obstacles. Removing those obstacles may require work!

Why Kanbans work

Kanbans are certainly a form of process visualisation. Being able to visualise how tasks flow lets you spot problems in the process, such as too much work building up that only a certain team member can do. You can design workarounds to problems like this right there on the kanban.

Stepping back from this, the reason I've found having a kanban useful even for solo work may be related to the psychological idea of transactive memory, where we use our memory not as a primary store of information, but as an index over other stores of information, such as those in other people's heads, or on paper. The model of thought is then very much like a database transaction - we might "read" a number of facts from different sources into working memory, generate some new insight, and "write" that insight back to an external source.

By committing our understanding of our backlog of work to index cards, we can free our memories to focus on the task at hand. And when that task is done, we waste no time in switching back to a view of our workflow that can tell us immediately "what's next". Or say we encounter new information that we suspect affects something in the backlog - being able to go straight back to that card and recover exactly how we defined the task turns out to be useful: it allows us to quickly assess the impact of new information to our existing ideas and plans.

The final reason I believe kanbans work so well is that both the kanban and the stack of cards that represent your backlog are artifacts that are constructed collaboratively in a group. Taking some concrete artifact out of a meeting as a record of what was said cuts down a lot on misremembered conclusions afterwards. Some people try to take "action points" out of meetings for the same reason, and then quote them back to everyone by e-mail afterwards. This doesn't seem to work as well - I often find myself thinking "I don't recall agreeing to that!" One reason for this is that the record of the action points is not written down for all to see and approve/veto, but is instead a personal list written by the person taking the minutes.

Writing tasks out on index cards in front of people, and reading them out repeatedly or handing them around (or laying them out on the table for people to move around and reorganise - related in principle to CRC Cards), means that everyone gets a chance to internalise or reject the wording on the card.

Similarly, the organisation of kanban is not only a concrete artifact that is modified with other people standing around: it is ever-present to consult and correct. Nobody can have an excuse to leave the kanban in an incorrect state. Thus the kanban is a reliable source of truth.

So whatever your industry, whatever your process methodology, set yourself up a kanban and give it a try. Happy kanbanning!

Pygame Zero 1.1 is out!

Pygame Zero 1.1 is released today! Pygame Zero is a zero-boilerplate games programming framework for education.

This release brings a number of bug fixes and usability enhancements, as well as one or two new features. The full changelog is available in the documentation, but here are a couple of the highlights for me:

  • A new spell checker will point out hook or parameter names that have been misspelled when the program starts. This goes towards helping beginner programmers understand what they have done wrong in cases where normally no feedback would be given.
  • We fixed a really bad bug on Mac OS X where Pygame Zero's window can't be focused when it is installed in a virtualenv.
  • Various contributors have contributed open-source implementations of classic retro games using Pygame Zero. This is an ongoing project, but there are now implementations of Snake, Pong, Lunar Lander and Minesweeper included in the examples/ directory. These can be used as a reference or turned into course material for teaching with Pygame Zero.
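To give a flavour of how a spell checker for hook names can work, here's a sketch using difflib from the standard library. This is an illustration, not Pygame Zero's actual implementation, and the hook names shown are just a subset:

```python
import difflib

# A few of the hook names the framework recognises (illustrative subset)
KNOWN_HOOKS = {'draw', 'update', 'on_mouse_down', 'on_mouse_up', 'on_key_down'}

def check_hooks(namespace):
    """Return warnings for module-level names that look like misspelled hooks."""
    warnings = []
    for name in namespace:
        if name in KNOWN_HOOKS or name.startswith('_'):
            continue
        close = difflib.get_close_matches(name, KNOWN_HOOKS, n=1, cutoff=0.7)
        if close:
            warnings.append("Found '%s': did you mean '%s'?" % (name, close[0]))
    return warnings
```

So if a beginner defines drow() instead of draw(), check_hooks() can suggest the correct name at startup rather than silently doing nothing.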

Pygame Zero was well-received at Europython. Carrie-Anne Philbin covered Pygame Zero in her keynote; I gave a lightning talk introducing the library and its new spellchecker features; and the sprints introduced several new collaborators to the project, who worked to deliver several of the features and bugfixes that are being released today.

A big thank you to everyone who helped make this release happen!

Pyweek 20 announced, 9th-15th August 2015

The next Pyweek competition has been announced, and will run from 00:00 UTC on Sunday 9th August to 00:00 UTC on Sunday 16th August.

Pyweek is a week-long games programming competition in which participants are challenged to write a game, from scratch, in a week. You can enter as a team or as an individual, and it's a great way to improve your experience with Python and express your creativity at the same time.

If writing a game seems like a daunting challenge, check out Pygame Zero, a zero-boilerplate game framework that can help you get up and running more quickly.

Due to various circumstances this has been delayed, and is now being announced at rather short notice. Be aware that this means that theme voting begins this Sunday, 2nd August.

Pygame Zero, a zero-boilerplate game framework for education

Pygame Zero (docs) is a library I'm releasing today. It's a remastering of Pygame's APIs, intended first and foremost for use in education. It gives teachers a way to teach programming concepts without having to explain things like game loops and event queues (until later).

Pygame Zero was inspired by conversations with teachers at the Pycon UK Education Track. Teachers told us that they need to be able to break course material down into tiny steps that can be spoon-fed to their students: our simplest working Pygame programs might be too complicated for their weakest students to grasp in a single lesson.

They also told us to make it Python 3 - so this is Python 3 only. Pygame on Python 3 works [1] already, though there has been no official release as yet.

A Quick Tour

The idea is that rather than asking kids to write a complete Pygame program including an event loop and resource loading, we give them a runtime (pgzrun) that is the game framework, and let them plug handlers into it.

So your first program might be:

def draw():
    screen.fill('pink')

That's the complete program: screen is a built-in and doesn't have to be imported from anywhere. Then you run it with:

pgzrun my_script.py

Image loading is similarly boilerplate-free; there are a couple of ways to do it, but this is the one we recommend:

# Load images/panda.png (or .jpg) and position its center at (300, 200)
panda = Actor('panda', pos=(300, 200))

def draw():
    screen.clear()
    panda.draw()

More appropriate for sounds and static images, the images/ and sounds/ directories appear as built-in "resource package" objects:

def draw():
    screen.blit(images.background, (0, 0))

def on_mouse_down():
    sounds.meow.play()
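One way such a resource package could be built - a sketch, not Pygame Zero's real code - is a small object whose attribute access loads files from a directory on demand:

```python
import os

class ResourcePackage:
    """Expose files in a directory as attributes, e.g. images.background."""

    def __init__(self, root, loader, extensions):
        self._root = root
        self._loader = loader          # e.g. pygame.image.load
        self._extensions = extensions  # e.g. ('.png', '.jpg')
        self._cache = {}

    def __getattr__(self, name):
        # Only called when the attribute isn't found normally
        if name in self._cache:
            return self._cache[name]
        for ext in self._extensions:
            path = os.path.join(self._root, name + ext)
            if os.path.exists(path):
                resource = self._loader(path)
                self._cache[name] = resource
                return resource
        raise AttributeError("No resource named %r in %s" % (name, self._root))
```

With this, images = ResourcePackage('images', pygame.image.load, ('.png', '.jpg')) would make images.background load images/background.png the first time it is referenced, and serve it from the cache thereafter.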

We use introspection magic to call the event handlers in the script with whatever arguments they are defined to take. Each of the following will "do the right thing":

def on_mouse_down(pos):
    print("You clicked at", pos)
def on_mouse_down(button):
    print("You clicked the", button.name, "mouse button")
def on_mouse_down(button, pos):
    print("You clicked", button, "at", pos)

Batteries Included

Pygame Zero is also useful for more seasoned developers. Though the APIs have been designed to be friendly to novices, they also help you get up-and-running faster with a larger project. The framework includes a weakreffing clock, a property interpolation system, and a built-in integration of Christopher Night's pygame-text. These are the kinds of things you want in your toolkit no matter how expert you are.

It's not hard to "reach behind the curtain" into Pygame proper, when you outgrow the training wheels offered by Pygame Zero.

Portable and distributable

I've discovered in previous Pyweeks that sticking to Pygame as a single dependency is just the simplest way to distribute Python games. OpenGL may offer better performance but users frequently encounter platform and driver bugs. The AVBin used by Pyglet has been buggy in recent Ubuntu versions. So Pygame gives much better confidence that others will be able to play your game exactly as intended.

Pygame Zero has been built to constrain where users store their images, sounds and fonts. It also disallows naming them in ways that cause platform portability problems (eg. case sensitivity).

Hopefully this will help schoolkids share their Pygame Zero games easily. I'd be interested in pursuing this to make it even easier, for example for users without Python installed, or with a hosting system like a simplified pip/PyPI.

Contributing

I'd welcome your feedback if you are able to install Pygame Zero and try it out. It is, of course, on my Bitbucket if you would like to submit pull requests. If you would like to get involved in the project, writing easy-to-understand demos and tutorials would be much appreciated; or there's a short feature wishlist in the issue tracker.

[1] There's a bit of a showstopper bug in Pygame for Python 3 on some Macs - but not reproducible on the Mac I have to hand right now. If anyone can help get this fixed it would be enormously appreciated.

New, free Python Jobs board

Recently we've been on a recruitment drive, trying to fill a number of roles for experienced Python developers. The Python.org jobs board has been frozen for a while, so to assist us in meeting new candidates we tossed around ideas for a free, community-run jobs board: it would have to be a static site; it would have to be on Github; employers should be able to list a job just by submitting a pull request. And then Steve went ahead and wrote it!:

http://pythonjobs.github.io/

Please, please bookmark it, tweet it, reblog it, even if you're not looking for a job right now. It only works if it gets eyeballs. And of course, it's completely free, for everyone, forever. It's by Pythonistas, for Pythonistas.

If you are hiring (and are not a recruitment agent), knock up a Markdown file describing the role you're looking to fill (plus some metadata) and send us a pull request. Instructions are in the Github README.

We'll accept job listings from anywhere in the world. Sure, it's not very easy to navigate by region yet; that may be the next job. Perhaps you could help out - pull requests don't have to be limited to new job postings, hint, hint! (Build machinery/templates are in this repo).

On a personal note I want everyone in this community to be employed, happy, and making a comfortable living. Perhaps this site can help make that happen? I'd love to hear your feedback/experiences; use the Disqus gizmos below.

Update 9:45pm UTC: Talking to the team, I discover I'm mistaken: we're actually going to allow recruiters to post job opportunities, providing they do all the work in sending us a pull request and include full relevant details such as the identity of the employer.

Don't name strings (in Python)

A recurring conversation with my colleagues has been around whether to extract string literals to the top of a Python module as global constants. I often see people writing Python modules that include hundreds of lines of constant string definitions, like this:

...
DASHBOARD_COLUMN_NAME = 'name'
DASHBOARD_COLUMN_PRICE = 'price'

SUMMARY_BREAK_LABEL = 'breakLabel'
BREAK_LABEL_BREAK = 'Break'
BREAK_LABEL_MATCH = 'Match'
...

Naturally we want to avoid repetition and improve maintainability, but I've found myself at loggerheads with people who claim that defining literals as constants is invariably best practice. I see it as potentially damaging to the readability of Python code. The problem is that the "maintainability" afforded by these literals being defined in one place comes at a cost of indirection that actually does make the code harder to read, understand and debug.

The older chunks of our application are written in code that looks like this:

breakPrice = abs(sourceRow[DASHBOARD_COLUMN_PRICE] - targetRow[DASHBOARD_COLUMN_PRICE]) > BREAK_THRESHOLD
breakDict[SUMMARY_BREAK_LABEL] = BREAK_LABEL_BREAK if breakPrice else BREAK_LABEL_MATCH

(This is what our code typically looks like; please don't pay too much attention to other questionable style choices like naming conventions).

When I joined the team I advocated for inlining the constants, and in some cases I've written scripts to automatically do so:

breakPrice = abs(sourceRow['price'] - targetRow['price']) > BREAK_THRESHOLD
breakDict['breakLabel'] = 'Break' if breakPrice else 'Match'

Most people seem to agree this reads better. The code is shorter; you get richer syntax highlighting that helps you visually parse the code; and without the indirection of having to look up names somewhere else, the semantics stand out immediately.

It's also easier to debug: runtime values will match what the code says.

Finally, it will also run faster, because no name lookups are required. In Python, if you use a (non-local) variable name, it's a string, just like a literal - but it's a string that has to be looked up in some namespace dictionary to find the actual value.
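You can see this for yourself with the dis module: the global constant needs a runtime dictionary lookup, while the inline literal is baked into the function's bytecode as a constant:

```python
import dis

BREAK_LABEL = 'Break'

def with_name():
    return BREAK_LABEL

def with_literal():
    label = 'Break'
    return label

dis.dis(with_name)     # includes a LOAD_GLOBAL: a dict lookup at runtime
dis.dis(with_literal)  # no global lookup; the string is a bytecode constant
```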

There are a couple of arguments that are often raised:

What if we need to change the values?

Sure, there's a bit more work, but code is read more often than it is written. Anyway, do you really want to use names that don't reflect their contents (eg. BREAK_LABEL_BREAK = 'Mismatch')?

Pyflakes will warn if a constant is misspelled / We prefer to fail fast at runtime with a ``NameError`` if a constant is misspelled

There's some weight to this, but we test our code properly, and typos are trivial to fix compared to semantic problems due to misreading of code.

In practice, the names are shorter so there are fewer characters to mistype. I can honestly say it just doesn't catch me out all that often.

On Naming

To take a step back a bit, we should realise that in human semantic terms, when we define a global constant, we are not just moving a definition around in our codebase, we are assigning a name to a value.

For non-string values, that's an incredibly important thing to do. If I see an integer literal in a function (say 30), I can't immediately infer its semantics without analysing the ways in which it is used. If I see POLL_INTERVAL, I can immediately start to guess what the code I'm reading is doing. Textual strings, on the other hand, already include semantics: they are already words.

If we use a literal, the value is not named and therefore remains anonymous. You might want to do this because the value is not important enough to deserve a name, or it is self-descriptive enough that an additional name would add rather than remove complexity.

Once we have this view of naming, we can understand a bit better when it might be valuable to name a value. For example, we might want to name a value to indicate additional semantics. I might even mix and match the literal version and the symbolic name on the same line, because the semantics of the two are different:

JOIN_COLUMN = 'breakLabel'
...
    report.select(['breakLabel', 'name', 'price']).join(JOIN_COLUMN, prev_report)

Rules of thumb

Define a global string constant...

  1. To convey additional semantics.
  2. If it's longer than a couple of words and is used more than once.
  3. If it will be typed often and it's easy to typo (eg. it contains symbols).
  4. For values that are likely to change (say, a reasonable chance of such a requirement arising within the next 6 months).

...and try to minimise the distance between declaring a constant and using it.

Use inline string literals...

  1. If they are unlikely to change (say, little chance of a change within 12 months).
  2. If they are only used once.
  3. If they are used in multiple places, but if they were to change they would probably change in different ways (for example, you have a couple of different dialog boxes that happen to share the same title).
  4. If they are already names for something (a dictionary key, a database column, etc).
  5. If they are already self-descriptive in another way. For example, we can see the comma character ',' is a comma; it doesn't need to be named COMMA.

...and try to minimise repetition through good code structure.

Pyweek 19 / Animations

Pyweek 19 is coming up in October, running from Sunday 5th (00:00 UTC) to Sunday 12th (again, 00:00 UTC). Pyweek is a games programming contest in which you have exactly one week to develop a game in Python, on a theme that is selected by vote and announced at the moment the contest starts.

You don't need to have a great deal of games programming savvy to take part: whatever your level you'll find it fun and informative to program along with other entrants, exchanging your experiences on the challenge blog (link is to the previous challenge).

I'd encourage everyone to take part, either individually or as a team, because there's a lot you can learn, about games programming and software development generally.

In celebration of the upcoming Pyweek, here's a little primer on how to write an animation display system.

Animations

An animation is made up of a set of frames, like these from Last Train to Nowhere:

/2014/animation-frames.png

Frames could be 2D sprites or 3D meshes, but for simplicity, let's assume sprites. I always draw my sprites, but you can find sprite sheets on the web if you're less confident in your art skills.

Playing the animation in a game involves displaying one frame after another, at the right place on the screen, so that the action looks smooth.

First of all, let's look at the typical requirements for such a piece of code:

  • It needs to be able to draw multiple copies of the same animation, each at different frames and positions.
  • It needs to cycle through the frames at a certain rate, which will usually be slower than the rate at which the entire game screen is redrawn.
  • It needs to draw each frame at the right place relative to a fixed "insertion point". That is, if the game treats the animation as being "at" a point (x, y), then the frames should be drawn at an offset (x + ox, y + oy) that will cause them to appear in the right place. The offset vector (ox, oy) may vary between frames if the sprites are different sizes.

Another feature I've found useful in the past is to be able to select from a number of different animations to "play" at the appropriate moment. So rather than needing to create a new animation object to play a different sequence of frames, I just instruct the animation to play a different named sequence:

anim.play('jump')

The chief benefit of this is that I can preconfigure what happens when animations end. Some animations will loop - such as a running animation. Other animations will segue into another animation - for example, shooting might run for a few frames and then return to standing.

The best approach for a system like this is the flyweight pattern, which lets us split our animation system into two classes:

  • The Animation class will contain all the details of an animation: the sprite sequences, per-sprite offsets, frame rates, details of what happens when each sequence finishes, and so on. It's basically a collection of data so it doesn't need many methods.
  • The AnimationInstance class will refer to the Animation for all of the data it holds, and include just a few instance variables, such as the current frame and the coordinates of the animation on the screen. This class will need a lot more code, because your game code will move it around, draw it, and tell it to play different animation sequences.

So the API for our solution might look something like this:

# A sprite and the (x, y) position at which it should be drawn, relative
# to the animation instance
Frame = namedtuple('Frame', 'sprite offset')

# Sentinel value to indicate looping until told to stop, rather than
# switching to a different animation
loop = object()

# A list of frames plus the name of the next sequence to play when the
# animation ends.
#
# Set next_sequence to ``loop`` to make the animation loop forever.
Sequence = namedtuple('Sequence', 'frames next_sequence')


class Animation:
    def __init__(self, sequences, frame_rate=DEFAULT_FRAME_RATE):
        self.sequences = sequences
        self.frame_rate = frame_rate

    def create_instance(self, pos=(0, 0)):
        return AnimationInstance(self, pos)


class AnimationInstance:
    def __init__(self, animation, pos=(0, 0)):
        self.animation = animation
        self.pos = pos
        self.play('default')  # Start default sequence
        clock.register_interval(self.next_frame, self.animation.frame_rate)

    def play(self, sequence_name):
        """Start playing the given sequence at the beginning."""
        self.currentframe = 0
        self.sequence = self.animation.sequences[sequence_name]

    def next_frame(self):
        """Called by the clock to increment the frame number."""
        self.currentframe += 1
        if self.currentframe >= len(self.sequence.frames):
            next_seq = self.sequence.next_sequence
            if next_seq is loop:
                self.currentframe = 0
            else:
                self.play(next_seq)

    def draw(self):
        """Draw the animation at coordinates given by self.pos.

        The current frame will be drawn at its corresponding offset.

        """
        frame = self.sequence.frames[self.currentframe]
        ox, oy = frame.offset
        x, y = self.pos
        frame.sprite.draw(pos=(x + ox, y + oy))

    def destroy(self):
        clock.unregister_interval(self.next_frame)

This would allow you to define a fully-animated game character with a number of animation sequences and transitions:

player_character = Animation({
    'default': Sequence(..., loop),
    'running': Sequence(..., loop),
    'jumping': Sequence(..., 'default'),
    'dying': Sequence(..., 'dead'),
    'dead': Sequence(..., loop)
})

You'd have to find a suitable clock object to ensure next_frame() is called at the right rate. Pyglet has just such a Clock class; Pygame doesn't, but there might be suitable classes in the Pygame Cookbook. (Take care that the next_frame() method gets unscheduled when the animation is destroyed - note the destroy() method above. You'll need to call destroy(), or use weakrefs, to avoid keeping references to dead objects and leaking memory.)
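If you end up rolling your own, a minimal clock supporting the register_interval()/unregister_interval() API assumed above might look like this. This is a sketch - the weakref behaviour just mentioned is left out for brevity:

```python
import time

class Clock:
    """Minimal interval scheduler; call tick() once per game loop iteration."""

    def __init__(self):
        self.scheduled = {}  # callback -> (interval, time next due)

    def register_interval(self, callback, rate):
        """Schedule callback to be called `rate` times per second."""
        interval = 1.0 / rate
        self.scheduled[callback] = (interval, time.monotonic() + interval)

    def unregister_interval(self, callback):
        self.scheduled.pop(callback, None)

    def tick(self):
        """Call any callbacks whose interval has elapsed."""
        now = time.monotonic()
        for callback, (interval, due) in list(self.scheduled.items()):
            if now >= due:
                self.scheduled[callback] = (interval, due + interval)
                callback()
```

Your main loop would then call clock.tick() once per frame, and each AnimationInstance's next_frame() fires at its own rate independently of the redraw rate.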

There's potentially a lot of metadata there about the various animation sequences, offsets and so on. You might want to load it from a JSON file rather than declaring it in Python - that way opens it up to writing some quick tools to create or preview the animations.

Next, you need to create the flyweight instance and play the right animations as your character responds to game events - perhaps you'd wrap a class around it like this:

class Player:
    def __init__(...):
        self.anim = player_character.create_instance()

    def jump(...):
        self.anim.play('jumping')

    def update(...):
        self.anim.pos = self.pos

    def draw(...):
        self.anim.draw()

I've used this basic pattern in a fair number of games. There are lots of different ways to extend it. Your requirements will depend on the kind of game you're writing. Here are a couple of ideas:

  • You might want the flyweight instances to be able to flip or rotate the animation. You might store a direction flag or angle variable.
  • You might want to be able to register arbitrary callback functions, to be called when an animation finishes. This would allow game events to be cued from the animations, which will help to make sure everything syncs up, even as you change the animations.
  • You could use vector graphics of some sort, and interpolate between keyframes rather than selecting just one frame to show. This would offer smoother animation, but you'd need good tooling to create the animations (tools like Spine).
  • I've assumed some sort of global clock object. You might want to link animations to a specific clock that you control. This would allow you to pause the animations when the game is paused, by pausing the animation clock. You could go more advanced and manipulate the speed at which the clock runs to do cool time-bending effects.

Have fun coding, and I hope to see your animations in Pyweek!

Every mock.patch() is a little smell

The Python unittest.mock library (and the standalone mock package for Python 2) includes an extremely handy feature in patch(), which allows easy and safe monkey-patching for the purposes of testing.

You can use patch() as a context manager, class decorator or start and stop it yourself, but the most commonly used form (in my experience) is as a decorator, which monkey patches something for the duration of a single test function. This is typically used to stub out some subsystem to allow testing of a single component in isolation.

Say we wanted a test of an order processing system, then we might write it with patch() like so:

@patch('billing.charge_card')
def test_card(charge_card):
    """Placing an order charges the credit card."""
    orders.place_order(TEST_ORDER)
    charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Here, the test verifies that place_order() would call charge_card() with the right arguments - while ensuring that charge_card is not actually called in the test.
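For comparison, the same test using patch() as a context manager. (The stand-in billing module and place_order below are just scaffolding so the sketch runs on its own; the real code would import the actual billing and orders modules.)

```python
import sys
import types
from unittest.mock import patch

# Stand-in for the real billing module, so the example is self-contained.
billing = types.ModuleType('billing')
billing.charge_card = lambda amount, card: None
sys.modules['billing'] = billing

TEST_ORDER_TOTAL = 42
TEST_CREDIT_CARD = '4111-1111-1111-1111'

def place_order(order):
    # The real orders.place_order would do far more than this.
    billing.charge_card(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

def test_card():
    """Placing an order charges the credit card."""
    with patch('billing.charge_card') as charge_card:
        place_order({'id': 1})
    charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)
```

The context-manager form limits the patch to the lines inside the with block, which is useful when the rest of the test needs the real function.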

One of my earliest contributions to the Quartz project at Bank of America Merrill Lynch was to recommend and justify the inclusion of mock in the standard distribution of Python that is shipped to every machine (desktop and server) in the organisation. My rationale was that while dependency injection is a better principle, refactoring is a chicken-and-egg process - you need tests in place to support refactoring, and refactoring so that good tests are easy to write. mock.patch() was well-suited to wrap tests around my team's existing, poorly tested codebase, before refactoring, so that we could have tests in place to enable us to refactor.

Dependency Injection

Dependency Injection is a simple technique where you pass in at runtime the key objects a function or class might need, rather than constructing those objects within the code. For example, we could have structured the previous example for dependency injection and passed in (injected) a mock dependency:

def test_card():
    """Placing an order charges the credit card."""
    p = Mock()
    orders.place_order(TEST_ORDER, card_processor=p)
    p.charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Dependency injection sounds like (and arises from) a strictly object-oriented and Java-ish technique, but it is absolutely a valuable approach in Python too. In Python we generally don't suffer the pain of having to define concrete subclasses of abstract interfaces; we can simply duck-type. And the benefits of dependency injection are great: just as we can pass mock objects in testing, we can pass different concrete objects in production, which makes for much more flexible, ravioli-style code. For example, we could later define a DeferredCardProcessor to satisfy a slightly different use case. The possibilities are opened up.
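In Python the injection point is often just a default argument. A sketch of how the orders code might be structured (DefaultCardProcessor and the function body are illustrative, not the article's real code):

```python
class DefaultCardProcessor:
    """Hypothetical production implementation."""
    def charge_card(self, amount, card):
        raise NotImplementedError('talk to the real payment gateway here')

def place_order(order, card_processor=None):
    # Fall back to the production dependency when none is injected.
    if card_processor is None:
        card_processor = DefaultCardProcessor()
    card_processor.charge_card(order['total'], order['card'])
```

Production callers don't change at all, while tests (or a hypothetical DeferredCardProcessor) pass in their own processor.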

Overuse of patch()

The value of patch() is that it makes tests easier to write. But it doesn't add any extra value to your application. Dependency injection is perhaps a little more expensive at first but adds extra value later, when your code is genuinely better structured.

In the past few days I've seen a couple of examples (via Code Review) of code that has numerous nested patch decorators:

@patch('my.project.trading.getEnrichedTradeData', lambda: TEST_TRADE_DATA)
@patch('my.project.subledger.getSubledgerData', getTestSubledgerData)
@patch('my.project.metadata.getSubledgerCodes', return_value=TEST_CODES)
@patch('my.project.reports.storeReport')
def test(self, storeReport, getSubledgerCodes):
    buildReport()
    storeReport.assert_called_with(EXPECTED_REPORT)

The lack of useful dependency injection aside, I think this is rather hard to read. And if we look at the code for buildReport() and see it does something like

def buildReport():
    trades = getEnrichedTradeData()
    subledger = getSubledgerData()
    codes = getSubledgerCodes()
    report = Report(trades, subledger, codes)
    storeReport(report)

then we've fallen into a trap of testing that the code does what the code says it does, rather than testing functional behaviour we care about.

Seeing (or writing) code like this should cause you to think that perhaps the code wasn't structured right. To avoid falling into this trap, think of every mock.patch() as a small code smell - a missed opportunity for dependency injection.
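For contrast, here is how buildReport() might look with its collaborators injected (the Report class is elided; a tuple stands in for it in this sketch), along with a test that needs no patching at all:

```python
from unittest.mock import Mock

def buildReport(getTrades, getSubledger, getCodes, storeReport):
    """Assemble and store the report; every collaborator is injected."""
    report = (getTrades(), getSubledger(), getCodes())  # really a Report object
    storeReport(report)

def test():
    storeReport = Mock()
    buildReport(
        getTrades=lambda: 'TRADES',
        getSubledger=lambda: 'SUBLEDGER',
        getCodes=lambda: 'CODES',
        storeReport=storeReport,
    )
    storeReport.assert_called_with(('TRADES', 'SUBLEDGER', 'CODES'))
```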

If you're struggling to refactor your code to build dependency injection in, a good way to start is to do a CRC Cards session, which will typically produce a design that includes opportunity for dependency injection, especially if you think about testability as one of your design criteria.

Organising Django Projects

This week I watched Alex Gaynor's recent talk entitled "The end of MVC" on YouTube.

To summarise, I think Alex's main point was that developers tend to weave business logic through each of the three MVC components rather than cleanly separating business logic from the other concerns of writing a web application: persistence, templating, request/response parsing and so on.

Given a limited set of framework components to hang things on, developers just bodge logic into that framework wherever it is most convenient, forgetting how to write self-contained, well-organised Python business logic that they might write if the framework was absent.

As a symptom of this, I've interviewed candidates who, when presented with a simple OO business logic exercise, start by writing Django models. Please don't do this.

As someone who has done Django projects at a range of scales over the past 8 years, I found this talk chimed with the lessons I myself have learned about how to organise a project. I've also picked up other people's Django projects here and there, and suffered the pain of poor organisation. To be fair, the situation isn't as bad with Django as I remember it being when I picked up PHP projects 10 years ago: there is always a familiar skeleton provided by the framework that helps to ensure a new code base is moderately accessible.

I tried a long time ago to write up some thoughts on how to organise "controllers" (Django views) but re-reading that blog post again now, I realise I could have written it a lot better.

So to turn my thinking into practical recommendations (Django-specific, adapt as required):

Keep database code in models.py

It stinks when you see cryptic ORM queries like

entry.author_set.filter(group__priority__gt=1000)

randomly plastered throughout your application. Keeping these with the model definitions makes them more maintainable.

  • Keep ORM queries in models.py
  • Learn how to define custom managers, queryset subclasses etc as appropriate
  • Encapsulate the object queries you need using these features
  • Don't put business logic here.

Naturally, this generalises to other databases and persistence layers. For example, I did a very successful project with ElasticSearch in which all queries were implemented in searches.py; a small amount of user state was persisted in Redis, and there was a module that did nothing more than encapsulate Redis for the operations we needed.
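That Redis module might look something like this (a hypothetical sketch; note that the client itself is injected, which also keeps the module testable without a Redis server):

```python
class UserState:
    """The only module allowed to talk to Redis: every operation the
    application needs is a named method here."""

    def __init__(self, client):
        self.client = client  # a redis.Redis instance in production

    def _key(self, user_id):
        return 'userstate:%s' % user_id

    def set_last_seen(self, user_id, page):
        self.client.set(self._key(user_id), page)

    def last_seen(self, user_id):
        return self.client.get(self._key(user_id))
```

In tests, a dict-backed fake client is all you need.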

Keep presentational code in templates

  • Write template tags and filters to add power to the template syntax rather than preparing data for presentation elsewhere.
  • Don't put ORM queries in templates. Make sure that your template parameters include all the things that will be necessary for rendering a page.
  • Don't put business logic here (obviously)

Writing better template tags and filters empowers you to change your presentation layer more easily, while maintaining a consistent design language across the site. It feels great when you can drop a quick tag into a template and the page renders correctly on the first refresh, because your template tags were organised well to display the kinds of thing your application needs to show.
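A filter can be as small as a few lines. This hypothetical currency filter shows the shape; the Django registration lines are given as comments so the sketch stands alone:

```python
# templatetags/display.py - a hypothetical filter module.
# In a real Django app you'd also add:
#     from django import template
#     register = template.Library()
# and decorate the function below with @register.filter.

def currency(value, symbol='£'):
    """Render a number as money, e.g. {{ order.total|currency }}."""
    return '%s%s' % (symbol, format(value, ',.2f'))
```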

Override forms to do deep input pre-processing

  • Build on the forms system to push all of your validation code into forms.py, not just input conversion and sanitisation.

Forms are an explicit bridge to adapt the calling conventions of HTML to the calling conventions of your business logic, so make sure they fully solve this task.

As an example, your application might want to validate a complex system of inputs against a number of database models. I would put all of this logic into the form and have the form return specific database models in cleaned_data.

That's not to say that some parts of this logic shouldn't be extracted to live elsewhere, but the big advantage of Django forms is that errors can be faithfully tied back to the specific inputs that caused them.
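Sketching the shape of that in plain Python (Django's forms.Form machinery is elided so this runs standalone; a real form would declare fields and implement clean(), and the accounts lookup would be an ORM query):

```python
class TransferForm:
    """Validates a raw account id against the database and returns the
    model itself in cleaned_data - callers never see the raw input."""

    def __init__(self, data, accounts):
        self.data = data
        self.accounts = accounts   # stands in for an ORM manager
        self.errors = {}
        self.cleaned_data = {}

    def is_valid(self):
        try:
            account_id = int(self.data.get('account_id', ''))
        except ValueError:
            self.errors['account_id'] = 'Enter a valid account number.'
            return False
        account = self.accounts.get(account_id)  # a model query in real code
        if account is None:
            # The error is tied to the specific input that caused it
            self.errors['account_id'] = 'No such account.'
            return False
        self.cleaned_data['account'] = account   # the model, not the raw id
        return True
```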

Create your own modules for business logic

  • Write your business logic in new modules, don't try to include it with a Django framework component.
  • Try to make it agnostic about persistence, presentation etc. For example, use dependency injection.
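For example, a pricing module for a shop might be nothing but plain functions (names hypothetical): no ORM imports, no request objects, and any external decision - like the discount policy - injected:

```python
# pricing.py - business logic that knows nothing about Django.

def order_total(lines, tax_rate, discount_for=lambda subtotal: 0):
    """lines is an iterable of (unit_price, quantity) pairs."""
    subtotal = sum(price * qty for price, qty in lines)
    subtotal -= discount_for(subtotal)   # injected policy
    return round(subtotal * (1 + tax_rate), 2)
```

A view passes in data that came from a form; a test just passes literals.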

Views should just contain glue code

  • What is left in a view function/class should be just glue code to link requests to forms, forms to business logic, outputs to templates.
  • The view then serves as a simple, straightforward description of how a feature is coupled.
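As a sketch, a view reduced to glue might read like this. Here render and redirect are tiny stand-ins so the example runs without Django; in a real project they come from django.shortcuts, and the form class and business logic function are hypothetical names:

```python
def render(request, template, context):
    # Stand-in for django.shortcuts.render
    return ('render', template, context)

def redirect(url_name, *args):
    # Stand-in for django.shortcuts.redirect
    return ('redirect', url_name, args)

def create_order_view(request, form_class, place_order):
    """Glue only: request -> form -> business logic -> template."""
    form = form_class(request.get('POST') or {})
    if request.get('method') == 'POST' and form.is_valid():
        order_id = place_order(form.cleaned_data)   # business logic module
        return redirect('order-detail', order_id)
    return render(request, 'orders/create.html', {'form': form})
```

Every line either adapts the request or couples two components; there is nothing here worth unit testing in isolation, which is exactly the point.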