Don't name strings (in Python)

A recurring conversation with my collegues has been around whether to extract string literals to the top of a Python module as global constants. I often see people writing Python modules that include hundreds of lines of constant string definitions, like this:

...
DASHBOARD_COLUMN_NAME = 'name'
DASHBOARD_COLUMN_PRICE = 'price'

SUMMARY_BREAK_LABEL = 'breakLabel'
BREAK_LABEL_BREAK = 'Break'
BREAK_LABEL_MATCH = 'Match'
...

Naturally we want to avoid repetition and improve maintainability, but I've found myself at loggerheads with people who claim that defining literals as constants is invariably best practice. I see it as potentially damaging to the readability of Python code. The problem is that the "maintainability" afforded by these literals being defined in one place comes at a cost of indirection that actually does make the code harder to read, understand and debug.

The older chunks of our application are written in code that looks like this:

breakPrice = abs(sourceRow[DASHBOARD_COLUMN_PRICE] - targetRow[DASHBOARD_COLUMN_PRICE]) > BREAK_THRESHOLD
breakDict[SUMMARY_BREAK_LABEL] = BREAK_LABEL_BREAK if breakPrice else BREAK_LABEL_MATCH

(This is what our code typically looks like; please don't pay too much attention to other questionable style choices like naming conventions).

When I joined the team I advocated for inlining the constants, and in some cases I've written scripts to automatically do so:

breakPrice = abs(sourceRow['price'] - targetRow['price']) > BREAK_THRESHOLD
valueMap['breakLabel'] = 'Break' if breakPrice else 'Match'

Most people seem to agree this reads better. The code is shorter; you get richer syntax highlighting that helps you visually parse the code; and without the indirection of having to look up names somewhere else, the semantics stand out immediately.

It's also easier to debug: runtime values will match what the code says.

Finally, it will also run faster, because no name lookups are required. In Python, if you use a (non-local) variable name, it's a string, just like a literal - but it's a string that has to be looked up in some namespace dictionary to find the actual value.

There are a couple of argument that are often raised:

What if we need to change the values?

Sure, there's a bit more work, but code is read more often than it is written. Anyway, do you really want to use names that don't reflect their contents (eg. BREAK_LABEL_BREAK = 'Mismatch')?

Pyflakes will warn if a constant is misspelled / We prefer to fail fast at runtime with a ``NameError`` if a constant is mispelled

There's some weight to this, but we test our code properly, and typos are trivial to fix compared to semantic problems due to misreading of code.

In practice, the names are shorter so there are fewer characters to mistype. I can honestly say it just doesn't catch me out all that often.

On Naming

To take a step back a bit, we should realise that in human semantic terms, when we define a global constant, we are not just moving a definition around in our codebase, we are assigning a name to a value.

For non-string values, that's an incredibly important thing to do. If I see an integer literal in a function (say 30), I can't immediately infer its semantics without analysing the ways in which it is used. If I see POLL_INTERVAL, I can immediately start to guess what the code I'm reading is doing. Textual strings, on the other hand, already include semantics: they are already words.

If we use a literal, the value is not named and therefore remains anonymous. You might want to do this because the value is not important enough to deserve a name, or it is self-descriptive enough that an additional name would add rather than remove complexity.

Once we have this view of naming, we can understand a bit better when it might be valuable to name a value. For example, we might want to name a value to indicate additional semantics. I might even mix and match the literal version and the symbolic name on the same line, because the semantics of the two are different:

JOIN_COLUMN = 'breakLabel'
...
    report.select(['breakLabel', 'name', 'price']).join(JOIN_COLUMN, prev_report)

Rules of thumb

Define a global string constant...

  1. To convey additional semantics.
  2. If it's longer than a couple of words and is used more than once.
  3. If it will be typed often and it's easy to typo (eg. it contains symbols).
  4. For values that are likely to change (say, a reasonable chance of such a requirement arising within the next 6 months).

...and try to minimise the distance between declaring a constant and using it.

Use inline string literals...

  1. If they are unlikely to change (say, little chance of a change within 12 months).
  2. If they are only used once.
  3. If they are used in multiple places, but if they were to change they would probably change in different ways (for example, you have a couple of different dialog boxes that happen to share the same title).
  4. If they are already names for something (a dictionary key, a database column, etc).
  5. If they are already self-descriptive in another way. For example, we can see the comma character ',' is a comma; it doesn't need to be named COMMA.

...and try to minimise repetition through good code structure.

Pyweek 19 / Animations

Pyweek 19 is coming up in October, running from Sunday 5th (00:00 UTC) to Sunday 12th (again, 00:00 UTC). Pyweek is a games programming contest in which you have exactly one week to develop a game in Python, on a theme that is selected by vote and announced at the moment the contest starts.

You don't need to have a great deal of games programming savvy to take part: whatever your level you'll find it fun and informative to program along with other entrants, exchanging your experiences on the challenge blog (link is to the previous challenge).

I'd encourage everyone to take part, either individually or as a team, because there's a lot you can learn, about games programming and software development generally.

In celebration of the upcoming Pyweek, here's a little primer on how to write an animation display system.

Animations

An animation is made up of a set of frames, like these from Last Train to Nowhere:

/2014/animation-frames.png

Frames could be 2D sprites or 3D meshes, but for simplicity, let's assume sprites. I always draw my sprites, but you can find sprite sheets on the web if you're less confident in your art skills.

Playing the animation in a game involves displaying one frame after another, at the right place on the screen, so that the action looks smooth.

First of all, let's look at the typical requirements for such a piece of code:

  • It needs to be able to draw multiple copies of the same animation, each at different frames and positions.
  • It needs to be cycle through the frames at a certain rate, which will usually be slower than the rate I'm redrawing the entire game screen.
  • It needs to draw each frame the right places relative to a fixed "insertion point". That is, if the game treats the animation as being "at" a point (x, y), then the frames should be drawn at an offset (x + ox, y + oy) that will cause them to appear in the right place. The offset vector (ox, oy) may vary between frames if the sprites are different sizes.

Another feature I've found useful in the past is to be able to select from a number of different animations to "play" at the appropriate moment. So rather than needing to create a new animation object to play a different sequence of frames, I just instruct the animation to play a different named sequence:

anim.play('jump')

The chief benefit of this is that I can preconfigure what happens when animations end. Some animations will loop - such as a running animation. Other animations will segue into another animation - for example, shooting might run for a few frames and then return to standing.

The best approach for a system like this is with the flyweight pattern. Using the flyweight pattern we can split our animation system into two classes:

  • The Animation class will contain all the details of an animation: the sprite sequences, per-sprites offsets, frame rates, details of what happens when each sequence finishes, and so on. It's basically a collection of data so it doesn't need many methods.
  • The Animation Instance class will refer to the Animation for all of the data it holds, and include just a few instance variables, things like the current frame and coordinates of the animation on the screen. This will need a lot more code, because your game code will move it around, draw it, and tell it to play different animation sequences.

So the API for our solution might look something like this:

# A sprite and the (x, y) position at which it should be drawn, relative
# to the animation instance
Frame = namedtuple('Frame', 'sprite offset')

# Sentinel value to indicate looping until told to stop, rather than
# switching to a different animation
loop = object()

# A list of of frames plus the name of the next sequence to play when the
# animation ends.
#
# Set next_sequence to ``loop`` to make the animation loop forever.
Sequence = namedtuple('Sequence', 'frames next_sequence')


class Animation:
    def __init__(self, sequences, frame_rate=DEFAULT_FRAME_RATE):
        self.sequences = sequences
        self.frame_rate = frame_rate

    def create_instance(self, pos=(0, 0)):
        return AnimationInstance(self, pos, direction)


class AnimationInstance:
    def __init__(self, animation, pos=(0, 0)):
        self.animation = animation
        self.pos = pos
        self.play('default')  # Start default sequence
        clock.register_interval(self.next_frame, self.animation.frame_rate)

    def play(self, sequence_name):
        """Start playing the given sequence at the beginning."""
        self.currentframe = 0
        self.sequence = self.animation.sequences[sequence_name]

    def next_frame(self):
        """Called by the clock to increment the frame number."""
        self.currentframe += 1
        if self.currentframe >= len(self.sequence.frames):
            next = self.sequence.next_sequence
            if next is loop:
                self.currentframe = 0
            else:
                self.play(next)

    def draw(self):
        """Draw the animation at coordinates given by self.pos.

        The current frame will be drawn at its corresponding offset.

        """
        frame = self.sequence.frames[self.currentframe]
        ox, oy = frame.offset
        x, y = self.pos
        frame.sprite.draw(pos=(x + ox, y + oy))

    def destroy(self):
        clock.unregister_interval(self.next_frame)

This would allow you to define a fully-animated game character with a number of animation sequences and transitions:

player_character = Animation({
    'default': Sequence(..., loop),
    'running': Sequence(..., loop),
    'jumping': Sequence(..., 'default'),
    'dying': Sequence(..., 'dead'),
    'dead': Sequence(..., loop)
})

You'd have to find a suitable clock object to ensure next_frame() is called at the right rate. Pyglet has just such a Clock class; Pygame doesn't, but there might be suitable classes in the Pygame Cookbook. (Take care that the next_frame() method gets unscheduled when the animation is destroyed - note the destroy() method above. You'll need to call destroy(), or use weakrefs, to avoid keeping references to dead objects and "leak" memory.)

There's potentially a lot of metadata there about the various animation sequences, offsets and so on. You might want to load it from a JSON file rather than declaring it in Python - that way opens it up to writing some quick tools to create or preview the animations.

Next, you need to create the flyweight instance and play the right animations as your character responds to game events - perhaps you'd wrap a class around it like this:

class Player:
    def __init__(...):
        self.anim = player_character.create_instance()

    def jump(...):
        self.anim.play('jumping')

    def update(...):
        self.anim.pos = self.pos

    def draw(...):
        self.anim.draw()

I've used this basic pattern in a fair number of games. There are lots of different ways to extend it. Your requirements will depend on the kind of game you're writing. Here are a couple of ideas:

  • You might want the flyweight instances to be able to flip or rotate the animation. You might store a direction flag or angle variable.
  • You might want to be able to register arbitrary callback functions, to be called when an animation finishes. This would allow game events to be cued from the animations, which will help to make sure everything syncs up, even as you change the animations.
  • You could use vector graphics of some sort, and interpolate between keyframes rather than selecting just one frame to show. This would offer smoother animation, but you'd need good tooling to create the animations (tools like Spine).
  • I've assumed some sort of global clock object. You might want to link animations to a specific clock that you control. This would allow you to pause the animations when the game is paused, by pausing the animation clock. You could go more advanced and manipulate the speed at which the clock runs to do cool time-bending effects.

Have fun coding, and I hope to see your animations in Pyweek!

Every mock.patch() is a little smell

The Python unittest.mock library (and the standalone mock package for Python 2) includes an extremely handy feature in patch(), which allows easy and safe monkey-patching for the purposes of testing.

You can use patch() as a context manager, class decorator or start and stop it yourself, but the most commonly used form (in my experience) is as a decorator, which monkey patches something for the duration of a single test function. This is typically used to stub out some subsystem to allow testing of a single component in isolation.

Say we wanted a test of an order processing system, then we might write it with patch() like so:

@patch('billing.charge_card')
def test_card(charge_card):
    """Placing an order charges the credit card."""
    orders.place_order(TEST_ORDER)
    charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Here, the test verifies that place_order() would call charge_card() with the right arguments - while ensuring that charge_card is not actually called in the test.

One of my earliest contributions to the Quartz project at Bank of America Merrill Lynch was to recommend and justify the inclusion of mock in the standard distribution of Python that is shipped to every machine (desktop and server) in the organisation. My rationale was that while dependency injection is a better principle, refactoring is a chicken-and-egg process - you need tests in place to support refactoring, and refactoring so that good tests are easy to write. mock.patch() was well-suited to wrap tests around my team's existing, poorly tested codebase, before refactoring, so that we could have tests in place to enable us to refactor.

Dependency Injection

Dependency Injection is a simple technique where you pass in at runtime the key objects a function or class might need, rather than constructing those objects within the code. For example, we could have structured the previous example for dependency injection and passed in (injected) a mock dependency:

def test_card():
    """Placing an order charges the credit card."""
    p = Mock()
    orders.place_order(TEST_ORDER, card_processor=p)
    p.charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Dependency injection sounds like (and arises from) a strictly object-oriented and Java-ish technique, but it is absolutely a valuable approach in Python too. In Python we generally don't suffer the pain of having to define concrete subclasses of abstract interfaces; we can simply duck-type. But the benefits of dependency injection are great: as we can pass mock objects in testing, we can pass different concrete objects to be used in production, which makes for much more flexible, ravioli-style code. For example, maybe we could define a DeferredCardProcessor to satisfy a slightly different use case later. The possibilities are opened up.

Overuse of patch()

The value of patch() is that is makes tests easier to write. But it doesn't add any extra value to you application. Dependency injection is perhaps a little more expensive at first but adds extra value later, when your code is genuinely better structured.

In the past few days I've seen a couple of examples (via Code Review) of code that has numerous nested patch decorators:

@patch('my.project.trading.getEnrichedTradeData', lambda: TEST_TRADE_DATA)
@patch('my.project.subledger.getSubledgerData', getTestSubledgerData)
@patch('my.project.metadata.getSubledgerCodes', return_value=TEST_CODES)
@patch('my.project.reports.storeReport')
def test(self, storeReport):
    buildReport()
    storeReport.assert_called_with(EXPECTED_REPORT)

The lack of useful dependency injection aside, I think this is rather hard to read. And if we look at the code for buildReport() and see it does something like

def buildReport():
    trades = getEnrichedTradeData()
    subledger = getSubledgerData()
    codes = getSubledgerCodes()
    report = Report(trades, subledger, codes)
    storeReport(report)

then we've fallen into a trap of testing that the code does what the code says it does, rather than testing functional behaviour we care about.

Seeing (or writing) code like this should cause you to think that perhaps the code wasn't structured right. To avoid failling into this trap, think of every mock.patch() as a small code smell - a missed opportunity for dependency injection.

If you're struggling to refactor your code to build dependency injection in, a good way to start is to do a CRC Cards session, which will typically produce a design that includes opportunity for dependency injection, especially if you think about testability as one of your design criteria.

Organising Django Projects

This week I watched Alex Gaynor's recent talk entitled "The end of MVC" on Youtube:

To summarise, I think Alex's main point was that developers tend to weave business logic through each of the three MVC components rather than cleanly separating business logic from the other concerns of writing a web application: persistence, templating, request/response parsing and so on.

Given a limited set of framework components to hang things on, developers just bodge logic into that framework wherever it is most convenient, forgetting how to write self-contained, well-organised Python business logic that they might write if the framework was absent.

As a symptom of this, I've interviewed candidates who, when presented with a simple OO business logic exercise, start by writing Django models. Please don't do this.

As someone who has done Django projects at a range of scales over the past 8 years this talk chimed with the lessons I myself have learned about how to organise a project. I've also picked up other people's Django projects here and there, and suffered the pain of poor organisation. To be fair, the situation isn't as bad with Django as I remember it being when I picked up PHP projects 10 years ago: there is always a familiar skeleton provided by the framework that helps to ensure a new code base is moderately accessible.

I tried a long time ago to write up some thoughts on how to organise "controllers" (Django views) but re-reading that blog post again now, I realise I could have written it a lot better.

So to turn my thinking into practical recommendations (Django-specific, adapt as required):

Keep database code in models.py

It stinks when you see cryptic ORM queries like

entry.author_set.filter(group__priority__gt=1000)

randomly plastered throughout your application. Keeping these with the model definitions makes them more maintainable.

  • Keep ORM queries in models.py
  • Learn how to define custom managers, queryset subclasses etc as appropriate
  • Encapsulate the object queries you need using these features
  • Don't put business logic here.

Naturally, this generalises to other databases and persistence layers. For example, I did a very successful project with ElasticSearch, and all queries were implemented in searches.py; a small amount of user state was persisted in Redis, and there was a module that did nothing more than encapsulating Redis for the operations we needed.

Keep presentational code in templates

  • Write template tags and filters to add power to the template syntax rather than preparing data for presentation elsewhere.
  • Don't put ORM queries in templates. Make sure that your template parameters include all the things that will be necessary for rendering a page.
  • Don't put business logic here (obviously)

Writing better template tags and filters empowers you to change your presentation layer more easily, while maintaining a consistent design language across the site. It really feels great when you just drop a quick tag into a template and the page appears great on the first refresh, because your template tags were organised well to display the kinds of thing your application needs to show.

Override forms to do deep input pre-processing

  • Build on the forms system to push all of your validation code in forms.py, not just input conversion and sanitisation.

Forms are an explicit bridge to adapt the calling conventions of HTML to the calling conventions of your business logic, so make sure they fully solve this task.

As an example, your application might want validate a complex system of inputs against a number of database models. I would put all of this logic into the form and have the form return specific database models in cleaned_data.

That's not to say that some abstract parts of the code should not be extracted to live somewhere else, but the big advantage of Django forms is that errors can be faithfully tied back to specific inputs.

Create your own modules for business logic

  • Write your business logic in new modules, don't try to include it with a Django framework component.
  • Try to make it agnostic about persistence, presentation etc. For example, use dependency injection.

Views should just contain glue code

  • What is left in a view function/class should be just glue code to link requests to forms, forms to business logic, outputs to templates.
  • The view then serves as a simple, straightforward description of how a feature is coupled.