Skip to main content

New, free Python Jobs board

Recently we've been on a recruitment drive, trying to fill a number of roles for experienced Python developers. The jobs board has been frozen for a while, so to assist us in meeting new candidates we tossed around ideas for a free, community-run jobs board: it would have to be a static site; it would have to be on Github; employers should be able to list a job just by submitting a pull request. And then Steve went ahead and wrote it!:

Please, please bookmark it, tweet it, reblog it, even if you're not looking for a job right now. It only works if it gets eyeballs. And of course, it's completely free, for everyone, forever. It's by Pythonistas, for Pythonistas.

If you are hiring (and are not a recruitment agent), knock up a Markdown file describing the role you're looking to fill (plus some metadata) and send us a pull request. Instructions are in the Github README.

We'll accept job listings from anywhere in the world. Sure it's not very easy to navigate by region yet. That may be the next job. Perhaps you could help out - pull requests don't have to be limited to new job postings, hint, hint! (Build machinery/templates are in this repo).

On a personal note I want everyone in this community to be employed, happy, and making a comfortable living. Perhaps this site can help make that happen? I'd love to hear your feedback/experiences; use the Disqus gizmos below.

Update 9:45pm UTC: Talking to the team, I discover I'm mistaken: we're actually going to allow recruiters to post job opportunities, providing they do all the work in sending us a pull request and include full relevant details such as the identity of the employer.

Don't name strings (in Python)

A recurring conversation with my collegues has been around whether to extract string literals to the top of a Python module as global constants. I often see people writing Python modules that include hundreds of lines of constant string definitions, like this:



Naturally we want to avoid repetition and improve maintainability, but I've found myself at loggerheads with people who claim that defining literals as constants is invariably best practice. I see it as potentially damaging to the readability of Python code. The problem is that the "maintainability" afforded by these literals being defined in one place comes at a cost of indirection that actually does make the code harder to read, understand and debug.

The older chunks of our application are written in code that looks like this:


(This is what our code typically looks like; please don't pay too much attention to other questionable style choices like naming conventions).

When I joined the team I advocated for inlining the constants, and in some cases I've written scripts to automatically do so:

breakPrice = abs(sourceRow['price'] - targetRow['price']) > BREAK_THRESHOLD
valueMap['breakLabel'] = 'Break' if breakPrice else 'Match'

Most people seem to agree this reads better. The code is shorter; you get richer syntax highlighting that helps you visually parse the code; and without the indirection of having to look up names somewhere else, the semantics stand out immediately.

It's also easier to debug: runtime values will match what the code says.

Finally, it will also run faster, because no name lookups are required. In Python, if you use a (non-local) variable name, it's a string, just like a literal - but it's a string that has to be looked up in some namespace dictionary to find the actual value.

There are a couple of argument that are often raised:

What if we need to change the values?

Sure, there's a bit more work, but code is read more often than it is written. Anyway, do you really want to use names that don't reflect their contents (eg. BREAK_LABEL_BREAK = 'Mismatch')?

Pyflakes will warn if a constant is misspelled / We prefer to fail fast at runtime with a ``NameError`` if a constant is mispelled

There's some weight to this, but we test our code properly, and typos are trivial to fix compared to semantic problems due to misreading of code.

In practice, the names are shorter so there are fewer characters to mistype. I can honestly say it just doesn't catch me out all that often.

On Naming

To take a step back a bit, we should realise that in human semantic terms, when we define a global constant, we are not just moving a definition around in our codebase, we are assigning a name to a value.

For non-string values, that's an incredibly important thing to do. If I see an integer literal in a function (say 30), I can't immediately infer its semantics without analysing the ways in which it is used. If I see POLL_INTERVAL, I can immediately start to guess what the code I'm reading is doing. Textual strings, on the other hand, already include semantics: they are already words.

If we use a literal, the value is not named and therefore remains anonymous. You might want to do this because the value is not important enough to deserve a name, or it is self-descriptive enough that an additional name would add rather than remove complexity.

Once we have this view of naming, we can understand a bit better when it might be valuable to name a value. For example, we might want to name a value to indicate additional semantics. I might even mix and match the literal version and the symbolic name on the same line, because the semantics of the two are different:

JOIN_COLUMN = 'breakLabel'
...['breakLabel', 'name', 'price']).join(JOIN_COLUMN, prev_report)

Rules of thumb

Define a global string constant...

  1. To convey additional semantics.
  2. If it's longer than a couple of words and is used more than once.
  3. If it will be typed often and it's easy to typo (eg. it contains symbols).
  4. For values that are likely to change (say, a reasonable chance of such a requirement arising within the next 6 months).

...and try to minimise the distance between declaring a constant and using it.

Use inline string literals...

  1. If they are unlikely to change (say, little chance of a change within 12 months).
  2. If they are only used once.
  3. If they are used in multiple places, but if they were to change they would probably change in different ways (for example, you have a couple of different dialog boxes that happen to share the same title).
  4. If they are already names for something (a dictionary key, a database column, etc).
  5. If they are already self-descriptive in another way. For example, we can see the comma character ',' is a comma; it doesn't need to be named COMMA.

...and try to minimise repetition through good code structure.

Pyweek 19 / Animations

Pyweek 19 is coming up in October, running from Sunday 5th (00:00 UTC) to Sunday 12th (again, 00:00 UTC). Pyweek is a games programming contest in which you have exactly one week to develop a game in Python, on a theme that is selected by vote and announced at the moment the contest starts.

You don't need to have a great deal of games programming savvy to take part: whatever your level you'll find it fun and informative to program along with other entrants, exchanging your experiences on the challenge blog (link is to the previous challenge).

I'd encourage everyone to take part, either individually or as a team, because there's a lot you can learn, about games programming and software development generally.

In celebration of the upcoming Pyweek, here's a little primer on how to write an animation display system.


An animation is made up of a set of frames, like these from Last Train to Nowhere:


Frames could be 2D sprites or 3D meshes, but for simplicity, let's assume sprites. I always draw my sprites, but you can find sprite sheets on the web if you're less confident in your art skills.

Playing the animation in a game involves displaying one frame after another, at the right place on the screen, so that the action looks smooth.

First of all, let's look at the typical requirements for such a piece of code:

  • It needs to be able to draw multiple copies of the same animation, each at different frames and positions.
  • It needs to be cycle through the frames at a certain rate, which will usually be slower than the rate I'm redrawing the entire game screen.
  • It needs to draw each frame the right places relative to a fixed "insertion point". That is, if the game treats the animation as being "at" a point (x, y), then the frames should be drawn at an offset (x + ox, y + oy) that will cause them to appear in the right place. The offset vector (ox, oy) may vary between frames if the sprites are different sizes.

Another feature I've found useful in the past is to be able to select from a number of different animations to "play" at the appropriate moment. So rather than needing to create a new animation object to play a different sequence of frames, I just instruct the animation to play a different named sequence:'jump')

The chief benefit of this is that I can preconfigure what happens when animations end. Some animations will loop - such as a running animation. Other animations will segue into another animation - for example, shooting might run for a few frames and then return to standing.

The best approach for a system like this is with the flyweight pattern. Using the flyweight pattern we can split our animation system into two classes:

  • The Animation class will contain all the details of an animation: the sprite sequences, per-sprites offsets, frame rates, details of what happens when each sequence finishes, and so on. It's basically a collection of data so it doesn't need many methods.
  • The Animation Instance class will refer to the Animation for all of the data it holds, and include just a few instance variables, things like the current frame and coordinates of the animation on the screen. This will need a lot more code, because your game code will move it around, draw it, and tell it to play different animation sequences.

So the API for our solution might look something like this:

# A sprite and the (x, y) position at which it should be drawn, relative
# to the animation instance
Frame = namedtuple('Frame', 'sprite offset')

# Sentinel value to indicate looping until told to stop, rather than
# switching to a different animation
loop = object()

# A list of of frames plus the name of the next sequence to play when the
# animation ends.
# Set next_sequence to ``loop`` to make the animation loop forever.
Sequence = namedtuple('Sequence', 'frames next_sequence')

class Animation:
    def __init__(self, sequences, frame_rate=DEFAULT_FRAME_RATE):
        self.sequences = sequences
        self.frame_rate = frame_rate

    def create_instance(self, pos=(0, 0)):
        return AnimationInstance(self, pos, direction)

class AnimationInstance:
    def __init__(self, animation, pos=(0, 0)):
        self.animation = animation
        self.pos = pos'default')  # Start default sequence
        clock.register_interval(self.next_frame, self.animation.frame_rate)

    def play(self, sequence_name):
        """Start playing the given sequence at the beginning."""
        self.currentframe = 0
        self.sequence = self.animation.sequences[sequence_name]

    def next_frame(self):
        """Called by the clock to increment the frame number."""
        self.currentframe += 1
        if self.currentframe >= len(self.sequence.frames):
            next = self.sequence.next_sequence
            if next is loop:
                self.currentframe = 0

    def draw(self):
        """Draw the animation at coordinates given by self.pos.

        The current frame will be drawn at its corresponding offset.

        frame = self.sequence.frames[self.currentframe]
        ox, oy = frame.offset
        x, y = self.pos
        frame.sprite.draw(pos=(x + ox, y + oy))

    def destroy(self):

This would allow you to define a fully-animated game character with a number of animation sequences and transitions:

player_character = Animation({
    'default': Sequence(..., loop),
    'running': Sequence(..., loop),
    'jumping': Sequence(..., 'default'),
    'dying': Sequence(..., 'dead'),
    'dead': Sequence(..., loop)

You'd have to find a suitable clock object to ensure next_frame() is called at the right rate. Pyglet has just such a Clock class; Pygame doesn't, but there might be suitable classes in the Pygame Cookbook. (Take care that the next_frame() method gets unscheduled when the animation is destroyed - note the destroy() method above. You'll need to call destroy(), or use weakrefs, to avoid keeping references to dead objects and "leak" memory.)

There's potentially a lot of metadata there about the various animation sequences, offsets and so on. You might want to load it from a JSON file rather than declaring it in Python - that way opens it up to writing some quick tools to create or preview the animations.

Next, you need to create the flyweight instance and play the right animations as your character responds to game events - perhaps you'd wrap a class around it like this:

class Player:
    def __init__(...):
        self.anim = player_character.create_instance()

    def jump(...):'jumping')

    def update(...):
        self.anim.pos = self.pos

    def draw(...):

I've used this basic pattern in a fair number of games. There are lots of different ways to extend it. Your requirements will depend on the kind of game you're writing. Here are a couple of ideas:

  • You might want the flyweight instances to be able to flip or rotate the animation. You might store a direction flag or angle variable.
  • You might want to be able to register arbitrary callback functions, to be called when an animation finishes. This would allow game events to be cued from the animations, which will help to make sure everything syncs up, even as you change the animations.
  • You could use vector graphics of some sort, and interpolate between keyframes rather than selecting just one frame to show. This would offer smoother animation, but you'd need good tooling to create the animations (tools like Spine).
  • I've assumed some sort of global clock object. You might want to link animations to a specific clock that you control. This would allow you to pause the animations when the game is paused, by pausing the animation clock. You could go more advanced and manipulate the speed at which the clock runs to do cool time-bending effects.

Have fun coding, and I hope to see your animations in Pyweek!

Every mock.patch() is a little smell

The Python unittest.mock library (and the standalone mock package for Python 2) includes an extremely handy feature in patch(), which allows easy and safe monkey-patching for the purposes of testing.

You can use patch() as a context manager, class decorator or start and stop it yourself, but the most commonly used form (in my experience) is as a decorator, which monkey patches something for the duration of a single test function. This is typically used to stub out some subsystem to allow testing of a single component in isolation.

Say we wanted a test of an order processing system, then we might write it with patch() like so:

def test_card(charge_card):
    """Placing an order charges the credit card."""
    charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Here, the test verifies that place_order() would call charge_card() with the right arguments - while ensuring that charge_card is not actually called in the test.

One of my earliest contributions to the Quartz project at Bank of America Merrill Lynch was to recommend and justify the inclusion of mock in the standard distribution of Python that is shipped to every machine (desktop and server) in the organisation. My rationale was that while dependency injection is a better principle, refactoring is a chicken-and-egg process - you need tests in place to support refactoring, and refactoring so that good tests are easy to write. mock.patch() was well-suited to wrap tests around my team's existing, poorly tested codebase, before refactoring, so that we could have tests in place to enable us to refactor.

Dependency Injection

Dependency Injection is a simple technique where you pass in at runtime the key objects a function or class might need, rather than constructing those objects within the code. For example, we could have structured the previous example for dependency injection and passed in (injected) a mock dependency:

def test_card():
    """Placing an order charges the credit card."""
    p = Mock()
    orders.place_order(TEST_ORDER, card_processor=p)
    p.charge_card.assert_called_with(TEST_ORDER_TOTAL, TEST_CREDIT_CARD)

Dependency injection sounds like (and arises from) a strictly object-oriented and Java-ish technique, but it is absolutely a valuable approach in Python too. In Python we generally don't suffer the pain of having to define concrete subclasses of abstract interfaces; we can simply duck-type. But the benefits of dependency injection are great: as we can pass mock objects in testing, we can pass different concrete objects to be used in production, which makes for much more flexible, ravioli-style code. For example, maybe we could define a DeferredCardProcessor to satisfy a slightly different use case later. The possibilities are opened up.

Overuse of patch()

The value of patch() is that is makes tests easier to write. But it doesn't add any extra value to you application. Dependency injection is perhaps a little more expensive at first but adds extra value later, when your code is genuinely better structured.

In the past few days I've seen a couple of examples (via Code Review) of code that has numerous nested patch decorators:

@patch('', lambda: TEST_TRADE_DATA)
@patch('my.project.subledger.getSubledgerData', getTestSubledgerData)
@patch('my.project.metadata.getSubledgerCodes', return_value=TEST_CODES)
def test(self, storeReport):

The lack of useful dependency injection aside, I think this is rather hard to read. And if we look at the code for buildReport() and see it does something like

def buildReport():
    trades = getEnrichedTradeData()
    subledger = getSubledgerData()
    codes = getSubledgerCodes()
    report = Report(trades, subledger, codes)

then we've fallen into a trap of testing that the code does what the code says it does, rather than testing functional behaviour we care about.

Seeing (or writing) code like this should cause you to think that perhaps the code wasn't structured right. To avoid failling into this trap, think of every mock.patch() as a small code smell - a missed opportunity for dependency injection.

If you're struggling to refactor your code to build dependency injection in, a good way to start is to do a CRC Cards session, which will typically produce a design that includes opportunity for dependency injection, especially if you think about testability as one of your design criteria.

Organising Django Projects

This week I watched Alex Gaynor's recent talk entitled "The end of MVC" on Youtube:

To summarise, I think Alex's main point was that developers tend to weave business logic through each of the three MVC components rather than cleanly separating business logic from the other concerns of writing a web application: persistence, templating, request/response parsing and so on.

Given a limited set of framework components to hang things on, developers just bodge logic into that framework wherever it is most convenient, forgetting how to write self-contained, well-organised Python business logic that they might write if the framework was absent.

As a symptom of this, I've interviewed candidates who, when presented with a simple OO business logic exercise, start by writing Django models. Please don't do this.

As someone who has done Django projects at a range of scales over the past 8 years this talk chimed with the lessons I myself have learned about how to organise a project. I've also picked up other people's Django projects here and there, and suffered the pain of poor organisation. To be fair, the situation isn't as bad with Django as I remember it being when I picked up PHP projects 10 years ago: there is always a familiar skeleton provided by the framework that helps to ensure a new code base is moderately accessible.

I tried a long time ago to write up some thoughts on how to organise "controllers" (Django views) but re-reading that blog post again now, I realise I could have written it a lot better.

So to turn my thinking into practical recommendations (Django-specific, adapt as required):

Keep database code in

It stinks when you see cryptic ORM queries like


randomly plastered throughout your application. Keeping these with the model definitions makes them more maintainable.

  • Keep ORM queries in
  • Learn how to define custom managers, queryset subclasses etc as appropriate
  • Encapsulate the object queries you need using these features
  • Don't put business logic here.

Naturally, this generalises to other databases and persistence layers. For example, I did a very successful project with ElasticSearch, and all queries were implemented in; a small amount of user state was persisted in Redis, and there was a module that did nothing more than encapsulating Redis for the operations we needed.

Keep presentational code in templates

  • Write template tags and filters to add power to the template syntax rather than preparing data for presentation elsewhere.
  • Don't put ORM queries in templates. Make sure that your template parameters include all the things that will be necessary for rendering a page.
  • Don't put business logic here (obviously)

Writing better template tags and filters empowers you to change your presentation layer more easily, while maintaining a consistent design language across the site. It really feels great when you just drop a quick tag into a template and the page appears great on the first refresh, because your template tags were organised well to display the kinds of thing your application needs to show.

Override forms to do deep input pre-processing

  • Build on the forms system to push all of your validation code in, not just input conversion and sanitisation.

Forms are an explicit bridge to adapt the calling conventions of HTML to the calling conventions of your business logic, so make sure they fully solve this task.

As an example, your application might want validate a complex system of inputs against a number of database models. I would put all of this logic into the form and have the form return specific database models in cleaned_data.

That's not to say that some abstract parts of the code should not be extracted to live somewhere else, but the big advantage of Django forms is that errors can be faithfully tied back to specific inputs.

Create your own modules for business logic

  • Write your business logic in new modules, don't try to include it with a Django framework component.
  • Try to make it agnostic about persistence, presentation etc. For example, use dependency injection.

Views should just contain glue code

  • What is left in a view function/class should be just glue code to link requests to forms, forms to business logic, outputs to templates.
  • The view then serves as a simple, straightforward description of how a feature is coupled.

Friends don't let friends not use Django

It's difficult for most of us to evaluate the multitude of web frameworks and microframeworks available in Python, to choose the ideal framework for their project. Naturally, proponents of each microframework will pitch in with comments about how much simpler it is than the alternatives, how much less boilerplate there is, or how much faster it runs at scale. Likewise, seasoned users of each microframework will have their favoured approaches for caching, storage, deployment, and the myriad other components needed for web development. They appreciate having the flexibility to choose and integrate those components to their own liking.

The numerous arguments in favour of one framework or another muddy the waters somewhat; in most cases, starting your project with Django will be a safe decision that you will probably not regret.

In bullet points:

  • Newbies should choose Django. It will keep you secure and it will teach you a lot.
  • If you aren't completely sure what direction your project will develop, use Django. You'll soon hit use cases where you'll want some of the extras that are built into Django.
  • If you know what you're doing with Python web libraries, and you have a fairly comprehensive vision of what your app is doing - maybe you know you want high performance in particular areas, or more simplicity for a subset of use cases, then choose your framework on the basis of that knowledge.
  • Do experiment with and learn the other frameworks.

Django ships with good, solid "batteries" included - libraries that fulfil all sorts of requirements that come up time and again in web programming. Moreover, the Django community has produced thousands of packages (5413 at the time of writing) that fill other gaps. That's many thousands more packages than for any other Python framework. In my projects I invariably find myself wanting some of these components at some point down the line, when requirements arise that we'd never forseen (For example, i18n requirements sometimes come up later in a project's life, when you want to roll out to new regions). Certainly, neither Django's batteries included nor the community packages will be suitable in every use case, but they get you up and running quickly.

One argument made in favour of microframeworks is that they offers the flexibility to choose alternative components. I don't think it's particularly difficult to swap out components in Django for alternatives more suited to a specific need - I've done so many times. Django is explictly coupled, unlike, say, Ruby on Rails, so you can simply call alternative functions to use different storages or template engines etc.

Note however, that Django's integrated-by-default components will also be secure-by-default; any home-rolled integrations may not be. For example, Django applications using Django>=1.2 are protected from CSRF attacks by default. Any microframework that doesn't pre-integrate form generation and request dispatcher components won't be able to say that. This is true whether you're integrating things with a microframework or using non-standard components in Django.

There are a couple of other arguments that I've heard:

  • "Django is slower than x" - maybe, but don't optimise prematurely.
  • "Django doesn't scale as well as x" - scale is a much more complicated problem than "use this tool and you'll be alright". Approaches to scaling Django will be comparable to approaches to scaling any other framework.
  • "Django isn't well-suited to client-side HTML5 apps" - this is true, but it isn't particularly bad at them either. Also don't underestimate the numer of additional pages and components needed to productise your core app, even if it's a rich HTML5 app made of static assets and XHR.

I hope this unmuddies the waters a little, especially for beginners. Of course, I'm not advocating anything other than "use the right tool for the job", but until you're sure exactly what the job entails, it doesn't hurt to have a comprehensive toolbox at your disposal.

Design your organisation for Conway's Law

Conway's Law states that:

"Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations."

In other words, the structure of a program or system is likely to evolve to mirror the management structure of the organisation. Even with a couple of teams working on a small project you may end up with various layers of shims and wrappers to make code written by team A interface with team B's preferred way of doing things.

The schism between Dev and Ops teams that is regularly cited in the DevOps movement is another example of Conway's Law in action. The message there is simple: get developers and operations to collaborate on common business goals (eg. frequent, reliable deployments) or else their competing priorities, poorly communicated, will cause friction that risks the business' ability to deliver on any goals. The excellent The Phoenix Project describes several potential communication gaps other than between Dev and Ops, such as between compliance and developers, and information security and operations, and tells a parable about how close cross-team collaboration avoids a series of potential disasters.

There are various solutions to the problem. In the original magazine article in which Melvin Conway introduced the idea he went on to propose a solution:

"This criterion creates problems because the need to communicate at any time depends on the system concept in effect at that time. Because the design which occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design."

Valve's Employee Handbook describes how they have fully embraced the flexible teams approach:

"Why does your desk have wheels? Think of those wheels as a symbolic reminder that you should always be considering where you could move yourself to be more valuable. But also think of those wheels as literal wheels, because that’s what they are, and you’ll be able to actually move your desk with them."

A slightly less radical approach is to attempt to create strong but fixed communication pathways between teams. Spotify, for example, has described having chapters and guilds that encourage collaboration across team boundaries on specific issues, skills or disciplines.

You can apparently also beat Conway's Law not by improving cross-team communication but by ensuring your teams are set up to match the architecture of the technology products you want to produce. A leaked memo from a former Amazon employee that contrasts Amazon's structure with Google's mentions that Jeff Bezos mandated that:

All teams will henceforth expose their data and functionality through service interfaces. [...] Teams must communicate with each other through these interfaces.

Bezos is relying on Conway's Law to ensure the technology is structured well rather than neglecting Conway's Law and letting it create an unexpected architecture. This solution doesn't attempt to address Melvin Conway's observation that "the design which occurs first is almost never the best possible", but if you have an established or proven architecture, perhaps something that offers maintainability or security benefits, you may be able to ensure it is more closely followed by removing the flexibility to interact across the architecture boundaries you want to draw.

Learning Rust

The past weekend I've been writing my first programs in Rust, which seems to be in vogue at the moment. Mentions of Rust keep coming up on /r/programming and it's obvious why: Rust is a very exciting community right now, with very active development and new libraries landing all the time.

Perhaps most persuasive was Jared Forsyth's seemingly balanced discussion of of Go vs Rust which points out several interesting features in Rust that are absent in Go.

The syntax itself is reminiscent of Ruby (but with braces). As a Python programmer, I've never found Ruby that interesting a prospect. I've learned the Ruby language enough to write Puppet providers, but Ruby as a language occupies very much the same space as Python and I've never seen the need to take it further.

Rust offers some of the same syntactic ideas as Ruby but on offer is a highly performant natively-compiled language with static but inferred types. Rust's pointer ownership model allows the compiler to automatically allocate and free objects in most cases without needing reference counting or garbage collection (though both of these are available too). You could perhaps describe it as a low-level language with high-level syntax. So this is a very different proposition to Python and Ruby, and rather different to C and C++ too.

What I've learned of Rust so far comes largely from the Rust tutorial and Rust by Example.

My first Rust program was an implementation of a simple, insecure monoalphabetic substitution ciphers (inspired, perhaps, because I've already written a genetic algorithm in Python to crack them). I'm pleased that the code ends up clean and easy to read. For example, a snippet that encodes a single character with ROT13 might be

// A function that takes a char c, and returns its rot13 substitution
fn rot13_char(c: char) -> char {
    let base = match c {
        'a'..'z' => 'a' as u8,  // If c is in [a-z], set base to 97
        'A'..'Z' => 'A' as u8,  // If c is in [A-Z], set base to 65
        _ => return c  // For all other characters, return c

    let ord = c as u8 - base;  // ord is in the range 0-25
    let rot = (ord + 13) % 26;  // rot13
    (rot + base) as char  // convert back to an ASCII character. Note no
                          // semicolon - this is an implicit return

I also spent some time working out how to call Rust code from Python, which would allow me to use both languages to their strength in the same project. It turns out it isn't hard to do this, by compiling a .so in Rust with the #[no_mangle] annotation on the exported methods, and some simple ctypes magic on the Python side. One downside is that so far I've only worked out how to pass strings as c_char_p which is not optimal either for Rust or Python. Sample code is on Bitbucket.

I could see myself using Rust in some projects in the future, though I'm not likely to stop using Python for the vast majority of applications. The Rust language itself is changing week to week and is probably unsuitable for any production development at this time, but for early adopters it's well worth a look.

Code quality flows from good tools

Delivering high quality code stands on two pillars: the developer's wisdom to write code well, and tools to inform and guide the developer towards better practice. Developers are clever, and will make poor tools work, but the benefits of great tools go beyond making the developers' lives easier, and actively promote higher quality code.

Here are my picks for sharp tools that improve not developer productivity but code quality.

Version Control Hosting

Going beyond just the benefits of keeping code in version control, tools like Rhodecode or Gitorious (both self-hostable) or Github or Bitbucket (SaaS) allow developers to create new repositories so that unwieldy projects can be split, or new tools and supporting apps can be kept disentangled from the existing code.

You really don't want developers to be bound by the architectural decisions made long ago and codified in pre-created repositories that are hard to get changed.

Code Review

The best code review tools let you show uncommitted changes to other developers, provide high-quality diffs that make it easy to read and understand the impact of a change, and let the other developers give detailed feedback on multiple sections of code. With this feedback developers can rapidly turn patches around and resubmit until they are perfect. Pre-commit review means that the committed history doesn't contain unnecessary clutter; each commit will do exactly one thing in as good code as can be achieved.

Code review can catch issues such as potential error cases, security weaknesses, duplicated code or missing tests or documentation. However the benefits of code review go far beyond the direct ability to catch problem code. Once working with code review, developers that to get their code through review they should adapt their coding style to be clearer and more legible, and pre-empt the criticisms that will be levelled by the reviewers. Code review also facilitates mutual learning: more people pay more attention to the new features that go into the codebase, and so understand the codebase better; also inexperienced developers get guidance from the more experienced developers about how their code could be improved.

Some hosted version control systems (eg. Github) have code review built in, or there are self-hosted tools such as ReviewBoard or SaaS tools like Atlassian Crucible.

Linters/Code Style checkers

Thee earliest time you can get feedback about code quality to developers is when the code is being edited. (If you're not a Pythonista, you'll have to translate this to your own language of choice.)

Linters like Pyflakes can be run in the editor to highlight potential problems, while style checkers like highlight coding style violations. Many IDEs will ship with something like this, but if yours doesn't then plugins are usually available.

Pyflakes is good at spotting undeclared and unused variables, and produces relatively few false positives; on the occasions I've tried PyLint I found it pedantic and plain wrong whenever anything vaguely magical happens. You can tailor it back with some configuration but in my opinion it's not worth it. is valuable and worth adopting, even if your own coding style is different (though don't change if your project already has a consistent style). The style promoted by pep8 is pleasantly spaced and comfortable to read, and offers a common standard across the Python community. I've found even the controversial 80-column line length limit useful - long lines are less readable, both when coding and when viewing side-by-side diffs in code review or three-way diff tools.

You might also consider docstring coverage checkers (though I've not seen one integrated with an editor yet). I find docstrings invaluable for commenting the intention that the developer had when they wrote the code, so that if you're debugging some strange issue later you can identify bits of code that don't do what the developer thought they did.

With Python's ast module it isn't hard to write a checker for the kind of bad practice that comes up in your own project.

Test Fixture Injection

Test code has a tendency to sprawl, with some developers happy to copy-and-paste code into dozens of test cases, suites and modules. Big test code becomes slow, unloved and hard to maintain. Of course, you can criticise these practices in code review, but it's an uphill challenge unless you can provide really good alternatives.

The kind of test fixtures your application will need will of course depend on your problem domain, but regardless of your requirements it's worth considering how developers can create the data their tests will depend on easily and concisely - without code duplication.

There are off-the-shelf fixture creation frameworks like factory_boy, which focuses on populating ORM fixtures, and integrated frameworks like Django have test fixture management tools.

However where these are not appropriate, it can be valuable to write the tools you need to make succinct, easily maintained tests. In our project we populate our object database using test objects loaded from YAML. You could also do this in-memory objects if the code required to create them is more complicated or slower than just describing the state they will have when created.

Another approach also in use in our project is to create a DSL that allows custom objects to be created succinctly. A core type in our project is an in-memory tabular structure. Creating and populating these requires a few lines of code, but for tests where tables are created statically rather than procedurally we construct them by parsing a triple-quoted string of the form:

| user | (int) karma | description |
| dave | 5           | hoopy frood |
| bob  | 0           | None        |

This kind of approach has not only simplified our tests but has made them faster, more comprehensive, and more stable.

What tools are most effective at promoting code quality in your projects? Leave a comment below.