Last week's London Python Dojo at OneFineStay - Season 5 Episode 3 in case anyone is counting - was on the theme of poetry generators.
The theme was proposed as poetry generators using Markov chains, but as always at the Dojo, many of the teams strove to take more "unique" approaches to the problem. Markov chains have been seen many times at Dojos, and produce output that passes for poetry only at a cursory glance:
Others seem to do not thy
Horse with his prescriptions are
String sweet self almost thence
Glazed with my way for crime
Lay but those so slow they
Sea and is all things turns
Enjoys it was thy hours but
Followed it is in chase thee
Receiving nought but health from
Bent to hear and see again
Boughs which this growing comes
Lets so long but weep to
Slumbers should that word doth
Enjoy'd no defence save in a
(From Team 3's generator)
My team spent a while at the start of the programming time designing a different approach. I was keen to try to generate rhyming verse and I had an idea for how one might go about it.
I had investigated the possibility of detecting rhymes a few years ago when I had the idea for a gamified chat forum. In this forum users would have RPG-style 'classes' and each class would confer special capabilities when users level up. The 'Bard' class would be rewarded for using rhymes and alliteration. I never got as far as creating the forum, but I did research how I might go about detecting rhymes.
Words rhyme if they share their last vowel sound and any trailing consonants. "Both" rhymes with "oath" because they share the ending 'oh-th' sound. Spelling is unreliable for detecting rhymes, as English words are not spelled phonetically: "both" does not rhyme with "moth". Finding really satisfying rhymes may be a bit more complicated than this, but the approach is good enough to start with.
I eventually discovered the CMU Pronouncing Dictionary, which contains US English pronunciations for 133,000 English words.
If we look up the pronunciation of a word in the CMU data and take the last few phonemes (from the last vowel sound onwards), we get a key that corresponds to a unique rhyme. This key allows us to partition words or phrases into groups that all rhyme. "Both" and "oath" might be part of one group, while "moth" and "sloth" would be in another.
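Here is a minimal sketch of that rhyme key, not the code we wrote on the night. It assumes the dictionary's plain-text layout of a word followed by its phonemes, with vowels marked by a trailing stress digit. The four sample entries are typed by hand for illustration, and the names `parse_cmu` and `rhyme_key` are made up for this example. Dropping the stress digits is my own choice to make matching a little more lenient.

```python
# Sample entries in the CMU Pronouncing Dictionary's "WORD  PH1 PH2 ..." layout.
# Typed by hand for illustration; load the real file for actual use.
SAMPLE_CMU = """\
BOTH  B OW1 TH
OATH  OW1 TH
MOTH  M AO1 TH
SLOTH  S L AO1 TH
"""


def parse_cmu(text):
    """Map each lower-cased word to its list of phonemes."""
    pronunciations = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phonemes = line.split()
        # Keep only the first pronunciation listed for a word.
        pronunciations.setdefault(word.lower(), phonemes)
    return pronunciations


def rhyme_key(phonemes):
    """Return the phonemes from the last vowel onwards, stress digits removed."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1].isdigit():  # vowels carry a stress digit (0, 1 or 2)
            return tuple(p.rstrip("012") for p in phonemes[i:])
    return None  # no vowel found, so no usable key


PRON = parse_cmu(SAMPLE_CMU)

for word in ("both", "oath", "moth", "sloth"):
    print(word, rhyme_key(PRON[word]))
```

With these sample entries, "both" and "oath" share one key while "moth" and "sloth" share another, which is exactly the partition we want.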
Another idea that came up in discussion, suggested by Hans Bolang, was to use lines of existing poetry and remix them rather than generating rhyming gibberish. Nicholas Tollervey immediately suggested we source these lines from Palgrave's Golden Treasury, which is available on Project Gutenberg. The Golden Treasury contains thousands of poems that are a perfect input to the algorithm.
Our poem generator, then, simply classifies all the lines in the Golden Treasury by the rhyme key of their last word, and then picks groups of lines to fit a given rhyme scheme.
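The classify-then-pick step might look something like the sketch below. This is a reconstruction under assumptions rather than our Dojo code: the function names are invented, and `TOY_KEYS` is a hand-written stand-in for a real pronunciation lookup so that the example runs on its own.

```python
import random
import re
from collections import defaultdict


def last_word(line):
    """Return the final word of a line, lower-cased, without punctuation."""
    words = re.findall(r"[a-z']+", line.lower())
    word = words[-1].strip("'") if words else ""
    return word or None


def group_by_rhyme(lines, key_for_word):
    """Bucket lines by the rhyme key of their last word."""
    groups = defaultdict(list)
    for line in lines:
        word = last_word(line)
        key = key_for_word(word) if word else None
        if key is not None:  # drop lines whose last word we cannot look up
            groups[key].append(line)
    return groups


def make_poem(groups, scheme, rng=random):
    """Pick lines to fit a scheme such as "AABBA"; spaces mark stanza breaks."""
    letters = sorted(set(scheme.replace(" ", "")))
    chosen, used = {}, set()
    for letter in letters:
        needed = scheme.count(letter)
        candidates = [k for k, ls in groups.items()
                      if k not in used and len(ls) >= needed]
        if not candidates:
            raise ValueError("not enough rhyming lines for " + letter)
        key = rng.choice(candidates)
        used.add(key)  # each letter gets its own rhyme group
        chosen[letter] = rng.sample(groups[key], needed)
    return ["" if ch == " " else chosen[ch].pop(0) for ch in scheme]


# Hypothetical rhyme keys, standing in for a real dictionary lookup.
TOY_KEYS = {"fame": "EY M", "same": "EY M", "blame": "EY M",
            "trees": "IY Z", "ease": "IY Z"}
LINES = [
    "That fillest England with thy triumphs' fame",
    "I long for a repose that ever is the same.",
    "Bosom'd high in tufted trees,",
    "For so to interpose a little ease,",
    "Tell how by love she purchased blame.",
]

print("\n".join(make_poem(group_by_rhyme(LINES, TOY_KEYS.get), "AABBA")))
```

Grouping once up front means each new poem only costs a few random choices, so trying many rhyme schemes over the same corpus is cheap.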
For example, a poem to fit the AABBA rhyme scheme of limericks:
That fillest England with thy triumphs' fame
I long for a repose that ever is the same.
Bosom'd high in tufted trees,
For so to interpose a little ease,
Tell how by love she purchased blame.
Or rhyming couplets (AA BB CC DD):
My Son, if thou be humbled, poor,
The short and simple annals of the poor.
With uncouth rhymes and shapeless sculpture deck'd,
And now I fear that you expect?
But now my oat proceeds,
Lilies that fester smell far worse than weeds?
And strength by limping sway disabled,
When the soundless earth is muffled!
The last example demonstrates a known bug: we rhyme a word with itself. This could easily be fixed.
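One possible fix, sketched here as an assumption rather than something we implemented, is to pick lines from a rhyme group so that no two share the same final word. The helper name `pick_distinct_endings` is invented for this example, and the third sample line is made up purely so the group has a second ending to choose from.

```python
import random
import re


def last_word(line):
    """Return the final word of a line, lower-cased, without punctuation."""
    words = re.findall(r"[a-z']+", line.lower())
    word = words[-1].strip("'") if words else ""
    return word or None


def pick_distinct_endings(lines, count, rng=random):
    """Pick `count` lines from one rhyme group, each ending in a different word.

    Returns None when the group has fewer than `count` distinct endings.
    """
    by_ending = {}
    for line in lines:
        word = last_word(line)
        if word is not None:
            by_ending.setdefault(word, []).append(line)
    if len(by_ending) < count:
        return None
    endings = rng.sample(sorted(by_ending), count)
    return [rng.choice(by_ending[ending]) for ending in endings]


group = [
    "My Son, if thou be humbled, poor,",
    "The short and simple annals of the poor.",
    "A made-up placeholder line that ends in door.",  # not from the Treasury
]
print(pick_distinct_endings(group, 2))
```

A group with only one distinct ending yields `None`, so the caller can move on to a different rhyme group instead of pairing "poor" with "poor".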
All in all, I'm pleased with our result. The lines of the Treasury all sound profound and sometimes forlorn, and so come together rather well. The lines may have been written by great poets, but here they're brought together in new combinations that sometimes almost seem to tell a story.