Generating Random Sentences
I very recently came across a Python package called Natural Language Toolkit (NLTK). It’s an impressive and extensive library of classes and functions for processing text written in many of the world’s natural languages. From what I can gather, it seems to be among the top three most widely used toolkits worldwide for natural language processing and linguistic analysis of various kinds. Strangely enough, it also turns out that the people developing NLTK are based only a couple of doors down the corridor from my lab here at The University of Melbourne! I’ve been living under a rock :P
Anyway, during lunch today I had a little fiddle with NLTK, and it is ridiculously easy to pick up (well, for me at least, since I already know a bit of Python). I managed to check the output of my own dinky little program for parsing texts, counting words and building radix trees against NLTK, and the results were consistent in each case. Yee haa!
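For anyone curious what the word-counting half of such a check involves, here is a minimal, dependency-free sketch. It assumes a simple regular-expression tokenizer that keeps only lowercase ASCII letters and apostrophes; `tokenize` and `count_words` are hypothetical names, not the program mentioned above and not part of NLTK. NLTK’s `nltk.FreqDist` builds the same kind of word-frequency table from a list of tokens, though matching counts only makes sense when both sides split the text into tokens the same way.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return each run of ASCII letters/apostrophes as a token."""
    return re.findall(r"[a-z']+", text.lower())


def count_words(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    counts = count_words("To be, or not to be, that is the question.")
    # Print the five most frequent words together with their counts.
    print(counts.most_common(5))
```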
The bonus part is that NLTK makes n-gram analysis easy. N-grams are just sequences of n consecutive words, and n-gram analysis looks at their frequency distributions. Once you have such distributions, you can sample from them and start generating random sentences in the flavour of the text you used to build them. So, in the space of mere minutes, I managed to analyse Hamlet and Ulysses and start generating random sentences. The results are utterly spooky! Here’s a sample of randomly generated sentences based on Hamlet.
The Tragedy of Hamlet sits smiling to my sick soul , freeze thy young
blood , make you from Wittenberg , Horatio , as sin ’s true nature is
fine , it argues an act that blurs the grace and blush of modesty ,
calls virtue hypocrite , takes off his crown , kisses it , where he
goes to heaven ; send hither to see your father ’s signet in my
imagination it is not his own too much of water hast thou done ? first
priest her obsequies have been my Hammlet ’s Hamlet give the first dost
These are generated at random, and yet, semantic nonsense aside, they read as distinctly Shakespearean. Check out a randomly generated sample of sentences based on Ulysses below.
Episode 8 - Lestrygonians PINEAPPLE ROCK , LEMON PLATT , BUTTER
SCOTCH. A SUGARSTICKY GIRL shovelling scoopfuls of creams for a man
looks like with figures juggling. Always find out so long as possible
of proof is with tiny hands. Weeny bones. Almost see them with him.
Bloom stops , points a mailed hand against the Rt. Hon. Mr Justice
Fitzgibbon , John Henry Menton ’s office. He ’s stinking with money .
BLOOM I saw her at the grand stand while the land of Egypt to hanker
after. Wallow in it. Is she , Simon , with statements
See what I mean! This is thoroughly entertaining to read, since I can just keep generating more and more new samples! But I had better stop here for now. Looks like a whole world of possibilities is now open before me in this endeavour :)
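For the curious, here is a minimal, dependency-free sketch of the n-gram sampling idea for the bigram (n = 2) case. It assumes the text has already been split into a list of word tokens; `build_bigram_model` and `generate` are hypothetical names rather than NLTK functions. Within NLTK, `nltk.bigrams` together with `nltk.ConditionalFreqDist` can build the same table of which words follow which.

```python
import random
from collections import Counter, defaultdict


def build_bigram_model(tokens):
    """Map each word to a Counter of the words that immediately follow it."""
    model = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        model[current][following] += 1
    return model


def generate(model, start, length=20, rng=random):
    """Start at `start` and repeatedly sample the next word in proportion to
    how often it followed the current word in the source tokens.

    Stops early if the current word was never followed by anything
    (e.g. the last word of the text). `rng` can be a seeded random.Random.
    """
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])  # .get avoids adding keys to the defaultdict
        if not followers:
            break
        choices, weights = zip(*followers.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


if __name__ == "__main__":
    # Toy input; in practice this would be the token list of a whole text.
    tokens = "to be or not to be that is the question".split()
    model = build_bigram_model(tokens)
    print(generate(model, "to", length=12, rng=random.Random(42)))
```

Each next word is drawn in proportion to how often it followed the current word, which is why the output keeps the local texture of the source text while wandering off semantically. Keying the table on the previous n − 1 words instead of just one tends to give more fluent but less varied output.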