Alif Wahid

Posts tagged with "design"

Jan 3

Just killing time by trying my hand at digital typography design using Dia :P I haven’t found something quite this meditative in a looooong time! The top design is my own work using simple Bezier curves (I totally lost track of time while shearing, rotating, tessellating and aligning these trivial lines). The bottom one is a calligraphic font that seems to be randomly installed in my Ubuntu Linux machine. It’s a certain kind of mystical experience, just staring at calligraphic curves, which I can’t quite put my finger on :|

Designboom is one dangerous website! Not recommended if you’ve just had a coffee and are fully awake at the moment, since the ideas and images are nothing short of a psychedelic cocktail that’ll mix potently with any caffeine-induced stimulation :P

HTML5 looks like the way ahead…

The concept of a dynamic website has been around for a decade or so. By that I mean a page whose contents allow the user to interact in a meaningful way, such as drawing. But with the expected new standard HTML5, websites finally have a chance to do some pretty special things. If you’re using a modern browser like Chrome v15 then try out this page that contains a set of presentation slides written only in HTML5, which you can flick through in the browser to see some of the possibilities in the crude form that they exist in right now (my favourite is the speech-to-text on slide 23, which worked for me!). Unfortunately only Chrome worked with all the slides (except the WebGL 3D graphics, which only worked for me on Firefox v8), since it’s a little bit ahead of the other browsers in demonstrating HTML5 capabilities.

Among the new things is that there’ll be no need for Flash content anymore, thanks to the ability to embed video and audio files directly in HTML5 using the new <video> and <audio> tags. So hopefully once the browsers get their act together in implementing the full HTML5 standard (once it is actually standardised :P ), we can all say goodbye to the crash-loving Flash plugin, forever! All I can say is that it will be nice to see the end of Adobe at the same time as well :P The main catch is that the standard doesn’t settle on a single codec, so what actually plays depends on the browser. Videos can be encoded in the WebM format (VP8 video with Vorbis audio), which is royalty-free, though not every browser plays it yet. MP3 audio is widely supported but, as far as I know, still patent-encumbered, so Ogg Vorbis is the royalty-free alternative there. Either way, it’s not a major barrier.

What excites me most are the new canvas element, which lets you draw interactive 2D graphics right inside the browser, and the offline storage with file access capability. These features mean that it will be possible, in principle at least, to create a website that provides most of the functionality of a vector drawing application like Illustrator or Inkscape. But of course, that’s thinking in the old-fashioned way. There’s no reason why a blog page can’t become an interactive doodle pad to form a neat community of users that exchange ideas in real time! There’s even a new feature arriving alongside HTML5 called WebSockets (programming buzzword), which keeps a two-way connection open between the browser and the server. Strictly speaking that isn’t peer-to-peer, since every message still passes through the server, but the server can relay messages between everyone connected. What that means is that a bunch of people accessing a site can actually interact among themselves as well as with the site.

There are some neat extensions to CSS for formatting and laying out a page. One of the things you’ll be able to do is create simple dividers like horizontal and vertical boxes that mostly manage their own screen real estate and that of their contents while still playing nice with each other. So it means that a page can, in principle again, re-scale automatically from small screens to large screens, or landscape to portrait, or even when the browser window is arbitrarily reshaped by the user. Currently what happens is that any content (text, images, videos etc.) that doesn’t fit just gets cut off and hidden behind scroll bars, due to a rendering concept called “clipping”. So with automatic re-scaling geometry, the hope is that the content will take care of itself in different resolutions and different shapes without requiring scroll bars or zooming in/out - that can only mean far less hassle for the user!

Anyway, this is just scratching the surface. I’m very excited actually! HTML5 is still under development and not yet standardised. So the browsers are all inconsistent, and Chrome v15 seems to be the most error-free at the moment (but then again, you can’t blame the rest since there’s no definitive standard as yet!). I think I’m going to stop bothering with any kind of GUI programming in the future and only experiment with HTML5 (and its descendants).

Here’s another fascinating, though rather brief, look at how Leica M9-P cameras are made.

(Source: vimeo.com)

Fascinating look at how Leica lenses are made.

Interesting blog called "Seeing Data"

I came across this blog by Chris McDowall, an Informatics Researcher from Landcare Research NZ. There are some neat posts about various ways of visualising data using Python, my favourite little pet ;-)

One post in particular struck me as very novel, where McDowall shows how to visualise comment/response threads in forums and blogs. His technique uses a tree diagram to manifestly represent the comment/response cycles from posters in order to allow quickly browsing the whole conversation in a neat little context. Anyway, check out his diagrams to see what I mean rather than reading my waffle :P

Visualising Vocabularies

Have you ever seen someone’s vocabulary? We can hear what others speak, and read what they write; but I’ve often thought about what someone’s vocabulary might look like. Maybe quite colourful? Or perhaps a little dense in some way? Over the years, I’ve learned that my own vocabulary is, in fact, quite small. It’s a cause for some sadness, seeing as it stems from my rigidly logical style of writing. Sometimes it reminds me of a musical analogy, whereby a small vocabulary is akin to a short vocal range for a singer. Anyway, recently I thought of trying to visualise vocabularies in some way, and this post is basically going to blabber on about what I’ve come up with so far.

The common currency definition of the term vocabulary is simply the full set of words either spoken or written by a person. This leaves out anything to do with frequencies, as to how often particular words may be used by a person. But I guess everyone sort of intuitively understands that about vocabulary anyway. The interesting question then becomes, how do you paint a picture of this set of unique words used by a person? This can’t be the same as a word cloud since that’s conditional upon frequencies, and results in a corresponding scaling of the illustrated words’ sizes. What about visual ideas of colour, texture and density?
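To make that definition concrete, here is a minimal Python sketch of pulling the set of unique words out of a piece of text. The `vocabulary` function and its regular expression are assumptions made purely for illustration (everything is lower-cased, and an apostrophe inside a word counts as part of it); a real text would need more careful tokenising, and this is not necessarily how the plots further down were produced.

```python
import re

def vocabulary(text):
    """Return the set of unique, lower-cased words found in `text`."""
    # Assumption: a word is a run of letters, optionally with inner apostrophes.
    return set(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))

line = "To be, or not to be, that is the question"
vocab = vocabulary(line)

# Frequencies are deliberately thrown away: "to" and "be" each count once.
print(len(line.split()), "words written,", len(vocab), "unique words")
print(sorted(vocab))
```

Because the result is a plain set, repeated words collapse into one entry, which is exactly the difference from a word cloud.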

Well, I’m particularly interested in the idea of density, which I associate with how broad and how deep a person’s use of words beginning with various letters is. For example, I tend to write overwhelmingly in straightforward prose, so my use of words beginning with ‘k’, ‘j’ and ‘x’ is quite rare (ignoring the fact that there are not that many words that begin with those letters). Even when I do use a k-word, for instance, there wouldn’t be much density to speak of in terms of the number of different k-words that I might have used, or the longest one for that matter. Consequently, once you start to describe a person’s vocabulary by this idea of rooting words to their initial letters and then seeing how they branch out into a tree, you begin to get a tangible structure that manifestly represents that person’s vocabulary.

In fact, this idea is not new. In the realm of computer programming, tree data structures are used for all kinds of text processing and manipulation. And it certainly comes to me as no surprise that there is a beautifully rich and diverse set of techniques for visualising vocabularies; unfortunately you don’t see these tools brought out into the open often enough. So I’m going to try and demonstrate one such technique that is available, and paint pictures of the vocabularies of Shakespeare and Joyce in order to show you the vast differences in their respective use of words.

The technique relies on a data structure called a radix tree, or sometimes a PATRICIA trie, in the computer programming world. The name pretty much gives away its purpose and function. It is an abstract tree structure that represents strings by their common prefixes. Strings can be any arbitrary sequence of characters but I’m only going to use words. So a radix tree is something that organises words by pulling out the prefixes that they have in common, a bit like how a dictionary keeps words with the same beginning next to each other. In other words, the words are arranged so that the common prefixes become easily visible and shared across words.

That’s enough words spent describing something so abstract; now it’s time for a diagram! Below is the plot that I copied from the Wikipedia page on Radix Trees (it comes with a creative commons license, CC-BY v2.5). The example in this diagram uses seven r-words: romane, romanus, romulus, rubens, ruber, rubicon and rubicundus, in order to build a radix tree that contains 13 individual nodes and shares various prefixes among the words stored within it. One way to read this radix tree is to start at the bottom nodes that are numbered, which are also called leaves, and then work your way up the tree by following the parent of each node. Along the way, you simply keep prepending each label that you come across until you reach the root of the tree and retrieve the original word. You can also start at the root and work your way down each child node by appending the labels. I personally just like to go up a tree, not down. Have a go with this diagram tracing out the seven words that are stored, and convince yourself that there’s no false magic in this. It’s quite fun actually.
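For anyone who prefers code to diagrams, here is a small Python sketch of a radix tree holding the same seven words. It is only an illustration under my own assumptions: the names `Node`, `insert`, `word_at` and `walk` are made up for this example, and it is not the implementation behind the plots in this post. Each node stores the label on the edge from its parent, and inserting a word splits an existing label wherever the shared prefix ends.

```python
class Node:
    """One node of a radix tree; `label` is the text on the edge from its parent."""
    def __init__(self, label="", parent=None):
        self.label = label
        self.parent = parent
        self.children = {}    # first character of a child's label -> child Node
        self.is_word = False  # True if a stored word ends exactly here

def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def insert(root, word):
    node, rest = root, word
    while rest:
        child = node.children.get(rest[0])
        if child is None:
            # Nothing shares this prefix yet: hang the remainder off as a leaf.
            leaf = Node(rest, node)
            leaf.is_word = True
            node.children[rest[0]] = leaf
            return
        k = common_prefix_len(rest, child.label)
        if k < len(child.label):
            # Only part of the label is shared: split it at position k.
            mid = Node(child.label[:k], node)
            node.children[rest[0]] = mid
            child.label = child.label[k:]
            child.parent = mid
            mid.children[child.label[0]] = child
            child = mid
        node, rest = child, rest[k:]
    node.is_word = True

def word_at(node):
    """Walk up to the root, prepending each label along the way."""
    parts = []
    while node.parent is not None:
        parts.append(node.label)
        node = node.parent
    return "".join(reversed(parts))

def walk(node):
    """Yield every node below `node`, parents before children."""
    for key in sorted(node.children):
        yield node.children[key]
        yield from walk(node.children[key])

root = Node()
for w in ["romane", "romanus", "romulus", "rubens", "ruber", "rubicon", "rubicundus"]:
    insert(root, w)

nodes = list(walk(root))
words = sorted(word_at(n) for n in nodes if n.is_word)
print(len(nodes), "labelled nodes:", [n.label for n in nodes])
print(words)
```

Going up from a leaf with `word_at` is the "prepend each label" reading described above; `walk` is the top-down reading. Not counting the empty root, the sketch ends up with 13 labelled nodes for these seven words.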

The planar diagram corresponding to a radix tree is, in and of itself, a vivid visualisation of a vocabulary. In this toy example, the vocabulary only contains seven words and the diagram is at least sketchable by hand, if not neatly drawable by a program. The problem comes when you want to visualise thousands of words for each letter in the alphabet. Imagine the number of branches that you could have in such a radix tree, and just how deep the levels might actually go! There is not enough space in a 2D plane to put all of the little nodes and connect them with edges and create something resembling a tree. You could do it if hard pressed, but it would look like a jumbled black box from all of the overlapping lines and dots. Unfortunately, this is a problem in any kind of data visualisation where scalability is the major bottleneck, because there are far too many dimensions to fit into a 2D or 3D visual space.

So what can we do? Well, we have to rely on statistics to reduce some of these dimensions and plot the relationships between key parameters while ignoring the rest. There’s no other way of reducing dimensions - you have to throw information away to make room! The trick is to throw away the redundant information that does not add much value to the nature of the underlying tree structure. So in the case of a radix tree, what we have is an easy to see relationship between the depth of the tree and the breadth of the tree. This is because we can trace the level of a node down the tree (i.e., its depth), and for any given depth in the tree we can count the number of nodes that are actually at that level (i.e., the breadth of the tree at that depth). This gives us a 2D relationship for any given tree. We can then extend this across many radix trees that are each rooted at a specific letter of the alphabet. In the same way that the diagram above is rooted at ‘r’, you can have rooted radix trees for any of the other letters depending on what words are present in a vocabulary. This would extend the 2D depth-breadth relationship to 3D and still make it easily plottable in a graph for visualisation.
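The depth-breadth idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the functions `build`, `breadth_by_depth` and `profiles` are hypothetical names, the tree is stored as nested dictionaries, and the root of each letter's tree is counted as depth 0. It is a sketch of the counting described above, not the code behind the plots.

```python
import os
from collections import Counter, defaultdict

def build(words):
    """Build a radix tree as nested dicts: {edge label: subtree}."""
    groups = defaultdict(list)
    for w in words:
        if w:  # an empty remainder just means a word ended at the parent
            groups[w[0]].append(w)
    tree = {}
    for group in groups.values():
        # The longest prefix shared by the whole group becomes one edge label.
        prefix = os.path.commonprefix(group)
        tree[prefix] = build([w[len(prefix):] for w in group])
    return tree

def breadth_by_depth(tree, depth=0, counts=None):
    """Count how many nodes sit at each depth of `tree`."""
    counts = Counter() if counts is None else counts
    for subtree in tree.values():
        counts[depth] += 1
        breadth_by_depth(subtree, depth + 1, counts)
    return counts

def profiles(words):
    """Map each initial letter to {depth: breadth} for that letter's tree."""
    out = {}
    for label, subtree in sorted(build(set(words)).items()):
        counts = breadth_by_depth({label: subtree})
        out[label[0]] = dict(sorted(counts.items()))
    return out

r_words = ["romane", "romanus", "romulus", "rubens", "ruber", "rubicon", "rubicundus"]
result = profiles(r_words + ["king"])
print(result)
# → {'k': {0: 1}, 'r': {0: 1, 1: 2, 2: 4, 3: 6}}
```

Each letter gives one row of (depth, breadth) pairs, so stacking the rows for the whole alphabet produces the letter by depth grid that a heat map needs. A lone k-word yields a tree with a root and nothing else, which is the "not much density" case.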

And that’s what I did, by taking the set of unique words from Hamlet and Ulysses in order to build their corresponding radix trees (that is, one tree for each letter of the alphabet for each piece of text using all of the unique words found). I then computed the distribution of the depth and breadth of these trees and plotted them as a density map (sometimes called a heat map). These figures are shown below for Ulysses and Hamlet respectively. The x-axis is the tree depth parameter while the y-axis is the root letter that identifies a tree. The density of the colour mapping corresponds to the breadth of the trees at the given level of depth. As you can see, deep blue colours are close to zero while darkish reds are close to the saturation point of the data. Note that the colour scale is normalised to the dynamic range of the larger of the two data sets, which is of course Ulysses.

Personally, I find these plots to be quite rich with information, as well as being colourful pictures that let me see what shapes the vocabularies of these two giants actually take on. For any given letter in the alphabet, you simply trace across horizontally and get a measurement of the depth-breadth relationship corresponding to the underlying radix tree. If a tree is quite bushy with lots of branches at a given depth, then that means that there are lots of unique words without common prefixes in that region (i.e., large vocabulary). Similarly, if a tree is naturally pruned for any given depth, then that means that there are lots of common prefixes shared among words, which in turn means that there are few unique words in that region (i.e., small vocabulary). The combined effect in 3D space is the manifestation of dense blobs for certain groups of unique words and their prefixes. Notice how the radix trees corresponding to ‘j’, ‘k’ and ‘x’ are not dense at all in the Ulysses plot. By contrast, there’s a dense blob in the region of ‘a’, ‘b’, ‘c’ and ‘d’ collectively.

Can you see how Joyce is so much broader and denser in his vocabulary than Shakespeare (at least in this partial comparison with Hamlet, as opposed to ALL of the plays)? The saturation of Ulysses’ density map is so much higher that Hamlet’s density map is barely visible! Consequently, by doing a direct visual comparison, the absolute scale of the difference in breadth and depth is clear. However, to be fair to Shakespeare, we should at least do a relative comparison of the breadths and depths by rescaling the Hamlet plot to saturate at a much lower level, around 300 instead of 1500 (which is a reduction by a factor of 5). So there’s Hamlet rescaled below. Interestingly, very much the same kinds of dense blobs manifest in Hamlet as in Ulysses, except they are roughly 5 times smaller in scale. That is not surprising given that Ulysses has ~30,000 unique words compared to the ~5,000 unique words of Hamlet.
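The rescaling is just a change of the saturation level used when mapping breadths onto the colour scale. Here is a tiny Python sketch of that arithmetic; the `normalise` function is a hypothetical helper and the breadth values in `row` are made-up toy numbers, with only the two saturation levels (1500 and 300) taken from the text above.

```python
def normalise(grid, saturation):
    """Map breadths to colour intensities in [0, 1], clipping at `saturation`."""
    return [[min(v, saturation) / saturation for v in row] for row in grid]

# Toy breadths for one letter at three depths (invented for illustration).
row = [[0, 150, 300]]

shared_scale = normalise(row, 1500)  # same scale as the larger data set
own_scale = normalise(row, 300)      # rescaled to saturate at 300

print(shared_scale)
print(own_scale)
```

On the shared scale the largest toy value only reaches a fifth of the colour range, which is why the smaller data set looks washed out; on its own scale the same value reaches full saturation.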

I guess I haven’t succeeded in as much visual appeal as I was subconsciously looking for. Ah well, that happens with any kind of endeavour. Even though these plots are not as instantly understandable as a word cloud, I think they convey something deeper and broader about the writing style and content of these authors. The difference between them is clear in a quantified, multi-dimensional manner, which speaks densely to the way they used language in order to achieve distinct effects of their choice: Joyce being the supreme word inventor, and Shakespeare that unparalleled succinct poet. Just a word of caution about these plots before I conclude: they are preliminary, as my implementation of the radix tree data structure has not been reviewed by anyone. It is plausible that some life-threatening bugs are hiding in my code somewhere even though I’ve tested and checked the data quite a bit (within the confines of a hobby project, that is). So please don’t launch into a war using my data, for I will bear no responsibility for your silliness :P

Sep 9

What is redundancy?

Sep 5

150 most frequent words from “Othello” thanks to http://www.wordle.net. I like how Othello and Desdemona are at right angles to each other. That just randomly happened, I didn’t put them that way! Speaks to the inherent theme of this play.