Monday, November 19, 2012

Zipf's Law

A while back I was watching a sci-fi documentary that was discussing various scenarios for alien contact. I was struck by a rather obvious point that had never even crossed my mind.

So let's say the folks over at S.E.T.I. receive some sort of signal originating from deep space. It seems highly unlikely that the message would arrive in one of the 6,900 or so languages that are currently spoken on Earth. So what are the chances that we would be able to decode a message in a truly alien language? The documentary then pointed out -- and here's the part I hadn't considered previously -- that scientists have been studying whale sounds for decades and still have no idea what is being communicated. (On a side note, have you seen the story about a beluga whale making some "humanlike sounds"?) Suddenly it didn't seem very likely that we would ever be able to decipher an alien language.

Then the documentary went on a bit of a tangent and asked the question, "How do we know that whale sounds are a language?" Maybe they are simply meaningless sounds that do not convey any information. Enter Zipf's Law. The law is actually a mathematical equation, but here's the simplest explanation I could find:


"Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc. For example, in the Brown Corpus [of English], the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's Law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus."

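To make the arithmetic in that quote concrete, here is a small Python sketch. It takes the count for "the" and works out what a strict 1/rank rule would predict for the second and third words, then compares those predictions with the counts quoted above. The helper name zipf_predictions is made up for illustration, and a pure 1/rank rule is an idealization that real text only follows approximately.

```python
# Counts quoted above for the three most frequent Brown Corpus words,
# listed in rank order (most frequent first).
observed = {"the": 69971, "of": 36411, "and": 28852}


def zipf_predictions(top_count, n):
    """Predicted counts for ranks 1..n if frequency is proportional to 1/rank.

    Hypothetical helper for illustration; assumes an exponent of exactly 1.
    """
    return [top_count / rank for rank in range(1, n + 1)]


predicted = zipf_predictions(observed["the"], len(observed))

# Compare each observed count with the idealized prediction for its rank.
for rank, ((word, count), guess) in enumerate(
    zip(observed.items(), predicted), start=1
):
    ratio = count / guess
    print(f"rank {rank}: {word!r} observed {count}, "
          f"predicted {guess:.0f}, ratio {ratio:.2f}")
```

Run as-is, this shows "of" landing within about 4% of the prediction, while "and" comes in roughly 24% above it. So the "law" is more of a rule of thumb than an exact formula.
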
The interesting part here is that the human languages studied so far all seem to follow this pattern, at least roughly. Scientists applied the equation to whale sounds to try to determine whether whales do indeed have a language. As it turns out, the frequencies of the different sounds do indeed follow the pattern that Zipf set forth! That seems to indicate that these are not simply random sounds; rather, there is actual information being exchanged! Figuring out what that information is is a whole other matter, but at least the tiniest of first steps seems to have been taken.
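
For the curious, here is a rough Python sketch of how one might run this kind of check on any sequence of symbols, whether words or whale sounds. It counts how often each symbol occurs, ranks the counts, and fits a straight line to log(count) versus log(rank); a slope near -1 is what Zipf's Law predicts. This is only my guess at the general idea, not the method the whale researchers actually used. The function name zipf_slope and the synthetic "signal" are invented for the example.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(count) against log(rank).

    Hypothetical helper: a slope near -1 is consistent with Zipf's Law,
    but on its own it does not prove the tokens carry meaning.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct symbols")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic "signal" (an assumption, not real data): 50 made-up symbols,
# where the symbol of rank r appears about 1000 / r times.
tokens = []
for rank in range(1, 51):
    tokens.extend([f"s{rank}"] * round(1000 / rank))

slope = zipf_slope(tokens)
print(f"estimated slope: {slope:.2f}")
```

On this made-up signal the slope comes out close to -1, as it should, since the data was built to follow the rule. A signal where every symbol is equally common would give a slope of 0 instead.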

Here is a way cooler article that describes the theory of the experiment but with dolphins.


Just Joe

2 comments:

  1. Sweet! As someone who has enjoyed learning languages, this thought has actually crossed my mind... but as I do with things I can't wrap my mind around at that moment, I set it on the back burner. It's way cool that ya found these! Thanks for sharing!

  2. My pleasure. It's also interesting to note that Zipf's Law seems to apply to many natural and social phenomena, like the sizes of cities in a country and the sizes of corporations, to name a couple.
