Here is a game that Claude Shannon, the founder of information theory, invented in 1948, when he was trying to model the English language as a random process. Go to your bookshelf, pick out a random book, open it, point to a random spot on the page, and mark the first two letters you see. Say they're I and N. Write those two letters on your page.
Now take another book off the shelf and search through it until you find the letters I and N in succession. Whatever character follows that "IN" — say, for example, it's a space — that's the next character of your text. Now take down yet another book and look for an N followed by a space; when you find one, write down the character that comes next. Repeat this until you have a paragraph:
“IN NO IS LAT WHEY CRATICT FROURE BIRS GROCIDOC
PONDENOME OF THE DEMONSTRATIONS OF REPTAGIN IS
REGOAKTIONA OF CRE “
That's not English, but it kind of looks like English.
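The bookshelf procedure can be sketched in a few lines of Python. This is a toy sketch: a single string stands in for the whole library, and the stand-in corpus below is made up purely for illustration.

```python
import random

def shannon_game(corpus, n_chars=60, seed_pair="IN"):
    """Imitate Shannon's game: repeatedly look up the current two-letter
    window in the corpus and emit a character that follows it there."""
    # Map each two-character window to the list of characters seen after it.
    followers = {}
    for i in range(len(corpus) - 2):
        pair = corpus[i:i + 2]
        followers.setdefault(pair, []).append(corpus[i + 2])

    text = seed_pair
    for _ in range(n_chars):
        options = followers.get(text[-2:])
        if not options:  # dead end: this pair never recurs with a follower
            break
        text += random.choice(options)
    return text

# A made-up stand-in for the bookshelf:
corpus = "IN THE BEGINNING THE INK IN THE INN WAS THIN AND THE WIND WAS IN"
print(shannon_game(corpus))
```

Run it a few times and you get different nonsense each time, but nonsense whose two-letter statistics match the corpus you fed it.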
Shannon was interested in the "entropy" of the English language, a measure of how much information a string of English text carries. The Shannon game is a Markov chain; that is, a random process in which the next step depends only on the current state of the process. Once you're at "LA", the earlier "IN NO IS" doesn't matter; the probability that the next letter is a B, for example, is just the probability that a randomly chosen instance of "LA" in your library is followed by a B.
And as the name suggests, the method was not original to Shannon; it was almost half a century older, and it grew, of all things, out of a vicious mathematical-theological feud in late-tsarist Russian mathematics.
There is almost nothing I find intellectually more sterile than the verbal wars between devout religious believers and movement atheists. And yet, at least this once, such a war led to a major mathematical advance, whose echoes have sounded again and again ever since. One of the main actors, in Moscow, was Pavel Alekseevich Nekrasov, originally trained as an Orthodox theologian before turning to mathematics. His counterpart in St. Petersburg was Andrei Andreyevich Markov, an atheist and bitter enemy of the church. He wrote many angry letters to the newspapers on social issues and was widely known as Neistovyj Andrei, "Andrei the Furious."
The details are too involved to go into here, but the gist is this: Nekrasov believed he had found a mathematical proof of free will, one that confirmed the teachings of the Church. To Markov, this was mystical nonsense. Worse, it was mystical nonsense dressed up in mathematical clothing. He invented the Markov chain as an example of random behavior that could be generated purely mechanically, yet displayed the very features Nekrasov believed guaranteed free will.
A simple example of a Markov chain: a spider walking on a triangle with corners labeled 1, 2, 3. At every tick of the clock, the spider moves from its current corner to one of the other two, chosen at random. The spider's path is then a sequence of numbers:
1, 2, 1, 3, 2, 1, 2, 3, 2, 3, 2, 1 …
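The spider's walk is simple enough to simulate directly; a minimal sketch:

```python
import random

def spider_walk(steps, start=1):
    """Simulate the spider on the triangle: at each tick, hop from the
    current corner to one of the other two, chosen uniformly at random."""
    path = [start]
    for _ in range(steps):
        here = path[-1]
        path.append(random.choice([c for c in (1, 2, 3) if c != here]))
    return path

print(spider_walk(11))  # a random path like the sequence quoted above
```

The only rule is local: the next corner depends on the current corner and nothing else, which is exactly the Markov property.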
Markov started with abstract examples like this one, but later (perhaps inspiring Shannon?) applied the idea to sequences of text, including Alexander Pushkin's verse novel Eugene Onegin. For mathematical purposes, Markov treated the poem as a stream of consonants and vowels, which he laboriously catalogued by hand. Of the letters following a consonant, 66.3 percent are vowels and 33.7 percent consonants, while of the letters following a vowel, only 12.8 percent are vowels and 87.2 percent are consonants.
So you can produce "fake Pushkin" just as Shannon produced fake English: if the current letter is a vowel, the next letter is a vowel with probability 12.8 percent, and if the current letter is a consonant, the next is a vowel with probability 66.3 percent. The results won't be very poetic; but, as Markov discovered, they can be distinguished from the Markovized works of other Russian writers. Something of a writer's style is captured by the chain.
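Markov's fake Pushkin is a two-state chain, and it fits in a few lines. This sketch emits only the vowel/consonant skeleton (V's and C's), not actual letters, using the frequencies Markov tallied by hand:

```python
import random

# Markov's hand-counted transition probabilities from Eugene Onegin:
# P(vowel | previous letter was a consonant) = 0.663
# P(vowel | previous letter was a vowel)     = 0.128
P_VOWEL_AFTER = {"C": 0.663, "V": 0.128}

def fake_pushkin(length, start="C"):
    """Generate a vowel/consonant skeleton ('V'/'C') with Pushkin's statistics."""
    seq = start
    for _ in range(length - 1):
        seq += "V" if random.random() < P_VOWEL_AFTER[seq[-1]] else "C"
    return seq

print(fake_pushkin(40))
```

Over a long run, the fraction of vowels settles near 0.663 / (0.663 + 0.872), about 43 percent: the chain's stationary distribution, regardless of the starting letter.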
Today the Markov chain is a fundamental tool for exploring spaces of conceptual entities much more general than poems. It's how election reformers identify which legislative maps are egregiously gerrymandered, and how Google figures out which websites are most important (the key is a Markov chain in which, at each step, you are on some particular website, and the next step is to follow a random link from that site). What a neural network like GPT-3 learns, the thing that allows it to produce eerie imitations of human-written text, is in effect a gigantic Markov chain, one that chooses the next word following a sequence of 500 words instead of the next letter following a sequence of two. All you need is a rule telling you, given the current state of the chain, the probabilities for its next step.
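The "random surfer" chain behind Google's ranking can be sketched as a simulation. The link graph below is a made-up toy, and `damping` is the usual probability of following a link rather than teleporting to a random page; this is a simulation-based estimate, not how a production system would compute it:

```python
import random
from collections import Counter

# A toy link graph (hypothetical): page -> pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],  # D links out, but nothing links to D
}

def random_surfer(links, steps=100_000, damping=0.85):
    """Estimate page importance by simulating the surfer's Markov chain:
    follow a random outgoing link with probability `damping`, otherwise
    jump to a uniformly random page."""
    pages = list(links)
    here = random.choice(pages)
    visits = Counter()
    for _ in range(steps):
        if random.random() < damping and links[here]:
            here = random.choice(links[here])
        else:
            here = random.choice(pages)  # random jump
        visits[here] += 1
    return {p: visits[p] / steps for p in pages}

ranks = random_surfer(links)
print(ranks)  # C should rank highly: every other page links to it
```

The fraction of time the surfer spends on each page converges to the chain's stationary distribution, which is the intuition behind PageRank.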
You can train your Markov chain on your home library, or on Eugene Onegin, or on the huge corpus of text GPT-3 has access to; you can train it on anything, and the chain will imitate that thing! You can train it on baby names from 1971 and get:
Kendi, Jeane, Abby, Fleureemaira, Jean, Starlo, Caming, Bettilia …
Or on baby names from 2017:
Anaki, Emalee, Chan, Jalee, Elif, Branshi, Naaviel, Corby, Luxton, Naftalene, Rayerson, Alahna …
Or from 1917:
Vensie, Adelle, Allwood, Walter, Wandeliottlie, Kathryn, Fran, Earnet, Carlus, Hazellia, Oberta …
The Markov chain, simple as it is, somehow captures something of the style of naming practices in different eras. You almost experience it as creative. Some of these names aren't bad! You can imagine a kid in elementary school named "Jalee," or, for a retro feel, "Vensie."
But maybe not "Naftalene." Even Markov nods.