Although most of the time I like to think of myself as a middle-aged
man in his prime, there are moments when I realize that my shelf life
("bäst-före-datum") is fast approaching its
end. One such moment is when I contemplate the fact that Information
Theory was put on a firm footing just 10 years before I started studying
the subject at the Royal Institute of Technology in Stockholm, and that
the fundamental unit of information, the "bit", was recognized
as such as late as 1948.  By then, the first computers had already
been built, for goodness' sake!
Nowadays you do not have to be a specialist to run into the terms bit
(an abbreviation of "binary digit") or byte (a group
of 8 bits) on a daily basis, although perhaps more frequently in the
form of megabits per second (Mbps) or gigabytes (GB). In a way, it seems
almost as strange that the "bit" was not named until 1948
as it would be to learn that the number zero had not been assigned
the symbol "0" until after World War 2. Today we are well
aware that any type of information (text, numbers, images, music,
movies...) can be represented as a stream of 1s and 0s, i.e. of bits.
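As a small illustration of that claim, here is a Python sketch (the essay itself contains no code, so the language is our choice) that turns a short piece of text into a stream of bits. It assumes UTF-8 as the character encoding, which is only one of many possible encodings, and the function name `to_bits` is hypothetical.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as 0s and 1s, one 8-bit group per byte."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

print(to_bits("Hi"))  # 01001000 01101001
```

Each group of eight bits is one byte; the letter "H" is the byte 72, which is 01001000 in binary.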

The Morse code.

Granted, until the mid-19th century, information was communicated through
the delivery of books, paintings, sheet music, etc., so there was not
much incentive to study the efficient transmission of information. Yet
the electrical telegraph was taken into use as early as the 1830s, and the
Morse code was developed shortly thereafter. It can be seen as the first
attempt at data compression: rather than assigning codes of equal length
to all characters, it assigned shorter codes to the most frequent letters.
This improved overall transmission efficiency, even though the least
frequent characters then had to be assigned longer sequences. Even before
the electrical telegraph, communication networks using chains of optical
telegraphs had been established in Sweden and France in the 1790s.
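The effect of Morse's variable-length assignment can be sketched in a few lines of Python. The table below is only a small subset of International Morse Code, and the comparison counts dots and dashes only (real Morse also relies on gaps between symbols and letters, so it is not a pure two-symbol code). The fixed-length comparison is a hypothetical code in which every letter gets five dots/dashes, the minimum needed to distinguish 26 letters.

```python
# A small subset of International Morse Code (not the full table).
MORSE = {
    "E": ".", "T": "-", "A": ".-", "I": "..", "N": "-.", "O": "---",
    "S": "...", "H": "....", "R": ".-.",
    "J": ".---", "Q": "--.-", "X": "-..-", "Z": "--..",
}

def encode(word: str) -> str:
    """Encode a word letter by letter, separating letters with a space."""
    return " ".join(MORSE[ch] for ch in word.upper())

def symbol_count(word: str) -> int:
    """Number of dots and dashes needed for the word (gaps not counted)."""
    return sum(len(MORSE[ch]) for ch in word.upper())

def fixed_count(word: str, symbols_per_letter: int = 5) -> int:
    """Dots and dashes for a hypothetical fixed-length code.

    26 letters need 5 symbols each, since 2**4 = 16 < 26 <= 32 = 2**5.
    """
    return symbols_per_letter * len(word)

print(encode("THE"), symbol_count("THE"), fixed_count("THE"))     # - .... . 6 15
print(encode("JAZZ"), symbol_count("JAZZ"), fixed_count("JAZZ"))  # .--- .- --.. --.. 14 20
```

A word made of common letters costs far less than under the fixed-length scheme, while a word of rare letters costs more per letter; since common letters dominate ordinary text, the average improves.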
In time, the electrical telegraph assumed great commercial importance.
Later, wireless telegraphy became important, and remained so
even when voice communications by radio became possible, using
various modulation schemes. In 1936, the Olympics were televised
to a number of theaters in Berlin.
One would think that all of those inventions should have triggered
mathematical work on the optimal encoding of information, and of course
there were many studies relevant to the subject (von Neumann,
Wiener, etc.), but it was not until Claude Shannon published his paper
"A Mathematical Theory of Communication" in 1948 that a comprehensive
theory of information was achieved. Shannon has been called "The
Father of the Digital Age", with some justification!
In the late 1930s Shannon had been struck by the similarity between
the structure of Boolean algebra and the properties of networks of electrical
switching devices. He showed how Boolean algebra could be used to analyze
the behavior of relay circuits and, in a leap of imagination,
proposed that such circuits could be used to perform logic operations.
Conceptually, this was the basis for the electronic digital computer,
which was developed during WW 2, based on vacuum tubes.
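The correspondence between switching networks and Boolean algebra can be sketched as follows: switches in series conduct only if all are closed (AND), and switches in parallel conduct if any is closed (OR). Combining the two, plus a NOT that we assume comes from a normally-closed relay contact, is already enough to add two one-bit numbers. The function names are hypothetical and this is an idealized model, not a description of Shannon's own circuits.

```python
from itertools import product

def series(*switches: bool) -> bool:
    """Current flows through switches in series only if every one is closed (AND)."""
    return all(switches)

def parallel(*switches: bool) -> bool:
    """Current flows through parallel branches if any one is closed (OR)."""
    return any(switches)

def half_adder(a: bool, b: bool) -> tuple[bool, bool]:
    """Add two 1-bit numbers; return (sum_bit, carry_bit).

    The `not` models an assumed normally-closed relay contact.
    """
    carry = series(a, b)
    sum_bit = series(parallel(a, b), not carry)
    return sum_bit, carry

# Print the truth table as "a + b = carry sum".
for a, b in product([False, True], repeat=2):
    s, c = half_adder(a, b)
    print(int(a), "+", int(b), "=", int(c), int(s))
```

The last line of output, `1 + 1 = 1 0`, is binary arithmetic carried out purely by opening and closing switches.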

Claude Shannon (1916–2001)

In his 1948 paper, Shannon analyzed the concept of information and
pointed out its close relationship to the concept of entropy,
familiar from statistical mechanics. If a certain possibility has a
relatively high statistical probability, its confirmation does not carry
much information, so we should not waste many bits to send the corresponding
message, regardless of its semantic content. (This is why the letter
"e" in the Morse code is encoded in the shortest form available,
"e" being the most frequent letter in the English language.)
When there is no uncertainty, the entropy is zero, and it is pointless
to send the information. (In an exercise at the Royal Institute of Technology,
I had to calculate the optimal encoding of the announcement of that
year's winner of the Nobel Prize in Literature, given certain a priori
probabilities.)
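An exercise of that kind can be sketched in Python: compute the entropy H = -Σ p·log2(p) of the a priori distribution, then build a binary Huffman code, which gives frequent outcomes short codewords. The four candidates and their probabilities below are invented for illustration (they are not the ones from the original exercise), and Huffman coding is our choice of optimal prefix code, not necessarily the method used in that course.

```python
import heapq
import math

def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy in bits: sum of p * log2(1/p); outcomes with p = 0 add nothing."""
    return sum(p * math.log2(1 / p) for p in probs.values() if p > 0)

def huffman_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Codeword length for each outcome under a binary Huffman code."""
    if len(probs) == 1:
        # A single certain outcome: nothing needs to be sent.
        return {name: 0 for name in probs}
    # Heap entries: (probability, tie-breaker, {outcome: depth so far}).
    heap = [(p, i, {name: 0}) for i, (name, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        merged = {name: depth + 1 for name, depth in {**group1, **group2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical a priori probabilities for four candidates.
prior = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = huffman_lengths(prior)
avg = sum(prior[name] * lengths[name] for name in prior)
print(lengths)                             # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print(f"{entropy(prior):.2f} {avg:.2f}")   # 1.75 1.75
print(entropy({"A": 1.0}))                 # 0.0
```

The favorite gets a one-bit announcement and the long shots three bits; for these power-of-two probabilities the average length exactly matches the entropy. The last line shows the case with no uncertainty: zero bits of information.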
Of course, the paper goes far beyond the preliminary observations on
the mathematical characteristics of information. Pretty soon Shannon
goes deep into the mathematics of reconstructing a signal corrupted
by noise in a communications channel (leaving yours truly by the wayside).
In 1949, Shannon published the Sampling Theorem, which deals with
the reconstruction of a continuous signal from a number of discrete
samples. The Sampling Theorem has a complex history, with several contributors,
as outlined in this reference.
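The reconstruction the theorem describes is the sinc interpolation x(t) = Σ x[n]·sinc((t − nT)/T), valid for a signal with no frequency content at or above half the sampling rate 1/T. The Python sketch below applies it to a 1 Hz sine sampled at 8 Hz; the signal, rate and evaluation point are arbitrary choices, and because the sum is truncated to a finite number of samples the result only approximates the true value.

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    if u == 0.0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples: dict[int, float], period: float, t: float) -> float:
    """Whittaker-Shannon interpolation: x(t) = sum over n of x[n] * sinc((t - n*T) / T).

    Only the finitely many samples supplied are used, so this is an approximation.
    """
    return sum(x_n * sinc((t - n * period) / period) for n, x_n in samples.items())

# A 1 Hz sine sampled at 8 Hz, comfortably above the 2 Hz Nyquist rate.
freq, rate = 1.0, 8.0
T = 1.0 / rate
samples = {n: math.sin(2 * math.pi * freq * n * T) for n in range(-2000, 2001)}

t = 0.3  # a point between two sampling instants
print(reconstruct(samples, T, t), math.sin(2 * math.pi * freq * t))
```

The two printed values agree up to a small truncation error, even though t = 0.3 s falls between the sampling instants at 0.25 s and 0.375 s.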
Returning to the "bit", it still surprises me that
this was such a late discovery, or rather that its significance
was recognized so late. For instance, if we were to try to communicate
with a putative extraterrestrial civilization, would anyone propose
doing it in any form other than a signal consisting of a series of
0s and 1s (or by pulse-code modulating a continuous carrier, which amounts
to the same thing)?
We happen to attach a lot of significance to the number ten, but that
is almost certainly a result of our having five digits on each hand.
(Professor Nils Åslund at the Institute of Technology had this trick question:
"A Martian believes that two times three is ten. How many fingers
does he have on each hand?" "Aha!" you say, "The
base of his number system must be six, so the answer is three."
"Wrong!" says Professor Åslund, "The Martian
has two fingers on each of his three hands!") If
we could start from scratch, the most convenient base for our number
system would probably not be ten (and certainly not two!) but twelve,
which is divisible by 2, 3, 4 and 6.
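Both halves of that aside are easy to check in Python. The helper names `to_base` and `divisors` are hypothetical and handle only the small cases needed here (bases 2 to 10).

```python
def to_base(n: int, base: int) -> str:
    """Write a non-negative integer n in the given base (2..10) as a digit string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))

def divisors(n: int) -> list[int]:
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

fingers_per_hand, hands = 2, 3
base = fingers_per_hand * hands      # six fingers in all
print(to_base(2 * 3, base))          # 10
print(divisors(10), divisors(12))    # [1, 2, 5, 10] [1, 2, 3, 4, 6, 12]
```

With six fingers in total, the Martian counts in base six, so "two times three" really is written "10". Twelve also has more divisors than ten, which is what makes it convenient for dividing things evenly.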
Shannon himself seems to have been a colorful character. He designed
and built chess-playing, maze-solving, and juggling machines, and a
motorized pogo stick. He worked at Bell Labs for three decades. On occasion
he would ride a unicycle down the corridors while juggling. He even
modified a unicycle with an off-center hub, so that he would bounce
up and down while riding it. He is also credited with having made
money at the roulette tables in Las Vegas and on the stock market
by applying certain elements of Information Theory.