# Information theory
Figure: A 1993 chronology of the development of information theory, which starts with the work of Harry Nyquist (1924), Ralph Hartley (1928), and Claude Shannon (1948).
In science, information theory, or unambiguously the "theory of the transmission of information", as defined by Italian-born American electrical engineer Robert Fano, who taught one of the first courses on information theory at MIT, is a mode of logic that attempts to analyze and mathematically quantify the transmittal of data, typically through wire or radio wave. Information theory, according to the Oxford Dictionary of Science, is the branch of mathematics that analyses information mathematically.

Naming confusions
See main: Information entropy (quotes); Shannon bandwagon
In 1948, American electrical engineer Claude Shannon adopted the physics term "entropy" as the new name for his logarithm of a probability formulation of data transmission, telegraphy in particular, alluding to the premise that his formula and the formula for entropy in statistical mechanics have the same "form", after which people began to continuously assume a connection between the two fields. To exemplify this confusion, the 2005 Oxford Dictionary of Science gives the following addendum to the definition of information theory:

"Several branches of physics have been related to information theory. For example an increase in entropy has been expressed as a decrease in information. It has been suggested that it may be possible to express the basic laws of physics using information theory. See also: Landauer's principle; Zeilinger's principle."

Here we see the recursive confusion that results from Shannon's unfortunate choice of the name entropy for the formula that mathematically quantifies the "measure of information, choice, and uncertainty" in a signal transmission; meaning that the above science dictionary definition, in the Shannon-namesake sense of the matter, reduces to the following nonsensical statement, as the unacquainted reader would see things: "an increase in entropy has been expressed as a decrease in entropy."

In short, beginning in the last half of the 20th century, there has been a push to blend information theory concepts together with the laws of thermodynamics, e.g. by suggesting that an increase in entropy can be expressed as a decrease in information, to yield new laws, e.g. the law of conservation of information, or branches of science, such as information theory and evolution, chaos theory, or complexity theory.

Thermodynamicists, however, view the connection between entropy and information to be only superficial, very illogical, and not justified.  To cite one example, in 1950 American electrical engineer Claude Shannon estimated the "entropy" of written language to be 0.6 to 1.3 bits per character; and in modern times, through many convoluted probability arguments, one can find dozens of researchers arguing that this is the same as German physicist Rudolf Clausius' 1865 definition of "entropy" in thermodynamics, which has units of joules per kelvin per mole (J/K∙mol). 

Since 1991, American molecular machines theorist Thomas Schneider has devoted a number of articles and online pages, e.g. “Information is Not Entropy, Information is Not Uncertainty!” (1997), “Shannon entropy is a misnomer” (2002), etc., to a discussion of the terminology confusion resulting from the overlap of information theory and thermodynamics in his field of research: ‘molecular information theory’, which he defines as the application of Shannon’s information theory to molecular patterns and states.

Overview
In the 1940s, while American electrical engineer Claude Shannon was developing the basic mathematics of information theory of telephone signals, an associate, Hungarian-born American mathematician and chemical engineer John von Neumann, suggested that Shannon call the informational uncertainty associated with a random variable "entropy" because his basic formula was similar to the statistical formula for the entropy of an ideal gas as developed by Ludwig Boltzmann in 1872. Ever since, many individuals, especially outside the field of thermodynamics, such as mathematicians, have inadvertently assumed a physical connection between thermodynamic "irreversibility" in heat engine cycles and "information uncertainty" in communication lines. This has led to great confusion. The three main researchers regarded as being responsible for the alleged equivalence between information and negative entropy are French physicist Leon Brillouin (1950), American physicist Edwin Jaynes (1957), and Hungarian-American physicist Leo Szilard (1964).

Figure: Claude Shannon, American electrical engineer who founded information theory with his 1948 paper "A Mathematical Theory of Communication", in which he argued that entropy is a measure of information, thus initiating the questionable field of information theory thermodynamics.

In more detail, information theory was founded by Shannon with the publication of his 1948 article "A Mathematical Theory of Communication."  The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel. The most fundamental results of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by the “uncertainty associated with a random variable” (called “entropy” by Shannon); and Shannon's noisy-channel coding theorem, which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold called the channel capacity.
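The two results just described can be illustrated with a minimal Python sketch. The function names, the example source distribution, and the prefix code are hypothetical choices made for illustration. The capacity formula C = 1 - H(p) for a binary symmetric channel is a standard textbook result that does not appear in the text above; it is included only as an assumed example of a "channel capacity".

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon's H in bits: -sum(p_i * log2(p_i)) over outcomes with p_i > 0."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_code_length(probs: Sequence[float], lengths: Sequence[int]) -> float:
    """Expected number of bits per symbol for a code with the given word lengths."""
    return sum(p * n for p, n in zip(probs, lengths))


def bsc_capacity(flip_prob: float) -> float:
    """Capacity (bits per use) of a binary symmetric channel: C = 1 - H(p).

    Assumption: this textbook formula is not taken from the article.
    """
    return 1.0 - shannon_entropy([flip_prob, 1.0 - flip_prob])


if __name__ == "__main__":
    # Hypothetical four-symbol source and a prefix code (0, 10, 110, 111).
    source = [0.5, 0.25, 0.125, 0.125]
    code_lengths = [1, 2, 3, 3]
    print("entropy (bits/symbol):", shannon_entropy(source))
    print("average code length:", average_code_length(source, code_lengths))
    print("BSC capacity at 10% bit flips:", bsc_capacity(0.1))
```

For this particular source the average code length equals the entropy, which is the limit the source coding theorem describes; for most sources a practical code sits slightly above it.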

Origin of assumed thermodynamic connection
In 1872, Austrian physicist Ludwig Boltzmann formulated the H theorem, a statistical interpretation of Rudolf Clausius’ entropy, or “transformation content” of a working body, for an ideal gas system of particles with no appreciable interaction, thus finding a proof for the phenomenon of atomic and molecular irreversibility in steam engine cycles. The H function was defined by Boltzmann as:

$H(t) = \int f(p,q,t)\ln f(p,q,t)\,dp\,dq$

where f(p,q,t) is a distribution function, or probability of finding, at a given time t, a particle with a position q and momentum p, in which the distribution is assumed to evolve with time, owing to the proper motion of the molecules and their collisions. The H theorem of Boltzmann states that this function decreases with time and tends towards a minimum which corresponds to the Maxwell distribution:

$\frac{dH}{dt} \leq 0$

If the distribution of the velocities is Maxwellian, the H function remains constant in time. For a system of N statistically independent particles, H is related to the thermodynamic entropy S through:

S = -NkH

Figure: The Boltzmann tombstone showing the S = k log W entropy formula, which many people mistakenly assume can be used to quantify "bits" in data storage and transmission.

In 1901, German physicist Max Planck put Boltzmann’s statistical formula in the form shown below, which is also carved on Boltzmann’s tombstone (adjacent):

S = k log W

where k is the Boltzmann constant, equal to 1.38062 × 10^-23 joule/kelvin, and W is the number of “states” in which the particles of the system can be found, according to the various energies that may be assigned to each of them.
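As a rough numerical illustration of the formula above, the following Python sketch evaluates S = k log W for a few values of W, taking "log" as the natural logarithm and using the value of k quoted in the text. The function name and the sample values of W are hypothetical. The result carries the units of k, joules per kelvin.

```python
import math

# Boltzmann constant in joules per kelvin, using the value quoted in the text.
K_BOLTZMANN = 1.38062e-23


def boltzmann_entropy(num_states: float) -> float:
    """Return S = k ln W, in joules per kelvin, for W accessible states."""
    if num_states < 1:
        raise ValueError("W must be at least 1")
    return K_BOLTZMANN * math.log(num_states)


if __name__ == "__main__":
    # Sample values of W chosen only for illustration.
    for w in (1, 2, 1e6):
        print(f"W = {w:g}: S = {boltzmann_entropy(w):.4e} J/K")
```

A single accessible state (W = 1) gives S = 0, and each doubling of W adds k ln 2 to S.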

Bell telephone labs
In a completely different field of study, in 1924, while working at Bell telephone labs, American electrical engineer Harry Nyquist published a paper called “Certain Factors Affecting Telegraph Speed”, containing a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation:

W = K log m

where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant.

In 1928, American electronics researcher Ralph Hartley published the paper “Transmission of Information”, in which he used the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as:

H = n log S

where S was the number of possible symbols, and n the number of symbols in a transmission. 
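The Nyquist and Hartley relations above can be evaluated with a short Python sketch. Neither formula as quoted fixes the base of the logarithm or the value of the constant K, so the base-2 default and K = 1 used here are assumptions, and the function names are hypothetical.

```python
import math


def log_base(x: float, base: float) -> float:
    """Logarithm of x in the given base, via natural logarithms."""
    return math.log(x) / math.log(base)


def nyquist_speed(levels: int, k: float = 1.0, base: float = 2.0) -> float:
    """Nyquist's W = K log m, for m distinguishable voltage levels."""
    if levels < 1:
        raise ValueError("m must be at least 1")
    return k * log_base(levels, base)


def hartley_information(num_sent: int, alphabet_size: int, base: float = 2.0) -> float:
    """Hartley's H = n log S, for n symbols drawn from S possible symbols."""
    if alphabet_size < 1 or num_sent < 0:
        raise ValueError("need S >= 1 and n >= 0")
    return num_sent * log_base(alphabet_size, base)


if __name__ == "__main__":
    print("Nyquist, m = 4 levels, K = 1:", nyquist_speed(4))
    print("Hartley, n = 10 symbols, S = 2:", hartley_information(10, 2))
    print("Hartley, n = 3 letters, S = 26, base 10:", hartley_information(3, 26, base=10.0))
```

Both quantities grow with the logarithm of the number of choices per symbol, and Hartley's measure also grows linearly with the length of the message.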

Pulling this all together, so to say, in 1944, while working at Bell telephone labs, Claude Shannon sought to solve the basic problem in communication, namely: "that of reproducing at one point, either exactly or approximately, a message selected at another point." In this direction, Shannon developed a number of fundamental theorems and adopted the “bit” as the basic unit of information.

A Mathematical Theory of Communication
In 1948 Shannon published his famous paper “A Mathematical Theory of Communication”, in which he devoted a section to what he calls Choice, Uncertainty, and Entropy. In this section, Shannon introduces an “H function” of the following form:

$H = -K\sum_{i=1}^n p_i \log p_i$

where K is a positive constant. Shannon then states that “any quantity of this form, where K merely amounts to a choice of a unit of measurement, plays a central role in information theory as measures of information, choice, and uncertainty.” Then, as an example of how this expression applies in a number of different fields, he references R.C. Tolman’s 1938 Principles of Statistical Mechanics, stating that “the form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics where p_i is the probability of a system being in cell i of its phase space… H is then, for example, the H in Boltzmann’s famous H theorem.” The following excerpt is the key section in which Shannon defined his new variable, with implied thermodynamic connotations:

Figure: Excerpt (page 11) of Claude Shannon's 1948 "A Mathematical Theory of Communication" in which he connects information to entropy; where the reference 8 is (See, for example, R. C. Tolman, Principles of Statistical Mechanics, Oxford, Clarendon, 1938).

Shannon then gives what he calls "the entropy H in the case of two possibilities with probabilities p and q = 1 - p" and then goes on to declare:

“The quantity of H [entropy] has a number of interesting properties which further substantiate it as a reasonable measure of choice or information.”

From here on, German physicist Rudolf Clausius' 1865 thermodynamic quantity entropy S has been assumed, by millions of people, to be a measure of information. In the modern age, one can even find books that not only equate entropy to information but actually suggest that information replace not just entropy but the entire science of thermodynamics. An example is the 2008 book A Farewell to Entropy: Statistical Thermodynamics Based on Information by Israeli physical chemist Arieh Ben-Naim, who argues that “thermodynamics and statistical mechanics will benefit from replacing the unfortunate, misleading and mysterious term entropy with a more familiar, meaningful and appropriate term such as information, missing information or uncertainty.”
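Returning to Shannon's H function in this section, the following Python sketch illustrates two of his remarks: that K only sets the unit of measurement, and the case of two possibilities with probabilities p and q = 1 - p. The function names are hypothetical. Choosing K = 1/ln 2, which gives a result in bits, is one conventional choice of unit assumed here for illustration.

```python
import math
from typing import Sequence


def h_function(probs: Sequence[float], k: float = 1.0) -> float:
    """Shannon's H = -K * sum(p_i * ln p_i), skipping outcomes with p_i = 0."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)


# With natural logarithms, K = 1 gives H in nats and K = 1/ln 2 gives H in bits.
K_NATS = 1.0
K_BITS = 1.0 / math.log(2)


def binary_entropy(p: float, k: float = K_BITS) -> float:
    """H for two possibilities with probabilities p and q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return h_function([p, 1.0 - p], k)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        bits = binary_entropy(p)
        nats = binary_entropy(p, K_NATS)
        print(f"p = {p:.1f}: H = {bits:.4f} bits = {nats:.4f} nats")
```

The two-outcome H is zero when either outcome is certain and largest when p = q = 1/2. Changing K rescales every value by the same factor without altering that shape.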

Brillouin's conceptions of information
By 1956, owing to the work of Shannon and Schrödinger, French physicist Leon Brillouin, in his book Science and Information Theory, had stated that “a new scientific theory has been born during the last few years, the theory of information.”  The basics of this theory, according to Brillouin, revolve around the question of how one is able to define the quantity of information communicated by a system of telegraph signals, signals which themselves consist of either electromagnetic or current transmissions. In this theory, the basic definition of information in a given operation, a quantity that Brillouin supposedly proves is “very closely related to the physical entropy of thermodynamics”, is defined by him as:

I = K ln P

where I denotes information, K is a constant, and P is the probability of the outcome. Brillouin reasons that with these types of probability arguments, “it enables one to solve the problem of Maxwell’s demon and to show a very direct connection between information and entropy” and that “the thermodynamic entropy measures the lack of information about a physical system.” Moreover, according to Brillouin, “whenever an experiment is performed in the laboratory, it is paid for by an increase in entropy, and a generalized Carnot principle states that the price paid in increase of entropy must always be larger than the amount of information gained.” Information, according to Brillouin, corresponds to “negative entropy” or negentropy, a term he coined. In sum, Brillouin unjustifiably declares that the “generalized Carnot principle (the second law of thermodynamics) may also be called the negentropy principle of information”.

Neumann's famous suggestion?
See main: Neumann-Shannon anecdote
In the 1940s, after Shannon had been working on his equations for some time, he happened to visit the mathematician and chemical engineer John von Neumann. Their discussion concerned what Shannon should call the “measure of uncertainty” or attenuation in phone-line signals with reference to his new information theory (a conversation that varies depending on the source). In short, Neumann told Shannon:

“You should call [your measure of choice or information] entropy, because nobody knows what entropy really is, so in a debate you will always have the advantage.”

This famous comment classifies as both a famous entropy quotation and a classic entropy misinterpretation. One must always remember that any thermodynamic connotations made in information theory are always traced back to this ridiculous suggestion. To note, Neumann could have just as easily told Shannon to call his new quantity “information sensation” based on similarity to German physiologist Gustav Fechner’s 1860 statistical logarithmic psychological sensation formula:

S = K log I

or

S = c log R

where S stands for sensation, c or K represent a constant that must be determined by experiment in each particular order of sensation, and R or I represent the stimulus numerically estimated. This would have saved the thermodynamics community decades of wasted energy. In any event, over the last sixty years, ever since this terminology overlap or assumed equivalence between entropy and information was made, people have been overlapping the two concepts or even stating that they are exactly the same, e.g. citing conservation of information laws modeled on the law of conservation of energy, namely that Clausius’ entropy, defined by dS = dQ/T, as derived from the phenomenon of irreversibility in heat engine cycles as defined by the second law of thermodynamics, is the same as Shannon’s entropy, $H = -K\sum_{i=1}^n p_i \log p_i$, from the phenomenon of information loss in communication as defined in Shannon’s “Mathematical Theory of Communication” paper.

Entropy = Information | Objections
Ever since 1948, with the publication of Shannon's paper, there has been a growth in the assumed equivalence of heat engine entropy and the entropy of a message, as well as growth in the objections to this point of view. In 1999, to cite one example, American chemistry professor Frank Lambert, who for many years taught a course for non-science majors called "Enfolding Entropy" at Occidental College in Los Angeles, stated that another major source of confusion about entropy change as the result of simply rearranging macro objects comes from the information theory "entropy" of Claude Shannon. In Shannon’s 1948 paper, as discussed, the word "entropy" was adopted at the suggestion of von Neumann. This step, according to Lambert, was “wryly funny for that moment,” but “Shannon's unwise acquiescence has produced enormous scientific confusion due to the increasingly widespread usefulness of his equation and its fertile mathematical variations in many fields other than communications”.

According to Lambert, “certainly most non-experts hearing of the widely touted information entropy would assume its overlap with thermodynamic entropy.” However, the great success of information "entropy" has been in areas totally divorced from experimental chemistry, whose objective macro results are dependent on the behavior of energetic microparticles. Nevertheless, many instructors in chemistry have the impression that information "entropy" is not only relevant to the calculations and conclusions of thermodynamic entropy but may change them. This impression, according to Lambert, is mistaken. In sum, according to Lambert, information "entropy" in all of its myriad nonphysicochemical forms as a measure of information or abstract communication has no relevance to the evaluation of thermodynamic entropy change in the movement of macro objects because such information "entropy" does not deal with microparticles whose perturbations are related to temperature. Even very competent chemists and physicists have become confused when they have melded or mixed information "entropy" into their consideration of physical thermodynamic entropy. This is shown by the results in textbooks and by the lectures of professors found on the Internet.

In the 2007 book A History of Thermodynamics, for instance, German physicist Ingo Müller summarizes his opinion on the matter of von Neumann’s naming suggestion:

“No doubt Shannon and von Neumann thought that this was a funny joke, but it is not, it merely exposes Shannon and von Neumann as intellectual snobs. Indeed, it may sound philistine, but a scientist must be clear, as clear as he can be, and avoid wanton obfuscation at all cost. And if von Neumann had a problem with entropy, he had no right to compound that problem for others, students and teachers alike, by suggesting that entropy had anything to do with information.”

Müller clarifies the matter, by stating that: “for level-headed physicists, entropy (or order and disorder) is nothing by itself. It has to be seen and discussed in conjunction with temperature and heat, and energy and work. And, if there is to be an extrapolation of entropy to a foreign field, it must be accompanied by the appropriate extrapolations of temperature, heat, and work.” 

Computer science thermodynamics

References
1. Daintith, John. (2005). Oxford Dictionary of Science (pg. 421). Oxford University Press.
2. Nyquist, Harry. (1924). “Certain Factors Affecting Telegraph Speed”, Bell System Technical Journal, 3, pgs. 324–346.
3. Boltzmann, Ludwig. (1872). “Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen.” Sitzungsberichte der Akademie der Wissenschaften, Wien, II, 66, 275. [English translation in: S.G. Brush. (1966). Kinetic theory, Vol. 2, Irreversible processes, pp. 88-195, Oxford: Pergamon Press.]
4. Shannon, Claude E. (1948). "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October.
9. Perrot, Pierre. (1998). A to Z of Thermodynamics, Oxford: Oxford University Press.
10. (a) Campbell, Jeremy. (1982). Grammatical Man - Information, Entropy, Language, and Life. New York: Simon and Schuster.
(b) Yockey, Hubert P. (2005). Information Theory, Evolution, and the Origin of Life. Cambridge: Cambridge University Press.
(c) Baeyer, Hans Christian von. (2004). Information - the New Language of Science. Cambridge, Massachusetts: Harvard University Press.
(d) Avery, John (2003). Information Theory and Evolution. World Scientific.
(e) Sardar, Ziauddin and Abrams, Iwona. (2004). Introducing Chaos. USA: Totem Books.
11. Muller, Ingo. (2007). A History of Thermodynamics - the Doctrine of Energy and Entropy (ch 4: Entropy as S = k ln W, pgs: 123-126). New York: Springer.
12. Lambert, Frank L. (1999). “Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms: Examples of Entropy Increase? Nonsense!”, Journal of Chemical Education, 76, 1385-1387.
13. Including: Golembiewski, R. T. Handbook of Organizational Behavior; Dekker: New York, 1993.
14. Mayumi, Kozo. (2001). The Origins of Ecological Economics: The Bioeconomics of Georgescu-Roegen (ch. 3: section 4: “The Alleged Equivalence between Information and Negative Entropy”, pgs. 39-44). Routledge.
15. (a) Shannon, Claude E. (1950), "Prediction and Entropy of Printed English", Bell Sys. Tech. J (3) p. 50-64.
(b) Mahoney, Matt. (1997). "Refining the Estimated Entropy of English by Shannon Game Simulation," Florida Institute of Technology.
16. (a) Fancher, R. E. (1996). Pioneers of psychology (3rd Ed.). New York: W. W. Norton & Company.
(b) Sheynin, Oscar. (2004), "Fechner as a Statistician" (abstract), The British journal of mathematical and statistical psychology, 57 (Pt 1): 53-72, May.
17. Ben-Naim, Arieh. (2008). A Farewell to Entropy. World Scientific Publishing Co.
18. Coveney, Peter V. and Highfield, Roger. (1992). The Arrow of Time: A Voyage Through Science to Solve Time’s Greatest Mystery (pgs. 178, 253). Fawcett Columbine.
19. Aftab, O., Cheung, P., Kim, A., Thakkar, S., and Yeddanapudi, N. (2001). “Information Theory and the Digital Age” (§: Bandwagon, pgs. 9-11), Project History, Massachusetts Institute of Technology.
Robert Fano – Wikipedia.
20. (a) Schneider, Thomas D. (1991). “Theory of Molecular Machines. II. Energy Dissipation from Molecular Machines” (abs), Journal of Theoretical Biology, 148(1): 125-37.
(b) Schneider, Thomas D. (1997). “Information is Not Entropy, Information is Not Uncertainty!”, Frederick National Laboratory for Cancer Research, Jan 4.
(c) Schneider, Thomas D. (2000). “Pitfalls in Information Theory and Molecular Information Theory”, Frederick National Laboratory for Cancer Research, Mar 13.
(d) Shannon entropy is a misnomer (section) – Schneider.ncifcrf.gov.