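The entropy in equilibrium thermodynamics is defined as $S = k_B \ln \Omega$, where $\Omega$ is the number of accessible microstates; it never decreases in a closed system. It is clearly a special case of Shannon entropy $H = -\sum_i p_i \ln p_i$. If the probabilities are uniform, $p_i = 1/\Omega$, then Shannon entropy boils down to thermodynamic entropy (up to the constant $k_B$).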
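A quick numerical check of the uniform case, as a minimal sketch. The function name `shannon_entropy`, the eight microstates, and the use of natural units with $k_B = 1$ are my assumptions for illustration only.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

omega = 8  # hypothetical number of microstates
uniform = [1 / omega] * omega
peaked = [0.93] + [0.01] * 7  # a "structured" distribution: one state dominates

h_uniform = shannon_entropy(uniform)
h_peaked = shannon_entropy(peaked)

print(h_uniform, math.log(omega))  # the two values agree for the uniform case
print(h_peaked)                    # smaller than the uniform entropy
```

The peaked distribution is the kind of "structured" case discussed next: some states are far more probable than others, and the entropy comes out lower.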
A system can be called structured if some states are predicted with higher probability than others, which leads to lower entropy. As I have argued in a different post, the loss of the ability to predict the system is diagnosed by increasing entropy. In the extreme case, if microstates transition randomly with equal probability, the chances of ending up in an unstructured state are higher than those of ending up in a structured one.
In order to describe both the structure and the transition laws, the concept of algorithmic complexity is needed. If the microstate $i$ is described by a set of numbers, say the speeds and positions of the particles, then this set of numbers can be written as a sequence. Then a Kolmogorov complexity can be assigned to state $i$: $K(i)$. Since $K$ is quite high for most sequences, starting out with low $K$ and transitioning randomly will increase $K$. Therefore the Kolmogorov complexity of the system will increase with time.
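Kolmogorov complexity itself is uncomputable, but the tendency can be illustrated with a stand-in. The sketch below assumes that the zlib-compressed length of a state is a usable (crude, upper-bound) proxy for its complexity; the state size, the number of steps, and the transition rule are all made up for illustration.

```python
import random
import zlib

def compressed_length(state):
    """Length in bytes of the zlib-compressed state: a crude upper-bound proxy for K."""
    return len(zlib.compress(bytes(state), 9))

rng = random.Random(0)  # fixed seed so the run is repeatable
state = [0] * 2000      # highly regular start: a short description suffices

lengths = [compressed_length(state)]
for step in range(5):
    # Random transition: overwrite 200 randomly chosen positions with random byte values.
    for _ in range(200):
        state[rng.randrange(len(state))] = rng.randrange(256)
    lengths.append(compressed_length(state))

print(lengths)  # the proxy grows as the regular state is randomized
```

A compressor only gives an upper bound on the true complexity, so this shows the direction of the effect, not its size.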
Interestingly, a low $K$ allows a better (spatial) prediction of the sequence. However, one may also have a model, a “law” if you wish, that governs the time evolution of the system. One need not restrict oneself to a microscopic law; the law may be about some emergent variables of the system, even macroscopic ones like those of classical thermodynamics. The “length” of such a law is, in some sense, the Kolmogorov complexity of the system, neglecting all the micro- and mesoscopic details.
What happens when structures such as living systems or galaxies occur? The system evolves into a low-complexity state. Moreover, it looks like there is scale-free structure in the universe. Why is it not possible nowadays to plug in the laws of physics and watch chemistry and biology evolve? Because, even with today’s computers, the computation would become too complicated.
Interestingly, the minimal entropy of a system corresponds roughly to the algorithmic complexity. Vitanyi and Li write on p. 187: “the interpretation is that $H(X)$ bits are on average sufficient to describe an outcome $x$. Algorithmic complexity says that an object $x$ has complexity, or algorithmic information, equal to the minimum length of a binary program for $x$.”
What would be the holy grail?
It would be to extend the theory of statistical physics to phenomena that create low complexity, i.e. structures: a theory of self-organization, really. If we could state the conditions under which the complexity of a subsystem drops below some bound, that would be a great thing. This may even become a theory of life.
The even holier grail would be to explain the emergence of intelligent life. To me, it seems sufficient to explain the emergence of subsystems that can develop compressed representations of some of their surroundings. For example, a frog that predicts the flight trajectory of a fly has achieved some degree of intelligence, since it compresses the trajectory in its “mind”, which enables prediction in the first place.
What does that mean in physical terms? What is representation? In what sense is a representation “about” something else? Somehow, it is the ability to unpack our representation into an image of the represented object, which is what it means to imagine something. Even if the frog does not imagine the future trajectory, it acts as if it knew how the trajectory will continue. Essentially, successful goal-directed action is possible only if you predict, that is, only if you compress. However, actions and goals are still not part of the physical vocabulary.
It will turn out that in order to maintain a low-complexity state, the system will have to be open and exchange energy with the environment. After all, in a closed system, entropy and therefore complexity must increase. In other words, the animal has to eat and to shit 🙂
If you extract the energy from your environment in such a way that you maintain your simple state, does it not imply intelligent action already? Don’t you have to be fairly selective about what kind of energy you take and what you reject? If a large stone flies toward you, you may want to avoid the collision: that type of energy transfer is not welcome, since it does not help to maintain a low-complexity state. However, a crystal also maintains its simplicity, probably because it becomes so firm that after a while it just does not disintegrate under the influence of the environment. In any case, it does not represent anything about its surroundings. It does not react to the environment either.
From Jeremy England (2013).
Irreversible systems increase the entropy (and the heat) of the heat bath.
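For reference, here is a sketch of the inequality from that paper that these remarks seem to be built on. It is written from memory, so the notation (macrostates I and II, transition probabilities $\pi$, and the label (8)) is an assumption and should be checked against England (2013):

```latex
\begin{equation}
\beta \langle \Delta Q \rangle_{\mathrm{I} \to \mathrm{II}}
  + \ln\!\left[ \frac{\pi(\mathrm{II} \to \mathrm{I})}{\pi(\mathrm{I} \to \mathrm{II})} \right]
  + \Delta S_{\mathrm{int}} \;\ge\; 0
\tag{8}
\end{equation}
```

In this reading, $\beta$ is the inverse temperature of the bath, $\langle \Delta Q \rangle$ is the average heat released into the bath during the transition from I to II, the logarithm is negative when the forward transition is much more likely than the reverse one (high irreversibility), and $\Delta S_{\mathrm{int}}$ is the change in internal entropy.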
If we want internal structure (dS negative) and high irreversibility, then a lot of heat has to be released into the bath. Can this result be transformed into an expression in terms of algorithmic complexity? If yes, and if we figure out how to construct a system that does just that, then we have figured out how to create structure.
We can also increase beta, the inverse temperature, which is done by lowering the temperature. Thus, unsurprisingly, freezing leads to structure formation. But that’s not enough for life. Freezing is also fairly irreversible. So, maybe, structure formation is not enough. What we need is structure representation! What does it mean to represent and to predict in physical terms?
Let’s say a particle travels along a straight line. If a living organism can predict it, it means that it has somehow internally found a short program that, when executed, can create the trajectory and also extend it further in time. It can compute the points and moments in the future where the particle is going to be. It is the birth of intentionality, of “aboutness”. If there is an ensemble of particles with all their microstates, how can they be “about” some other external particle?
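To make the “short program” idea concrete, here is a toy sketch. The trajectory, the names `fit_line` and `predict`, and the assumption of noise-free observations are all hypothetical; the point is only that two numbers replace a hundred observations and also extend them in time.

```python
def fit_line(t0, x0, t1, x1):
    """Return (position at t = 0, velocity) of the straight line through two observations."""
    v = (x1 - x0) / (t1 - t0)
    return x0 - v * t0, v

def predict(model, t):
    """Position at time t according to the two-number model."""
    start, v = model
    return start + v * t

# Hypothetical noise-free observations of a particle moving at constant velocity.
observed = [(t, 2.0 + 0.5 * t) for t in range(100)]

# The "compressed representation": two numbers obtained from the first two observations.
model = fit_line(*observed[0], *observed[1])

# The model regenerates every observation and also extrapolates beyond them.
reproduces_all = all(abs(predict(model, t) - x) < 1e-9 for t, x in observed)
future_position = predict(model, 150)
print(model, reproduces_all, future_position)
```

How a physical ensemble of particles could come to hold such a model is exactly the open question; the sketch only shows what holding one buys you.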
The funny thing is, you need such representations in order to decrease entropy. After all, the more you compress, the fewer degrees of freedom remain, hence the state space is reduced and the entropy decreases. There can also be hierarchical representations within a system, which means that there is an “internal aboutness” as well. Thus the internal entropy decreases once an ensemble of particles at one level is held in a macrostate determined by a higher-level ensemble! Hence, predicting the “outer” world may simply be a special case of predicting the “inner” world, your own macrostates. Thus, in order to decrease the state space in such a way, a few high-level macrostates have to physically determine all the microstates at lower levels. For example, in an autoencoder the hidden layer compresses and recreates the inputs at the input layer. In the nervous system, neurons are either active or inactive and therefore account for a large part of the entropy of the brain. The physical determination happens through the propagation of an electric potential through the axons and dendrites of neurons. But, especially at the beginning of life, things have to work out without a nervous system.
Instead of thinking about a practical implementation, I could think of a theoretical description, such as the formula (8). I imagine all microstates of a level being partitioned, with a macrostate assigned to each subset of the partition. Those macrostates would correspond to the microstates of the level above. Now, there is a highly structured outside world, which means that its entropy is low and there are far fewer states than are theoretically possible. If you have sensors, some part of the outer world activates them. Their probability distribution is the one to be compressed. This means that if the world has been created by running a short program, your job as a living being is to find that program. And why should that happen? It would be very cool to show that our world is made in such a way that subsystems will emerge that try to represent it and ultimately find out the way it has been made. Predicting food trajectories may be a start to doing exactly that.
So, the goal is not just to decrease internal entropy, but to do it in such a way that it represents the outer world, the entropy of which is already decreased by the laws of nature. And what does “represent” mean in that sense? In the internal sense it means to physically determine one’s own internal states. And for the outer world it means to have sensors such that the states of the outer world are reflected by the states of your sensors. So, we can imagine the lowest layer/level to encode the states of the outer world, or at least a part of it. And it does so in a non-compressing way, hence it is a one-to-one map of a part of the world. Can we show that under such circumstances, compression in terms of algorithmic complexity is the best thing to be done? Basically, it means that some part of the animal is driven by outer influences and therefore cannot be changed, hence contributes majorly to the entropy of the animal. In order to decrease the entropy nevertheless, the animal has to find a way to recreate those same inputs.
Now, I have to clarify which probability distributions are meant when we compute the entropy. In a deterministic world, probabilities are always the reflection of our, the scientists’, lack of knowledge. Those probabilities are different from the probabilities assigned by the animal: those are the animal’s knowledge. We should treat them as the same.
A way to reduce internal entropy is to couple all remaining internal states to the sensor inputs, which does not necessarily mean compressing. Well, it does decrease the entropy, since only the sensor entropy remains, but it does not decrease it any further! In order to decrease it further, the sensor states have to depend on internal states, which should be fewer in number. They have to GENERATE the microstates of the sensors.
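The three situations can be compared in a toy calculation. This is only a sketch under strong assumptions: four sensor states, four internal states, all occurring joint outcomes equally likely, and made-up coupling rules; none of this comes from a physical model.

```python
import math
from collections import Counter
from itertools import product

def joint_entropy_bits(outcomes):
    """Entropy in bits of a list of equally likely joint outcomes (repeats add probability)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

sensor_states = range(4)
internal_states = range(4)

# (a) Internal states independent of the sensors: every (sensor, internal) pair occurs.
independent = list(product(sensor_states, internal_states))

# (b) Internal state is a fixed function of the sensor state: only the sensor entropy remains.
coupled = [(s, (s + 1) % 4) for s in sensor_states]

# (c) Two hidden internal states z generate the sensor pattern as well (hypothetical rule s = 2z).
generated = [(2 * z, z) for z in range(2)]

h_independent = joint_entropy_bits(independent)
h_coupled = joint_entropy_bits(coupled)
h_generated = joint_entropy_bits(generated)
print(h_independent, h_coupled, h_generated)
```

Case (c) only works if the world is structured enough that the sensors really visit just the states the few internal states can generate, which is the premise about the low-entropy outer world.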