Thank you for your interesting question, which led me to quite a nice little result. I couldn't find a nice thermodynamic meaning for the cross entropy, but I did find one for a related quantity called the Kullback-Leibler divergence. As far as I know, this is a new result.
The cross entropy between two probability distributions is defined as
$$H(P,Q) = -\sum_i p_i \log q_i.$$
These two probability distributions should both refer to the same set of underlying states. Normally in thermodynamics we think of a system as only having one probability distribution, which represents (roughly) the range of possible states the system might be in at the present time. But systems can change over time. So let's imagine we have a system (with constant volume) that's initially in equilibrium with a heat bath at a temperature $T_1$. According to the usual principles of statistical mechanics, its state can be represented by the probability distribution
$$p_i = \frac{1}{Z_1} e^{-\beta_1 u_i},$$
where $\beta_1 = 1/T_1$ (I've set Boltzmann's constant equal to 1 for clarity) and $Z_1$ is a normalisation factor called the partition function. The $u_i$ are the energy levels of the system's permitted microscopic states.
Now let's imagine we pick up our system and put it in contact with a heat bath at a different temperature, $T_2$, and let it come to equilibrium again. Since no work has been done, all the $u_i$ values will be unchanged and we'll have a new distribution that looks like this:
$$q_i = \frac{1}{Z_2} e^{-\beta_2 u_i},$$
where $\beta_2 = 1/T_2$.
Now we can do a bit of algebra to find the cross-entropy:
$$H(P,Q) = \frac{1}{Z_1}\sum_i e^{-\beta_1 u_i}\left(\beta_2 u_i + \log Z_2\right) = \log Z_2 + \frac{1}{Z_1}\sum_i e^{-\beta_1 u_i}\,\beta_2 u_i = \log Z_2 + U_1\beta_2,$$
where $U_1 = \sum_i p_i u_i$ is the system's average energy in the initial state.
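If you'd like to check that algebra numerically, here's a minimal Python sketch; the three energy levels and the two temperatures are made-up values, purely for illustration:

```python
import numpy as np

# Made-up energy levels and temperatures for a toy three-level system (k_B = 1)
u = np.array([0.0, 1.0, 2.5])   # energy levels u_i of the microscopic states
T1, T2 = 2.0, 1.0
beta1, beta2 = 1.0 / T1, 1.0 / T2

Z1 = np.exp(-beta1 * u).sum()   # partition function at T1
Z2 = np.exp(-beta2 * u).sum()   # partition function at T2
p = np.exp(-beta1 * u) / Z1     # equilibrium distribution at T1
q = np.exp(-beta2 * u) / Z2     # equilibrium distribution at T2

U1 = (p * u).sum()                             # average energy in the initial state
cross_entropy = -(p * np.log(q)).sum()         # H(P,Q) straight from the definition
print(cross_entropy, np.log(Z2) + beta2 * U1)  # the two numbers agree
```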
It's a standard result from statistical mechanics that
$$S_2 = H(Q) = \log Z_2 + U_2\beta_2,$$
where $U_2 = \sum_i q_i u_i$ is the average energy in the new equilibrium.
Solving this for $\log Z_2$ and substituting into the cross-entropy formula, we have
$$H(P,Q) = S_2 - \beta_2(U_2 - U_1) = S_2 - \frac{U_2 - U_1}{T_2}.$$
Physicists are, generally speaking, afraid of any quantity that has entropy units, and if they see one they like to multiply it by a temperature in order to make it look like an energy. If we multiply this by $T_2 = 1/\beta_2$ we get
$$T_2 H(P,Q) = T_2 S_2 - \Delta U.$$
It's possible that this might have a nice thermodynamic interpretation in terms of something like the maximum amount of work that we can extract from doing this transformation under a particular set of circumstances, but if it does then I haven't seen it just yet. The expression looks tantalisingly like a change in free energy ($\Delta U - T\Delta S$), but it's not quite the same.
However, we can get a much more interesting result if we note that in information theory, the Kullback-Leibler divergence (aka information gain) is often seen as more fundamental than the cross entropy. The KL-divergence is defined as
$$D_{\mathrm{KL}}(P\,\|\,Q) = \sum_i p_i \log\frac{p_i}{q_i} = H(P,Q) - H(P),$$
which in our case is equal to
$$S_2 - S_1 - \frac{U_2 - U_1}{T_2} = \Delta S - \frac{\Delta U}{T_2}.$$
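Here's a quick numerical check of this identity, using the same made-up toy system as in the sketch above (repeated so the snippet runs on its own):

```python
import numpy as np

# Same made-up toy system as above (k_B = 1)
u = np.array([0.0, 1.0, 2.5])
T1, T2 = 2.0, 1.0
p = np.exp(-u / T1)
p /= p.sum()                    # equilibrium distribution at T1
q = np.exp(-u / T2)
q /= q.sum()                    # equilibrium distribution at T2

S1, S2 = -(p * np.log(p)).sum(), -(q * np.log(q)).sum()  # Gibbs entropies
U1, U2 = (p * u).sum(), (q * u).sum()                    # average energies
dkl = (p * np.log(p / q)).sum()                          # D_KL(P || Q)

# D_KL equals the total entropy change of system plus bath, and is >= 0
print(dkl, (S2 - S1) - (U2 - U1) / T2)
```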
This is much more interesting than the result for the cross-entropy, because it does have a clear thermodynamic interpretation. When we put the system in contact with the second heat bath, its entropy changes by $\Delta S$, and the entropy of the heat bath changes by $-\Delta U/T_2$. (This is because entropy is heat divided by temperature: the system's energy changes by $\Delta U$ with no work done, so an amount of heat $-\Delta U$ enters the heat bath.) So the KL-divergence is just the total change in entropy after we put the system in contact with the new heat bath. I'm quite excited about this because I didn't know it before, and I don't think anyone else did either!
We can even take this a bit further. Let's imagine putting a heat engine in between the system and the second heat reservoir, so that we try to extract some useful work from the flow of heat that takes place as the system and the heat bath equilibrate. If we do this, the total change in entropy becomes
$$\Delta S + \frac{-\Delta U - W}{T_2}.$$
This has to be greater than or equal to zero, which means that
$$W \leq T_2\,\Delta S - \Delta U.$$
Now, if we do that physicist thing of multiplying $D_{\mathrm{KL}}$ by $T_2$, it becomes $T_2\,\Delta S - \Delta U$, which is the value for the maximum work that we just calculated. So while the thermodynamic meaning of the cross-entropy isn't clear to me, the KL-divergence does seem to have a nice interpretation in terms of work.
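As a final sanity check on the same made-up toy system, $T_2 D_{\mathrm{KL}}$ does indeed come out equal to that work bound:

```python
import numpy as np

# Same made-up toy system as above (k_B = 1)
u = np.array([0.0, 1.0, 2.5])
T1, T2 = 2.0, 1.0
p = np.exp(-u / T1)
p /= p.sum()
q = np.exp(-u / T2)
q /= q.sum()

dS = -(q * np.log(q)).sum() + (p * np.log(p)).sum()  # S2 - S1
dU = (q * u).sum() - (p * u).sum()                   # U2 - U1
dkl = (p * np.log(p / q)).sum()                      # D_KL(P || Q)

W_max = T2 * dS - dU       # maximum extractable work from the bound above
print(W_max, T2 * dkl)     # the two numbers agree
```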