# Cross Entropy

## Cross-entropy and the Cross-entropy Error Function


Cross entropy "is the average number of bits needed to encode data coming from a source with distribution $$p$$ when we use model $$q$$" (Murphy, 2012, p.58).

In contrast to the formula for the Shannon entropy, the formula for the cross entropy involves two distributions $$p(X)$$ and $$q(Y)$$ with the same support or set of events ("alphabet") $$x, y \in \{1, 2, ..., m\}$$:

$$H_b(X,Y) = H_b(p,q) := - \sum_{j=1}^m p(x_j) \log_b q(y_j).$$
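As a quick illustration, the following Python sketch evaluates this sum directly (the function name and the example distributions are made up for illustration):

```python
import math

def cross_entropy(p, q, base=2):
    """H_b(p, q) = -sum_j p_j * log_b(q_j) over a shared alphabet."""
    # Terms with p_j = 0 are skipped, following the convention 0 * log 0 = 0.
    return -sum(p_j * math.log(q_j, base)
                for p_j, q_j in zip(p, q) if p_j > 0)

p = [0.5, 0.25, 0.25]   # source distribution p (illustrative)
q = [0.25, 0.25, 0.5]   # model distribution q over the same alphabet
print(cross_entropy(p, q))  # 1.75: average bits to encode p using model q
```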

The identifiers "alphabet" and "ensemble" are used by MacKay (2003, p.22). "Ensemble" is what is usually (Pollard, 2010, p.18) known as a probability space $$(\Omega, \mathcal{F}, \mathbb{P})$$: $$\Omega$$ is the set of outcomes, $$\mathcal{F}$$ the sigma algebra (sigma field) of events, and $$\mathbb{P}$$ the probability measure. The "alphabet" is the set of elementary events and therefore a subset of $$\mathcal{F}$$.

The derivation of $$H_b(p,q)$$ is similar to that of the entropy. If the probability $$q(y_j)$$ of the outcome or realization $$y_j$$ is replaced by its reciprocal $$\frac{1}{q(y_j)}$$, the logarithm is taken, and the base is set to $$b=2$$, the information content ("surprisal") $$I_2(y_j)$$ of an outcome $$y_j$$ is defined to be

$$h_2(y_j) = I_2(Y=y_j) := \log_2 \frac{1}{q(y_j)}$$

bits.
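For instance, in Python (a minimal sketch; the helper name is illustrative):

```python
import math

def surprisal(prob, base=2):
    """Information content I_b(y_j) = log_b(1 / q(y_j)) of a single outcome."""
    return math.log(1.0 / prob, base)

print(surprisal(0.25))  # an outcome with model probability 1/4 carries 2 bits
```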

The cross entropy $$H_2(X, Y)$$ is then the expected value of this information content under the probability mass function (PMF) $$p$$:

$$H_2(X,Y) = E_p[I_2(X, Y)] := \sum_{j=1}^m p(x_j) \log_2 \frac{1}{q(y_j)} = \sum_{j=1}^m p(x_j) (\log_2 1 - \log_2 q(y_j))$$

$$= \sum_{j=1}^m p(x_j) (0 - \log_2 q(y_j)) = -\sum_{j=1}^m p(x_j) \log_2 q(y_j),$$

where $$m$$ is the number of categories in the "alphabet" (the set of events, values or realizations of $$X$$ and $$Y$$). Note that the expectation is taken not over $$q(Y)$$ but over $$p(X)$$.
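A short numerical check of this derivation, with made-up distributions $$p$$ and $$q$$:

```python
import math

p = [0.5, 0.25, 0.25]  # p(x_j), illustrative
q = [0.25, 0.5, 0.25]  # q(y_j) on the same alphabet

# Expectation of the surprisal under p (left-hand side of the derivation)
lhs = sum(p_j * math.log2(1.0 / q_j) for p_j, q_j in zip(p, q))
# Simplified form -sum_j p(x_j) log2 q(y_j) (right-hand side)
rhs = -sum(p_j * math.log2(q_j) for p_j, q_j in zip(p, q))

assert math.isclose(lhs, rhs)
print(lhs)  # both evaluate to the same cross entropy, 1.75 bits here
```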

If the model fit is perfect, i.e. $$q = p$$, the cross entropy is identical to the (self-)entropy:

$$H_b(p,q) = H_b(p, p) = H_b(p).$$
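This can be checked numerically; the sketch below (with illustrative distributions) also shows Gibbs' inequality $$H_b(p,q) \ge H_b(p)$$: an imperfect model never shortens the average code length.

```python
import math

def cross_entropy(p, q):
    return -sum(p_j * math.log2(q_j) for p_j, q_j in zip(p, q) if p_j > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

entropy_p = cross_entropy(p, p)          # H_2(p, p) = H_2(p)
print(entropy_p)                         # 1.5 bits for this p
print(cross_entropy(p, q))               # 1.75 bits: imperfect fit costs extra bits
assert cross_entropy(p, q) >= entropy_p  # Gibbs' inequality: H(p, q) >= H(p)
```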

##### References

- Bishop, Chr. M., Pattern Recognition and Machine Learning, Springer, 2009
- MacKay, David J.C., Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003, www.inference.org.uk/itprnn/book.pdf (visited 2018/08/22)
- Murphy, K.P., Machine Learning - A Probabilistic Perspective, MIT Press, 2012
- Pollard, D., A User's Guide to Measure Theoretic Probability, Cambridge University Press, 2010