Numerical stability of binary cross entropy loss and the log-sum-exp trick

September 26, 2018

When training a binary classifier, cross entropy (CE) loss is usually preferred over squared error loss, which cannot distinguish bad predictions from extremely bad predictions. The CE loss is defined as follows: $$L_{CE}(y,t) = -\left[t\log y + (1-t)\log(1-y)\right]$$ where $t \in \{0,1\}$ is the target and $y$ is the predicted probability of the sample falling in the positive class $(t=1)$. $y = \sigma(z)$, where […]
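The post is truncated here, but as a sketch of the kind of stable formulation it builds toward: substituting $y = \sigma(z)$ into $L_{CE}$ and simplifying gives a loss written directly in terms of the logit $z$, namely $\max(z, 0) - zt + \log(1 + e^{-|z|})$, which never computes $\sigma(z)$ or $\log(1-y)$ explicitly and so avoids overflow and $\log(0)$ for large $|z|$. A minimal NumPy illustration (the function name and use of NumPy are my choices, not from the original):

```python
import numpy as np

def bce_naive(z, t):
    """Naive BCE: compute sigmoid, then plug into the CE formula.
    Overflows / returns inf for logits of large magnitude."""
    y = 1.0 / (1.0 + np.exp(-z))
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))

def bce_stable(z, t):
    """Numerically stable BCE computed directly from the logit z.
    Algebraically equal to bce_naive, but max(z,0) and log1p(exp(-|z|))
    keep every intermediate value in a safe range."""
    return np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))

# Moderate logit: both agree.
print(bce_naive(2.0, 1.0), bce_stable(2.0, 1.0))

# Extreme logit: the naive version blows up, the stable one does not.
print(bce_naive(-1000.0, 1.0), bce_stable(-1000.0, 1.0))
```

For $z = -1000$ with $t = 1$, the naive version computes $\log(\sigma(-1000)) = \log(0)$ and returns infinity, while the stable version returns the correct value of (approximately) $1000$.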