
Basic NN derivatives

Tuesday, January 31, 2017, 04:32 PM

I want to log some of the basic derivatives for neural networks here, especially because many solutions on the internet gloss over basic steps and aren't very friendly for newcomers.

Deriving Softmax w.r.t. its input

$$f(x_j) = \mathrm{softmax}(x_j) = \frac{\exp x_j}{\sum_{k}^{n} \exp x_k}$$
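A quick numpy sketch of this definition (the max-subtraction is a standard numerical-stability trick, not part of the formula itself):

```python
import numpy as np

def softmax(x):
    # exp(x_j) / sum_k exp(x_k); subtracting max(x) keeps exp() from
    # overflowing and does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
p = softmax(x)
print(p, p.sum())  # three probabilities summing to 1
```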

Using the quotient rule, where:

$$f(x) = \frac{g(x)}{h(x)} \qquad f'(x) = \frac{g'(x)\,h(x) - g(x)\,h'(x)}{h(x)^2}$$

When $i = j$:

$$\frac{\partial g(x_j)}{\partial x_i} = \exp x_i \qquad \frac{\partial h(x_j)}{\partial x_i} = 0 + 0 + \cdots + \exp x_{i=j} + \cdots + 0 = \exp x_i$$

So:

$$\frac{\partial f(x_j)}{\partial x_i} = \frac{\exp x_i \sum_k \exp x_k - \exp x_j \exp x_i}{\left(\sum_k \exp x_k\right)^2} = \frac{\exp x_i}{\sum_k \exp x_k} \cdot \frac{\sum_k \exp x_k - \exp x_j}{\sum_k \exp x_k} = f(x_j)\left(\frac{\sum_k \exp x_k}{\sum_k \exp x_k} - \frac{\exp x_j}{\sum_k \exp x_k}\right) = f(x_j)\,\big(1 - f(x_j)\big)$$

When $i \neq j$, since $\exp x_j$ is not a function of $x_i$:

$\frac{\partial g(x_j)}{\partial x_i} = 0$ and $\frac{\partial h(x_j)}{\partial x_i} = \exp x_i$, because $\exp x_i$ is one of the terms in $\sum_k \exp x_k$.

Using the quotient rule:

$$\frac{\partial f(x_j)}{\partial x_i} = \frac{0 - \exp x_j \exp x_i}{\left(\sum_k \exp x_k\right)^2} = -\frac{\exp x_j}{\sum_k \exp x_k} \cdot \frac{\exp x_i}{\sum_k \exp x_k} = -f(x_j)\,f(x_i)$$
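As a quick sanity check (not part of the derivation), both closed forms can be compared against finite differences; the input values and index pairs below are arbitrary:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])
p = softmax(x)
eps = 1e-6

for i, j in [(2, 2), (0, 2)]:  # an i == j case and an i != j case
    x_plus = x.copy()
    x_plus[i] += eps
    numerical = (softmax(x_plus)[j] - p[j]) / eps
    analytical = p[j] * (1 - p[j]) if i == j else -p[j] * p[i]
    print(i, j, numerical, analytical)  # should agree closely
```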

To summarize:

Given $p_j = \frac{e^{\theta_j}}{\sum_k e^{\theta_k}}$:

$$\frac{\partial p_j}{\partial \theta_i} = p_i\,(1 - p_i), \quad i = j \qquad\qquad \frac{\partial p_j}{\partial \theta_i} = -p_i\,p_j, \quad i \neq j$$
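The two cases can also be packed into a single Jacobian matrix, $J = \mathrm{diag}(p) - p\,p^\top$; this compact form is just a convenience for the sketch below, not something derived above:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian(p):
    # J[j, i] = dp_j / d(theta_i): p_i * (1 - p_i) on the diagonal,
    # -p_i * p_j off the diagonal.
    return np.diag(p) - np.outer(p, p)

p = softmax(np.array([0.5, -1.0, 2.0]))
print(softmax_jacobian(p))
```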

Deriving Cross Entropy w.r.t. Softmax's input

$CE = -\sum_j y_j \log(\hat{y}_j)$, where $\hat{y} = \mathrm{softmax}(\theta)$ and $y$ is a one-hot vector.
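As a code sketch (the tiny constant inside the log only guards against log(0) and is not part of the formula):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(theta, y):
    # CE = -sum_j y_j * log(yhat_j), with yhat = softmax(theta)
    y_hat = softmax(theta)
    return -np.sum(y * np.log(y_hat + 1e-12))

theta = np.array([0.5, -1.0, 2.0])
y = np.array([0.0, 0.0, 1.0])  # one-hot target
print(cross_entropy(theta, y))
```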

With the chain rule:

$$\frac{\partial f(x)}{\partial x} = \frac{\partial f(x)}{\partial g(x)} \cdot \frac{\partial g(x)}{\partial x}$$

$$\frac{\partial CE}{\partial \theta_i} = \frac{\partial \left(-\sum_j y_j \log(\hat{y}_j)\right)}{\partial \theta_i} = -\sum_j y_j\,\frac{1}{\hat{y}_j}\,\frac{\partial \hat{y}_j}{\partial \theta_i}$$

When $i = j$:

$$-y_i\,\frac{1}{\hat{y}_i}\,\frac{\partial \hat{y}_i}{\partial \theta_i} = -y_i\,\frac{1}{\hat{y}_i}\,\hat{y}_i\,(1 - \hat{y}_i) = -y_i\,(1 - \hat{y}_i)$$

When $i \neq j$:

$$-\sum_{j \neq i} y_j\,\frac{1}{\hat{y}_j}\,\frac{\partial \hat{y}_j}{\partial \theta_i} = -\sum_{j \neq i} y_j\,\frac{1}{\hat{y}_j}\,(-\hat{y}_i\,\hat{y}_j) = \sum_{j \neq i} y_j\,\hat{y}_i$$

Combining the two:

$$-\sum_j y_j\,\frac{1}{\hat{y}_j}\,\frac{\partial \hat{y}_j}{\partial \theta_i} = -y_i\,(1 - \hat{y}_i) + \sum_{j \neq i} y_j\,\hat{y}_i = y_i\,\hat{y}_i - y_i + \sum_{j \neq i} y_j\,\hat{y}_i = \hat{y}_i \sum_j y_j - y_i$$

We know that the sum of $y_j$ is 1 (since $y$ is one-hot), so the solution is:

$$\frac{\partial CE}{\partial \theta_i} = \hat{y}_i - y_i$$

or equivalently, where $j$ is the index of the correct class ($y_j = 1$):

$$\hat{y}_i - 1, \quad i = j$$

$$\hat{y}_i, \quad i \neq j$$
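Finally, a quick finite-difference check of this result (again just a sketch with arbitrary numbers, using the same helpers as above):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(theta, y):
    return -np.sum(y * np.log(softmax(theta) + 1e-12))

theta = np.array([0.5, -1.0, 2.0])
y = np.array([0.0, 0.0, 1.0])  # one-hot target

analytical = softmax(theta) - y  # yhat - y

eps = 1e-6
numerical = np.zeros_like(theta)
for i in range(len(theta)):
    t = theta.copy()
    t[i] += eps
    numerical[i] = (cross_entropy(t, y) - cross_entropy(theta, y)) / eps

print(analytical)
print(numerical)  # should match the analytical gradient closely
```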