Basic NN derivatives
Tuesday, January 31, 2017, 04:32 PM
I want to log some of the basic derivatives for neural networks here, especially because many solutions on the internet gloss over basic steps and are not very friendly for newcomers.
Deriving Softmax w.r.t. its input
$$f(x_j) = \mathrm{softmax}(x_j) = \frac{\exp x_j}{\sum_k^n \exp x_k}$$

Using the quotient rule, where:
$$f(x) = \frac{g(x)}{h(x)} \qquad f'(x) = \frac{g'(x)\,h(x) - g(x)\,h'(x)}{h(x)^2}$$
When $i = j$:
$$\frac{\partial g(x_j)}{\partial x_i} = \exp x_i \qquad \frac{\partial h(x_j)}{\partial x_i} = 0 + 0 + \dots + \exp x_{i=j} + \dots + 0 = \exp x_i$$

So:
$$\frac{\partial f(x_j)}{\partial x_i} = \frac{\exp x_i \cdot \sum \exp x_k - \exp x_j \cdot \exp x_i}{\left(\sum \exp x_k\right)^2}$$
$$= \frac{\exp x_i}{\sum \exp x_k} \cdot \frac{\sum \exp x_k - \exp x_j}{\sum \exp x_k}$$
$$= f(x_j) \cdot \left(\frac{\sum \exp x_k}{\sum \exp x_k} - \frac{\exp x_i}{\sum \exp x_k}\right)$$
$$= f(x_j) \cdot \left(1 - f(x_j)\right)$$

When $i \ne j$, since $\exp x_j$ is not a function of $x_i$:
$$\frac{\partial g(x_j)}{\partial x_i} = 0 \qquad \text{and} \qquad \frac{\partial h(x_j)}{\partial x_i} = \exp x_i$$

because $x_i$ is one of the terms in $\sum \exp x_k$.
Using the quotient rule:
$$\frac{\partial f(x_j)}{\partial x_i} = \frac{0 - \exp x_j \cdot \exp x_i}{\left(\sum \exp x_k\right)^2}$$
$$= -\frac{\exp x_j}{\sum \exp x_k} \cdot \frac{\exp x_i}{\sum \exp x_k}$$
$$= -f(x_j) \cdot f(x_i)$$
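As a quick sanity check of both cases, here is a minimal symbolic sketch assuming SymPy is available; the three-variable softmax and the symbol names are just for illustration.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
denom = sp.exp(x1) + sp.exp(x2) + sp.exp(x3)
p1 = sp.exp(x1) / denom  # softmax component f(x_1)
p2 = sp.exp(x2) / denom  # softmax component f(x_2)

# Diagonal case (i = j): d p1 / d x1 should equal p1 * (1 - p1)
print(sp.simplify(sp.diff(p1, x1) - p1 * (1 - p1)))  # -> 0

# Off-diagonal case (i != j): d p1 / d x2 should equal -p1 * p2
print(sp.simplify(sp.diff(p1, x2) - (-p1 * p2)))     # -> 0
```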
To summarize, given $p_j = \frac{e^{\theta_j}}{\sum_k e^{\theta_k}}$:

$$\frac{\partial p_j}{\partial \theta_i} = p_i(1 - p_i), \quad i = j$$
$$\frac{\partial p_j}{\partial \theta_i} = -p_i\,p_j, \quad i \ne j$$
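In code, the two cases combine into the Jacobian $\mathrm{diag}(p) - p\,p^\top$. Here is a minimal sketch assuming NumPy (the example logits are arbitrary), checked against finite differences:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())          # shift for numerical stability
    return e / e.sum()

def softmax_jacobian(x):
    p = softmax(x)
    # J[i, j] = p_i * (1 - p_i) if i == j else -p_i * p_j
    return np.diag(p) - np.outer(p, p)

x = np.array([1.0, 2.0, 0.5])        # arbitrary example logits
J = softmax_jacobian(x)

# Finite-difference check of one column of the Jacobian
eps = 1e-6
i = 1
x_plus, x_minus = x.copy(), x.copy()
x_plus[i] += eps
x_minus[i] -= eps
numeric = (softmax(x_plus) - softmax(x_minus)) / (2 * eps)
print(np.allclose(J[:, i], numeric, atol=1e-6))  # True
```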
Deriving Cross Entropy w.r.t. Softmax's input

$$CE = -\sum_j y_j \log(\hat{y}_j)$$

where $\hat{y} = \mathrm{softmax}(\theta)$ and $y$ is a one-hot vector.
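As a small numeric illustration (a sketch assuming NumPy; the probabilities and one-hot target below are made up), with a one-hot $y$ the sum reduces to the negative log probability of the correct class:

```python
import numpy as np

y_hat = np.array([0.1, 0.7, 0.2])   # example softmax output
y = np.array([0.0, 1.0, 0.0])       # one-hot target

ce = -np.sum(y * np.log(y_hat))
print(ce, -np.log(y_hat[1]))        # both ~0.3567
```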
With the chain rule:
$$\frac{\partial f(g(x))}{\partial x} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}$$
$$\frac{\partial CE}{\partial \theta_i} = -\sum_j \frac{\partial\, y_j \log(\hat{y}_j)}{\partial \theta_i} = -\sum_j y_j \cdot \frac{1}{\hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \theta_i}$$

When $i = j$, the corresponding term of the sum is:
$$-y_i \cdot \frac{1}{\hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial \theta_i} = -y_i \cdot \frac{1}{\hat{y}_i} \cdot \hat{y}_i(1 - \hat{y}_i) = -y_i(1 - \hat{y}_i)$$

When $i \ne j$, the remaining terms give:
$$-\sum_{j \ne i} y_j \cdot \frac{1}{\hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \theta_i} = -\sum_{j \ne i} y_j \cdot \frac{1}{\hat{y}_j} \cdot (-\hat{y}_i\,\hat{y}_j) = \sum_{j \ne i} y_j\,\hat{y}_i$$

Combining the two:
$$-\sum_j y_j \cdot \frac{1}{\hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial \theta_i} = -y_i(1 - \hat{y}_i) + \sum_{j \ne i} y_j\,\hat{y}_i$$
$$= y_i\,\hat{y}_i - y_i + \sum_{j \ne i} y_j\,\hat{y}_i$$
$$= \hat{y}_i \sum_j y_j - y_i$$

We know that $\sum_j y_j = 1$ (since $y$ is one-hot), so the solution is:
$$= \hat{y}_i - y_i$$

or equivalently (with $j$ the index of the correct class):
$$\hat{y}_i - 1, \quad i = j$$
$$\hat{y}_i, \quad i \ne j$$
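Finally, a minimal end-to-end check of the $\hat{y} - y$ result, assuming NumPy (the logits and target below are arbitrary): the analytic gradient of cross entropy composed with softmax should match a finite-difference estimate.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def cross_entropy(theta, y):
    return -np.sum(y * np.log(softmax(theta)))

theta = np.array([0.3, -1.2, 2.0])   # arbitrary logits
y = np.array([0.0, 0.0, 1.0])        # one-hot target

analytic = softmax(theta) - y        # the result derived above

# Finite-difference gradient for comparison
eps = 1e-6
numeric = np.zeros_like(theta)
for i in range(len(theta)):
    t_plus, t_minus = theta.copy(), theta.copy()
    t_plus[i] += eps
    t_minus[i] -= eps
    numeric[i] = (cross_entropy(t_plus, y) - cross_entropy(t_minus, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```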