Forward prop is how the network generates a probability vector for your given inputs.
Let's use this diagram to visualize how it works, where B(L) is the bias of a layer, A(L-1) is the activation of the previous layer, and W(L) is the weight of the connection between them. Finally, A(L) is the activation of the next layer in the network, expressed in terms of the previous variables I mentioned. In the working (hidden) layers, we use the ReLU function, and in the output layer, we use the softmax function. Y is the expected output, C is the generated probability.
$$ Z(L) = A(L-1) \cdot W(L) + B(L) \\ A(L) = \mathrm{ReLU}(Z(L)) $$
Inspiration: 3B1B
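To make the notation concrete, here is a minimal NumPy sketch of one layer's forward step. The layer sizes and example values are my own illustrative choices, not something fixed by the network:

```python
import numpy as np

def relu(z):
    # ReLU: keep positive values, clamp negatives to 0
    return np.maximum(0, z)

# Illustrative sizes: the previous layer has 3 neurons, this layer has 2
A_prev = np.array([0.5, 0.0, 3.0])      # A(L-1): activations of the previous layer
W = np.random.randn(3, 2) * 0.01        # W(L): one column of weights per neuron in this layer
B = np.zeros(2)                         # B(L): one bias per neuron in this layer

Z = A_prev @ W + B                      # Z(L) = A(L-1)*W(L) + B(L)
A = relu(Z)                             # A(L) = ReLU(Z(L)) in the working layers
```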
This sequence occurs for every layer, and by using matrices we can perform the operation for a whole layer at once. Now, let's tackle backpropagation, something that twisted my mind for a bit.
// TODO
For a single forward pass, the output value of the third layer is too low: it's so close to 0 that the floating-point numbers become unstable. e.g. a probability of 4.15633838e-26 is too low, even for a bad guess.
First, we check the “squishing” function, softmax:
Initially, I had the following function:
```python
import numpy as np

def softmax(vector):
    # exponentiate, then normalize so the outputs sum to 1
    e = np.exp(vector)
    return e / e.sum()
```
but the output-layer probabilities are too close to 0 (e.g. 2.57699342e-20 can be unstable). Apparently, this is common when the exponential values (the numerators) get too large. To mitigate this, we subtract the max value from the vector, giving a function that looks like:
```python
def softmax(vector):
    # shift by the max so the largest exponent is exp(0) = 1, which avoids overflow
    e = np.exp(vector - np.max(vector))
    return e / e.sum()
```
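A quick sanity check of the two versions (the input values below are arbitrary, chosen only because they are large enough to overflow the naive version):

```python
import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive version: np.exp(1000) overflows to inf, so every probability comes out as nan
e = np.exp(logits)
print(e / e.sum())       # [nan nan nan] plus an overflow warning

# Shifted version (the softmax defined above): the largest exponent is exp(0) = 1,
# so nothing overflows
print(softmax(logits))   # [0.09003057 0.24472847 0.66524096]
```

Subtracting the max doesn't change the result, since shifting every entry by the same constant cancels out in the ratio; it only keeps the exponentials in a range the floats can handle.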