• composed of perceptrons
  • can theoretically compute any function
  • why not boolean circuits?
    • differentiable end to end

Perceptron

(Excalidraw figure: Perceptron / Logistic Regression)

If the activation is sigmoid, then this perceptron is basically just logistic regression.
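
A minimal NumPy sketch of this equivalence (the input `x`, weights `w`, and bias `b` below are illustrative values, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_perceptron(x, w, b):
    # Affine transformation followed by a sigmoid activation:
    # exactly the logistic regression model for P(y = 1 | x).
    return sigmoid(np.dot(w, x) + b)

# Illustrative values
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2
print(sigmoid_perceptron(x, w, b))  # a probability in (0, 1)
```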

One Layer Net

An activation function applied to a linear (affine, in general) transformation. There are many possible activation functions.

A one-layer net with ReLU activation would be
$$N : \mathbb{R}^n \to \mathbb{R}^p, \qquad x \mapsto \max\{0,\, Wx + b\} =: N(x)$$
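
A minimal sketch of this map in NumPy (the dimensions n = 3, p = 2 are illustrative):

```python
import numpy as np

def one_layer_relu(x, W, b):
    # N(x) = max{0, Wx + b}, a map from R^n to R^p
    return np.maximum(0.0, W @ x + b)

# Illustrative dimensions: n = 3, p = 2
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # p x n weight matrix
b = rng.standard_normal(2)        # bias in R^p
x = rng.standard_normal(3)        # input in R^n
print(one_layer_relu(x, W, b))
```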

Multi-layer Perceptron (MLP)

  • a neural network with multiple layers; the units can be any kind of artificial neuron, not just perceptrons

Feedforward Neural Networks

a stack of multiple layers, i.e., an MLP with multiple hidden layers

  • depth = number of transformations = number of layers (counting both the input and output as layers) − 1 = 4
  • the maximum number of nodes in a layer is called the width = 5
  • number of parameters $= N_l (N_{l-1} + 1) + N_{l-1} (N_{l-2} + 1) + \dots$ (see the sketch below)
    • the +1 accounts for the bias term, assuming a bias in every unit
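
A small sketch of these bookkeeping rules; the layer sizes are illustrative (chosen so that depth = 4 and width = 5, matching the numbers above):

```python
def mlp_stats(layer_sizes):
    """layer_sizes lists every layer, input and output included."""
    depth = len(layer_sizes) - 1          # number of transformations
    width = max(layer_sizes)              # widest layer
    # Layer l contributes N_l * (N_{l-1} + 1) parameters; the +1 is the bias.
    n_params = sum(n_out * (n_in + 1)
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
    return depth, width, n_params

# Illustrative sizes: 5 layers, so depth = 4, and the widest layer has 5 units
print(mlp_stats([3, 5, 4, 4, 2]))  # (4, 5, 74)
```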

deep neural nets are just functions

Population Risk and Empirical Risk

For a distribution $D$ and a finite set $S$ of samples drawn from it:

Population risk ($L_D$): the expected prediction error of $N$ over the whole distribution, i.e., over all data. In practice it is estimated empirically on held-out test samples.

Empirical risk ($L_S$): the average prediction error of $N$ over the seen training data $S$.
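
In symbols (a standard formulation; the per-example loss $\ell$ is an assumption, since the notes do not fix a specific loss):

$$L_D(N) = \mathbb{E}_{(x,y)\sim D}\big[\ell(N(x), y)\big], \qquad L_S(N) = \frac{1}{|S|}\sum_{(x,y)\in S} \ell(N(x), y)$$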

Backpropagation

(Excalidraw figure: Backpropagation)
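
The diagram itself isn't reproduced here; as a stand-in, here is a minimal sketch of the chain-rule computation for the one-layer ReLU net defined above, assuming a squared-error loss (an illustrative choice, not fixed by the notes):

```python
import numpy as np

def forward_backward(x, y, W, b):
    # Forward pass: z = Wx + b, a = ReLU(z), loss = 0.5 * ||a - y||^2
    z = W @ x + b
    a = np.maximum(0.0, z)
    loss = 0.5 * np.sum((a - y) ** 2)

    # Backward pass: apply the chain rule from the loss back to the parameters.
    da = a - y            # dLoss/da
    dz = da * (z > 0)     # dLoss/dz: ReLU passes the gradient only where z > 0
    dW = np.outer(dz, x)  # dLoss/dW
    db = dz               # dLoss/db
    return loss, dW, db

rng = np.random.default_rng(0)
W, b = rng.standard_normal((2, 3)), rng.standard_normal(2)
x, y = rng.standard_normal(3), rng.standard_normal(2)
loss, dW, db = forward_backward(x, y, W, b)
print(loss, dW, db)
```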

Issues with training neural networks

Neural Networks as universal approximators

In theory, one hidden layer is enough: two-layer neural networks are universal approximators, i.e., given enough hidden units they can approximate any continuous function (on a compact domain) to any desired degree of accuracy.
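
A constructive 1-D illustration (illustrative code, not from the notes): a single hidden layer of ReLU units can represent any piecewise-linear interpolant exactly, so refining the knots drives the approximation error for a continuous function like sin toward zero.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def pwl_relu_net(f, knots):
    """Build a one-hidden-layer ReLU net equal to the piecewise-linear
    interpolant of f at the given knots; finer knots -> smaller error."""
    t = np.asarray(knots, dtype=float)
    y = f(t)
    slopes = np.diff(y) / np.diff(t)        # slope of each segment
    c = np.diff(slopes, prepend=0.0)        # output weights = slope changes
    def net(x):
        x = np.asarray(x, dtype=float)
        # One hidden layer of units ReLU(x - t_i), plus an output bias y[0].
        return y[0] + relu(x[..., None] - t[:-1]) @ c
    return net

net = pwl_relu_net(np.sin, np.linspace(-np.pi, np.pi, 40))
xs = np.linspace(-np.pi, np.pi, 1000)
print("max error:", np.max(np.abs(net(xs) - np.sin(xs))))  # small; shrinks with more knots
```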