Linear Regression Gradient Descent
Feature Scaling
- always normalize/standardize the features so they share a comparable scale; gradient descent then converges faster
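A minimal sketch of standardization (zero mean, unit variance) for one feature column. The helper name `standardize` and the sample values are made up for illustration; they are not from the original notes.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # Constant feature: centering gives all zeros; avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical feature values on a large scale.
sizes = [50.0, 100.0, 150.0]
scaled = standardize(sizes)
```

At prediction time the same `mu` and `sigma` computed on the training data would normally be reused, rather than recomputed on new data.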
How to pick the learning rate?
- Line search methods
- Conjugate gradient (used for quadratic objectives)
- Newton search direction
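As one possible illustration of a line search method, here is a sketch of backtracking line search with the Armijo sufficient-decrease condition, applied to one-feature linear regression. The notes do not say which line search is meant, so this choice is an assumption, as are the function names, the constants `alpha0`, `shrink`, `c`, and the toy data.

```python
def mse(w, b, xs, ys):
    """Half mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def gradient(w, b, xs, ys):
    """Gradient of mse with respect to (w, b), using the full dataset."""
    n = len(xs)
    gw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def backtracking_step(w, b, xs, ys, alpha0=4.0, shrink=0.5, c=1e-4):
    """One gradient step whose size is chosen by backtracking (Armijo rule)."""
    gw, gb = gradient(w, b, xs, ys)
    grad_sq = gw * gw + gb * gb
    loss = mse(w, b, xs, ys)
    alpha = alpha0
    # Shrink alpha until the loss decreases enough; cap the number of tries.
    for _ in range(60):
        if mse(w - alpha * gw, b - alpha * gb, xs, ys) <= loss - c * alpha * grad_sq:
            break
        alpha *= shrink
    return w - alpha * gw, b - alpha * gb, alpha

# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [-1.0, 1.0, 3.0, 5.0]
w, b = 0.0, 0.0
for _ in range(300):
    w, b, alpha = backtracking_step(w, b, xs, ys)
```

The point of the sketch is that no learning rate has to be tuned by hand: each step starts from a deliberately large `alpha0` and halves it until the step is acceptable.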
Large datasets
In large datasets, computing the gradient gets expensive, since every step uses the whole dataset X.
- Online Learning
- Stochastic Gradient Descent (one example per update); its common variant, minibatch gradient descent, uses a small random batch per update
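The cheaper updates listed above can be sketched as minibatch gradient descent for one-feature linear regression. Everything named here (`minibatch_sgd`, `lr`, `batch_size`, `epochs`, the seed, the toy data) is an assumption for illustration. Setting `batch_size=1` gives the single-example update of plain stochastic gradient descent, which is also the form used in online learning.

```python
import random

def minibatch_sgd(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y = w*x + b with minibatch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random batch assignment every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient estimated from the batch only, not the whole dataset.
            gw = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            gb = sum((w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

# Noise-free toy data from y = 2x + 1 (assumed for illustration).
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]
w, b = minibatch_sgd(xs, ys)
```

Each update costs work proportional to `batch_size` instead of the dataset size, at the price of a noisier gradient estimate.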