Continuous Bag of Words
- predicting the word given its context
- context ⇒ surrounding words
- m words before and m words after the target word w_t
- no non-linear hidden layer (only a linear projection)
- context word vectors are averaged
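As a rough illustration of what "m words before and after" means, the sketch below pulls the context of a word out of a token list. The function name `context_window` and the choice to truncate the window at sentence boundaries are assumptions made for this example, not part of the notes.

```python
def context_window(tokens, t, m):
    """Return up to m tokens before and up to m tokens after position t.

    The target word tokens[t] itself is excluded. Near the edges of the
    list the window is simply truncated (an assumption for this sketch).
    """
    left = tokens[max(0, t - m):t]
    right = tokens[t + 1:t + 1 + m]
    return left + right


tokens = ["the", "quick", "brown", "fox", "jumps"]
print(context_window(tokens, 2, 1))  # context of "brown" with m = 1
```

With m = 1 the context of "brown" is the pair of neighbours on either side; a larger m widens the window symmetrically.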
Objective function
Let w_{t-1}, …, w_{t-m}, w_{t+1}, …, w_{t+m} be the context
- y = average of the context word vectors
- Pr(w_t | context) = entry for w_t in softmax(Wy)
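The two steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `cbow_forward`, `E` (input word vectors, one row per vocabulary word), and the toy values are assumptions, and `W` is taken to have one row per vocabulary word so that `Wy` yields one score per word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, E, W):
    """Return the predicted distribution over the vocabulary.

    E: input word vectors, V rows of length d (hypothetical name).
    W: output weights, V rows of length d.
    """
    d = len(E[0])
    # y = average of the context word vectors
    y = [sum(E[i][k] for i in context_ids) / len(context_ids) for k in range(d)]
    # One score per vocabulary word: each row of W dotted with y
    scores = [sum(w_k * y_k for w_k, y_k in zip(row, y)) for row in W]
    return softmax(scores)


# Toy vocabulary of 3 words with 2-dimensional vectors (made-up values)
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, -0.5], [-0.5, 0.5], [1.0, 1.0]]

probs = cbow_forward([0, 1], E, W)  # context = words 0 and 1
loss = -math.log(probs[2])          # negative log-likelihood of target word 2
```

Under these assumptions, training would lower `loss` for each (context, target) pair, which is the same as raising Pr(w_t | context).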