Continuous Bag of Words

  • predicts a word given its context
  • context = the surrounding words
    • the m words before and the m words after the word w_t
  • no non-linear hidden layer
  • the context word vectors are averaged
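
A minimal sketch of how the context windows described above could be collected from a token list. The function name cbow_pairs and the example sentence are hypothetical, and truncating the window at the sentence edges is an assumption (padding is another common choice).

```python
def cbow_pairs(tokens, m):
    """Return (context, target) pairs for a list of tokens.

    The context of the word at position t is the (up to) m words
    before it and the (up to) m words after it. Windows are simply
    truncated at the edges of the list (an assumption).
    """
    pairs = []
    for t, target in enumerate(tokens):
        left = tokens[max(0, t - m):t]      # words w_{t-m} .. w_{t-1}
        right = tokens[t + 1:t + 1 + m]     # words w_{t+1} .. w_{t+m}
        pairs.append((left + right, target))
    return pairs


tokens = "the quick brown fox jumps".split()
pairs = cbow_pairs(tokens, m=2)
for context, target in pairs:
    print(context, "->", target)
```

With m = 2, the pair for "brown" has the context ["the", "quick", "fox", "jumps"], while the pair for "the" only has the two words to its right.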

Objective function

Let w_{t-m}, …, w_{t-1}, w_{t+1}, …, w_{t+m} be the context

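  • y = average of the context word vectors
  • Pr(w_t | context) = softmax(Wy)_{w_t}, i.e. the softmax entry for the word w_t
  • objective: maximize the sum over t of log Pr(w_t | context)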
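
The two steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the vocabulary size V, the dimension d, the embedding table E, the random initialisation, and the word ids are all made up for the example; only y = average(context) and softmax(Wy) come from the slide.

```python
import math
import random

random.seed(0)
V, d = 10, 4  # hypothetical vocabulary size and embedding dimension

# E: one input vector per word; W: one output row per word (both assumed,
# randomly initialised here purely for illustration).
E = [[random.gauss(0, 1) for _ in range(d)] for _ in range(V)]
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(V)]


def cbow_probs(context_ids, E, W):
    """Return Pr(word | context) for every word in the vocabulary."""
    n = len(context_ids)
    dim = len(E[0])
    # y = average(context): mean of the context word vectors
    y = [sum(E[i][k] for i in context_ids) / n for k in range(dim)]
    # Wy: one score per vocabulary word
    scores = [sum(w_k * y_k for w_k, y_k in zip(row, y)) for row in W]
    # softmax, shifted by the max score for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


context_ids = [1, 2, 4, 5]  # hypothetical ids of the context words
target_id = 3               # hypothetical id of w_t
probs = cbow_probs(context_ids, E, W)
loss = -math.log(probs[target_id])  # negative log-likelihood of w_t
print(loss)
```

Training would adjust E and W to reduce this loss summed over all positions t; the full softmax over V words is the expensive part, which is why approximations are often used in practice.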