- Predicting the context given a word $w_t$
- Let $w_{t-1}, \dots, w_{t-m}, w_{t+1}, \dots, w_{t+m}$ be the context
- $\Pr(w_t \mid \text{context}) \cdot \Pr(\text{context}) = \Pr(\text{context} \mid w_t) \cdot \Pr(w_t)$
- $\Pr(\text{context})$ and $\Pr(w_t)$ are assumed to be uniform and are therefore constants, so $\Pr(w_t \mid \text{context}) \propto \Pr(\text{context} \mid w_t)$
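- $\Pr(\text{context} \mid w_t) = \prod_{j} \Pr(w_j \mid w_t)$ over all context positions $j$, assuming the context words are conditionally independent given $w_t$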
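- Word2Vec's skip-gram architecture is a model of this kind (Word2Vec also offers a CBOW architecture, which predicts the word from its context)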
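
The factorization above can be sketched in a few lines of Python. This is only a minimal illustration, not Word2Vec's actual implementation: the softmax over dot products is an assumed parameterisation of Pr(wj | wt), and the names `context_indices`, `softmax_prob`, `context_log_prob`, as well as the toy vectors and word ids, are hypothetical.

```python
import math


def context_indices(t, m, n):
    """Positions t-m..t+m, excluding t, clipped to a sentence of length n."""
    return [j for j in range(max(0, t - m), min(n, t + m + 1)) if j != t]


def softmax_prob(center_vec, output_vecs, target):
    """Pr(w_target | w_t), modelled here (an assumption) as a softmax over dot products."""
    scores = [sum(c * o for c, o in zip(center_vec, out)) for out in output_vecs]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    return exps[target] / sum(exps)


def context_log_prob(sentence, t, m, center_vecs, output_vecs):
    """log Pr(context | w_t) = sum over context positions j of log Pr(w_j | w_t)."""
    center = center_vecs[sentence[t]]
    return sum(
        math.log(softmax_prob(center, output_vecs, sentence[j]))
        for j in context_indices(t, m, len(sentence))
    )


# Toy vocabulary of 4 word ids with 2-dimensional vectors (made-up numbers).
center_vecs = [[0.1, 0.3], [0.4, -0.2], [-0.5, 0.2], [0.0, 0.1]]
output_vecs = [[0.2, 0.1], [-0.3, 0.4], [0.5, 0.5], [0.1, -0.1]]
sentence = [0, 2, 1, 3, 2]  # a sentence written as word ids

log_p = context_log_prob(sentence, t=2, m=1,
                         center_vecs=center_vecs, output_vecs=output_vecs)
print(log_p)
```

Working with the sum of log-probabilities rather than the raw product avoids numerical underflow when the window is large; the two are equivalent because the logarithm turns the product over j into a sum.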