a.k.a. Word-Word matrices or Co-occurrence vectors
Process
- Requires a large volume of data
- basic preprocessing steps: tokenization, lemmatization, etc.
- count the number of times word u appears with word v (see the code sketch below)
- the meaning of word u is its vector of counts (its word vector)
- meaning(u) = [count(u, v1), count(u, v2), …]
We get:
- an n × m matrix X, where n = |V| (target words) and m = |Vc| (context words)
- usually a square matrix (the context vocabulary equals the target vocabulary)
- a context window of ±k words (k to the left and k to the right)
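A minimal sketch of the counting step, assuming a toy two-sentence corpus, whitespace tokenization, and a shared target/context vocabulary (so X is square); the corpus and all names are illustrative, not from the notes:

```python
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
k = 2  # context window: ±k words

docs = [doc.split() for doc in corpus]  # tokenize (no lemmatization here)
vocab = sorted({w for doc in docs for w in doc})
idx = {w: i for i, w in enumerate(vocab)}

# X[u, v] = count(u, v): how often v occurs within ±k words of u
X = np.zeros((len(vocab), len(vocab)), dtype=np.int64)
for doc in docs:
    for i, u in enumerate(doc):
        for j in range(max(0, i - k), min(len(doc), i + k + 1)):
            if j != i:
                X[idx[u], idx[doc[j]]] += 1

print(X[idx["cat"]])  # meaning("cat") as a vector of counts
```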
Pros
- compute similarities between words using cosine similarity (see the sketch after this list)
- visualize words
- dimensions are meaningful (each corresponds to a context word), which supports Explainable AI
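A quick sketch of the cosine computation; the count vectors below are invented for illustration:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Invented count vectors: "cat" and "dog" share contexts, "cat" and "rug" don't.
cat = np.array([0, 2, 1, 0, 3])
dog = np.array([0, 2, 0, 1, 3])
rug = np.array([3, 0, 0, 2, 0])
print(cosine(cat, dog))  # ≈ 0.93, similar context distributions
print(cosine(cat, rug))  # = 0.0, no shared contexts
```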
Cons
- cannot capture semantics beyond individual words
- Distributional Semantics may not capture the whole of a word's meaning
- vectors are sparse and high-dimensional
- mitigated by dimensionality reduction techniques like Latent Semantic Analysis (LSA), as sketched below
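A hedged sketch of the LSA step via truncated SVD; the matrix here is a random stand-in for the count matrix X, and d = 100 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.1, size=(1000, 1000)).astype(float)  # sparse-ish stand-in counts

d = 100  # reduced dimensionality, d << |V|
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_lsa = U[:, :d] * S[:d]  # dense d-dimensional word vectors (one per row)

print(X.shape, "->", X_lsa.shape)  # (1000, 1000) -> (1000, 100)
```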