a.k.a. Word-Word matrices or Co-occurrence vectors
Process
- Requires a large volume of data
- basic preprocessing steps: tokenization, lemmatization, etc.
- count the number of times word u appears with word v (see the code sketch below)
- the meaning of word u is its vector of counts (its word vector)
- meaning(u) = [count(u, v1), count(u, v2), …]
We get:
- an n × m matrix X, where n = |V| (target words) and m = |Vc| (context words)
- usually a square matrix (the context vocabulary equals the target vocabulary)
- a context window of ±k words (k to the left and k to the right)
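A minimal sketch of the counting step, assuming a toy two-sentence corpus, whitespace tokenization, and a shared target/context vocabulary (so X is square); the corpus and all names are illustrative, not from the notes:

```python
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
k = 2  # context window: ±k words

docs = [doc.split() for doc in corpus]  # tokenize (no lemmatization here)
vocab = sorted({w for doc in docs for w in doc})
idx = {w: i for i, w in enumerate(vocab)}

# X[u, v] = count(u, v): how often v occurs within ±k words of u
X = np.zeros((len(vocab), len(vocab)), dtype=np.int64)
for doc in docs:
    for i, u in enumerate(doc):
        for j in range(max(0, i - k), min(len(doc), i + k + 1)):
            if j != i:
                X[idx[u], idx[doc[j]]] += 1

print(X[idx["cat"]])  # meaning("cat") as a vector of counts
```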
Pros
- compute similarities between words using cosine similarity (see the sketch after this list)
- visualize words
- dimensions are meaningful (each corresponds to a context word), which supports Explainable AI
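A quick sketch of the cosine computation; the count vectors below are invented for illustration:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Invented count vectors: "cat" and "dog" share contexts, "cat" and "rug" don't.
cat = np.array([0, 2, 1, 0, 3])
dog = np.array([0, 2, 0, 1, 3])
rug = np.array([3, 0, 0, 2, 0])
print(cosine(cat, dog))  # ≈ 0.93, similar context distributions
print(cosine(cat, rug))  # = 0.0, no shared contexts
```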
Cons
- cannot capture semantics beyond individual words
- Distributional Semantics may not capture the whole of a word's meaning
- vectors are sparse and high-dimensional
- mitigated by dimensionality reduction techniques like Latent Semantic Analysis (LSA), as sketched below
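A hedged sketch of the LSA step via truncated SVD; the matrix here is a random stand-in for the count matrix X, and d = 100 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.1, size=(1000, 1000)).astype(float)  # sparse-ish stand-in counts

d = 100  # reduced dimensionality, d << |V|
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_lsa = U[:, :d] * S[:d]  # dense d-dimensional word vectors (one per row)

print(X.shape, "->", X_lsa.shape)  # (1000, 1000) -> (1000, 100)
```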