Gaussian Mixture Models (GMMs): a Probabilistic Clustering Model
- Clusters are the "hills" (modes) of the normalized density function over the feature space.
- The distribution of the feature space can have varying densities → regions with more points are denser
- normalized density function → the density scaled to integrate to 1, so it is a valid probability density
- the peak or mode of each hill is the cluster center
- view the density function as a probability density function
- a mixture of Gaussians is fit to model this probability distribution
- can handle overlapping clusters → assigns points to clusters with some probability
- Generative, as it gives a full probability model of x (new points can be sampled from it)
- fitting 1 Gaussian distribution to 1-dimensional data:
    p(x) = ω · η(x; μ, σ),  where η(x; μ, σ) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
    - ω is the scale or evidence
    - μ is the mean and σ is the std dev of the Gaussian η
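A minimal sketch of the 1-D fit, assuming maximum likelihood (μ = sample mean, σ = sample std dev); `gaussian_pdf` is an illustrative helper, not a library function:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian η(x; μ, σ)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# maximum-likelihood fit: sample mean and (population) std dev
data = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)
mu_hat, sigma_hat = data.mean(), data.std()
print(gaussian_pdf(2.0, mu_hat, sigma_hat))  # density at x = 2
```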
High-dimensional GMM
- Assume the probability distribution p(x) in some D dimensions is made of k different Gaussians
- weighted sum of k Gaussians (see the sketch after this list):
    p(x) = Σ_{c=1..k} ωc · η(x; μc, σc)
- such that Σc ωc = 1 and ωc ≥ 0
- where x ∈ R^D, μc ∈ R^D, and σc is the D x D covariance of Gaussian c
- X belongs to the Gaussian k for which
    - ||X - μk|| is minimal, and
    - ||X - μk|| < 2.5σk (X lies within 2.5 standard deviations of cluster k's mean)
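A sketch of the weighted-sum density and the nearest-mean assignment rule above; `gmm_density` and `hard_assign` are hypothetical helpers, and the scalar σk for a D-dimensional covariance is approximated by the average per-axis standard deviation (an assumption, since the notes only define σk in 1-D):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """p(x) = Σc ωc · η(x; μc, σc) for x in R^D."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

def hard_assign(x, means, covs, gate=2.5):
    """Assign x to the nearest mean, gated at 2.5 std devs; None → no cluster."""
    dists = np.array([np.linalg.norm(x - m) for m in means])
    k = int(dists.argmin())
    sigma_k = np.sqrt(np.trace(covs[k]) / len(x))  # avg per-axis std (assumption)
    return k if dists[k] < gate * sigma_k else None
```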
Equivalent latent variable form
- p(z = k) = ωk
    - select a mixture component k with probability ωk
- p(x | z = k) = η(x; μk, σk)
    - sample from that component's Gaussian
- then p(x) is the marginal over z: p(x) = Σk p(z = k) · p(x | z = k) = Σk ωk · η(x; μk, σk)
- z is the latent variable, with prior p(z = k) = ωk
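Since the model is generative, the two steps above translate directly into ancestral sampling; a minimal sketch (the helper name `sample_gmm` and the 2-D parameters are placeholders):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_gmm(n, weights, means, covs):
    """z ~ Categorical(ω), then x | z = k ~ η(μk, σk)."""
    z = rng.choice(len(weights), size=n, p=weights)   # step 1: pick components
    x = np.array([rng.multivariate_normal(means[k], covs[k]) for k in z])
    return x, z                                       # step 2: sample from them

# illustrative 2-component mixture in 2-D
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
x, z = sample_gmm(500, weights=[0.3, 0.7], means=means, covs=covs)
```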
Expectation-Maximization (EM) Algorithm
- start with k clusters. Select some initial μc, σc, and ωc (a.k.a. πc) for all the clusters.
- Expectation step:
- for all N datapoints, compute soft probabilities ric, the probability that xi belongs to cluster c → an N x K matrix:
    ric = ωc · η(xi; μc, σc) / Σc' ωc' · η(xi; μc', σc')
    - the denominator normalizes each row to sum to one
    - if xi likely belongs to cluster c, ric will be high
- Maximization step:
- fixing the ric, recompute the cluster parameters (see the NumPy sketch at the end of this section)
- for each cluster c:
    Nc = Σi ric (effective number of points in cluster c)
    ωc = Nc / N
    μc = (1 / Nc) · Σi ric · xi
    σc² = (1 / Nc) · Σi ric · (xi - μc)²  (the covariance matrix in D dimensions)
- Repeat E-M steps until convergence
- each step increases (or at least does not decrease) the log-likelihood of the data
- convergence when the log-likelihood doesn't change much between steps
- convergence is guaranteed, but only to a local optimum
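A compact NumPy/SciPy sketch of the E and M steps above; the random-point initialization, the 1e-6 covariance regularizer, and the name `em_gmm` are assumptions, not part of these notes:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, tol=1e-6, seed=0):
    """EM for a GMM with full covariances; a sketch, not production code."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # init (assumption): k random points as means, data covariance, uniform weights
    mu = X[rng.choice(n, size=k, replace=False)]
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: ric ∝ ωc · η(xi; μc, σc) → N x K responsibilities
        dens = np.column_stack([w[c] * multivariate_normal.pdf(X, mu[c], cov[c])
                                for c in range(k)])
        ll = np.log(dens.sum(axis=1)).sum()           # log-likelihood of the data
        r = dens / dens.sum(axis=1, keepdims=True)    # normalize rows to sum to 1
        # M-step: re-estimate ω, μ, σ with the ric held fixed
        Nc = r.sum(axis=0)                            # effective cluster sizes
        w = Nc / n
        mu = (r.T @ X) / Nc[:, None]
        for c in range(k):
            diff = X - mu[c]
            cov[c] = (r[:, c, None] * diff).T @ diff / Nc[c] + 1e-6 * np.eye(d)
        if ll - prev_ll < tol:                        # log-likelihood stopped improving
            break
        prev_ll = ll
    return w, mu, cov, r
```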
How to choose k?
- penalize the log-likelihood scores for model complexity, e.g., with AIC or BIC (raw likelihood always improves as k grows)
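One common way, using scikit-learn's GaussianMixture and its built-in BIC score (the synthetic data and the range of k are placeholders):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))  # placeholder data (N x D)

best_k, best_bic = None, np.inf
for k in range(1, 9):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic = gm.bic(X)  # -2 · log-likelihood + penalty growing with #parameters
    if bic < best_bic:
        best_k, best_bic = k, bic
print(best_k)  # k with the lowest (best) BIC
```

`n_init=5` refits from several initializations per k, which also addresses the local-optima issue noted below.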
Pros
- can handle overlapping clusters
Cons
- sensitive to initialization parameters → can converge to local optima
    - mitigate by reinitializing multiple times and keeping the run with the best log-likelihood
Alternatives
- Stochastic EM → in the E-step, sample a single hard assignment per point from ric instead of keeping soft probabilities
- Hard EM → in the E-step, assign each point entirely to its most likely cluster (argmax over ric), much like k-means (see the sketch below)
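A sketch of how only the E-step changes in each variant (the M-step stays the same); the toy responsibility matrix is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
r = np.array([[0.7, 0.3],    # soft responsibilities ric (N x K) from standard EM
              [0.1, 0.9]])
K = r.shape[1]

# Hard EM: one-hot argmax assignment (k-means-like)
r_hard = np.eye(K)[r.argmax(axis=1)]

# Stochastic EM: sample one assignment per point with probabilities ric
r_stoch = np.eye(K)[[rng.choice(K, p=row) for row in r]]
```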