Clustering – Finding Related Posts
Main workflow
- Extract the salient features from each post and store them as one vector per post.
- Cluster the vectors.
- Determine the cluster for the post in question.
- From this cluster, fetch a handful of posts that are different from the post in
question. This will increase diversity.
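The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the features are simple word counts, the clustering is a small hand-written k-means with a deterministic start, and the names `vectorize`, `kmeans`, and `related_posts` are hypothetical. Returning the farthest cluster members first is one possible reading of the diversity step.

```python
import math
from collections import Counter


def vectorize(posts):
    """Turn each post into a word-count vector over a shared vocabulary."""
    vocab = sorted({w for p in posts for w in p.lower().split()})
    vectors = []
    for p in posts:
        counts = Counter(p.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=10):
    """Tiny k-means; returns one cluster label per vector."""
    # Assumption for this sketch: the first k vectors seed the centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def related_posts(posts, query_index, k=2, n=2):
    """Return up to n indices of other posts in the query post's cluster."""
    vectors = vectorize(posts)
    labels = kmeans(vectors, k)
    cluster = labels[query_index]
    candidates = [i for i, lab in enumerate(labels)
                  if lab == cluster and i != query_index]
    # Assumed reading of the diversity step: farthest cluster members first.
    candidates.sort(key=lambda i: dist(vectors[i], vectors[query_index]),
                    reverse=True)
    return candidates[:n]


posts = [
    "python clustering tutorial",
    "database index tuning",
    "python clustering example",
    "database index design",
    "python clustering notes",
]
print(related_posts(posts, 0))
```

Only the final lookup touches the posts of a single cluster, so the expensive clustering step can be done once in advance and reused for every query.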
Preprocessing
code
The naive approach would be to take the post, compute its similarity to every other
post, and display the top N most similar posts as links on the page. This quickly
becomes costly, because every lookup has to compare the post against the entire collection.
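To make the cost concrete, here is a minimal sketch of the naive approach. The similarity measure is an assumption chosen for brevity (Jaccard overlap of word sets), and `word_overlap` and `top_n_similar` are hypothetical names; any other similarity function could be passed in instead.

```python
def word_overlap(a, b):
    """Jaccard similarity of the two posts' word sets (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def top_n_similar(posts, query_index, n=3, similarity=word_overlap):
    """Score the query post against every other post and keep the best n."""
    query = posts[query_index]
    scored = [(similarity(query, p), i)
              for i, p in enumerate(posts) if i != query_index]
    # Highest similarity first; ties are broken by the lower post index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:n]]


posts = [
    "how to cluster posts",
    "cluster posts with kmeans",
    "baking bread at home",
    "how to bake bread",
]
print(top_n_similar(posts, 0, n=2))
```

Each call scans the whole list of posts, so the work per page view grows with the size of the collection.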
Levenshtein distance (edit distance)
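The Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. A minimal sketch of the standard dynamic-programming computation, keeping only two rows of the table at a time (the function name `levenshtein` is chosen here for illustration):

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

The distance works on characters only, so two posts that use the same words in a different order still come out as far apart.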