The question "How to efficiently retrieve top K-similar vectors by cosine similarity using R?" asks how to calculate the top similar vectors for each vector of one matrix, relative to another matrix. It's satisfactorily answered, but I'd like to tweak it to operate on a single matrix.
That is, I'd like the top k most similar other rows for each row in a matrix. I suspect the solution is similar, but can be optimized.
This function is based on the linked answer:
    cosinesimilarities <- function(m, top.k) {
      # Computes cosine similarity between each row and all other rows in a matrix.
      #
      # Args:
      #   m: Matrix of values.
      #   top.k: Number of top rows to show for each row.
      #
      # Returns:
      #   Data frame with columns for the pair of rows, and cosine similarity,
      #   for the top `top.k` rows per row.
      #
      # Similarity computation
      cp <- tcrossprod(m)
      mm <- rowSums(m ^ 2)
      result <- cp / sqrt(outer(mm, mm))
      # Top similar rows from train (per row)
      # Use `top.k + 1` to remove the self-reference (similarity = 1)
      top <- apply(result, 2, order, decreasing=TRUE)[seq(top.k + 1), ]
      result.df <- data.frame(row.id1=c(col(top)), row.id2=c(top))
      result.df$cosine.similarity <- result[as.matrix(result.df[, 2:1])]
      # Remove same-row records and return
      return(result.df[result.df$row.id1 != result.df$row.id2, ])
    }
For example:
    (m <- matrix(1:9, nrow=3))
    #      [,1] [,2] [,3]
    # [1,]    1    4    7
    # [2,]    2    5    8
    # [3,]    3    6    9

    cosinesimilarities(m, 1)
    #   row.id1 row.id2 cosine.similarity
    # 2       1       2            0.9956
    # 4       2       3            0.9977
    # 6       3       2            0.9977
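To make the "can be optimized" hunch concrete, here is a rough sketch of one direction it might take. It is not from the linked answer, and the name `topk_cosine` is made up for illustration. It assumes `m` is a numeric matrix with no all-zero rows and that `top.k` is smaller than the number of rows. The idea is to normalize each row once, so the similarity matrix is a single `tcrossprod`, and to mask the diagonal before ranking, so self-matches never have to be filtered out afterwards.

```r
# Hypothetical helper (not from the linked answer): top-k cosine neighbours
# within a single matrix, with self-matches masked before ranking.
# Assumes no all-zero rows, since those would produce NaN similarities.
topk_cosine <- function(m, top.k) {
  stopifnot(is.matrix(m), top.k >= 1, top.k < nrow(m))
  # Scale every row to unit length; the divisor recycles down the columns,
  # so element [i, j] is divided by the norm of row i.
  unit <- m / sqrt(rowSums(m ^ 2))
  # With unit-length rows, the cross product is the cosine similarity matrix.
  sim <- tcrossprod(unit)
  # Mask the diagonal so a row is never ranked as its own neighbour.
  diag(sim) <- -Inf
  # For each row, the indices of its top.k most similar other rows.
  top <- apply(sim, 1, function(s) order(s, decreasing = TRUE)[seq_len(top.k)])
  # apply() returns a plain vector when top.k == 1; restore a top.k x n matrix.
  top <- matrix(top, nrow = top.k)
  row.id1 <- rep(seq_len(nrow(m)), each = top.k)
  row.id2 <- c(top)
  data.frame(row.id1 = row.id1,
             row.id2 = row.id2,
             cosine.similarity = sim[cbind(row.id1, row.id2)])
}
```

On the 3x3 example above, `topk_cosine(m, 1)` should pick the same pairs (1-2, 2-3, 3-2). This still builds the full n x n similarity matrix, so it only trims the constant factor; whether that counts as enough of an optimization for large matrices is exactly what I'm unsure about.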