Retrieving top k similar rows in a matrix for each row via cosine similarity in R


The question "How to efficiently retrieve top k-similar vectors by cosine similarity using R?" asks how to calculate the top similar vectors for each vector of one matrix, relative to another matrix. It's satisfactorily answered, and I'd like to tweak it to operate on a single matrix.

That is, I'd like the top k most similar other rows for each row in a matrix. I suspect the solution is similar, but can be optimized.

This function is based on the linked answer:

    cosinesimilarities <- function(m, top.k) {
      # Computes cosine similarity between each row and the other rows in a matrix.
      #
      # Args:
      #   m: Matrix of values.
      #   top.k: Number of top rows to show for each row.
      #
      # Returns:
      #   A data frame with columns for each pair of rows and their cosine
      #   similarity, for the top `top.k` rows per row.

      # Similarity computation
      cp <- tcrossprod(m)
      mm <- rowSums(m ^ 2)
      result <- cp / sqrt(outer(mm, mm))
      # Top similar rows (per row)
      # Use `top.k + 1` to allow for the self-reference (similarity = 1)
      top <- apply(result, 2, order, decreasing=TRUE)[seq(top.k + 1), ]
      result.df <- data.frame(row.id1=c(col(top)), row.id2=c(top))
      result.df$cosine.similarity <- result[as.matrix(result.df[, 2:1])]
      # Remove same-row records and return
      return(result.df[result.df$row.id1 != result.df$row.id2, ])
    }

For example:

    (m <- matrix(1:9, nrow=3))
    #      [,1] [,2] [,3]
    # [1,]    1    4    7
    # [2,]    2    5    8
    # [3,]    3    6    9
    cosinesimilarities(m, 1)
    #   row.id1 row.id2 cosine.similarity
    # 2       1       2            0.9956
    # 4       2       3            0.9977
    # 6       3       2            0.9977
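One possible direction for the single-matrix case, offered as a rough sketch rather than a benchmarked answer: scale each row to unit length once, so that a single `tcrossprod` gives the cosine similarities directly, and mask the diagonal so self-matches never enter the ranking. The name `TopKCosine` is hypothetical and not from the linked answer. The sketch assumes `m` is a numeric matrix with no all-zero rows and that `top.k` is smaller than `nrow(m)`.

```r
# Hypothetical helper (not from the linked answer): top-k cosine neighbours
# of each row among the other rows of the same matrix.
TopKCosine <- function(m, top.k) {
  stopifnot(is.matrix(m), top.k >= 1, top.k < nrow(m))
  # Scale each row to unit length; the cross-product of unit rows is the
  # cosine similarity. Assumes no all-zero rows (they would give NaN).
  unit <- m / sqrt(rowSums(m ^ 2))
  sim <- tcrossprod(unit)
  # Mask self-matches up front instead of filtering them out afterwards.
  diag(sim) <- -Inf
  # For each row, the indices of its top.k most similar other rows.
  top <- vapply(seq_len(nrow(sim)),
                function(i) order(sim[i, ], decreasing = TRUE)[seq_len(top.k)],
                integer(top.k))
  # vapply returns a plain vector when top.k == 1; restore the matrix shape.
  top <- matrix(top, nrow = top.k)
  row.id1 <- c(col(top))
  row.id2 <- c(top)
  data.frame(row.id1 = row.id1,
             row.id2 = row.id2,
             cosine.similarity = sim[cbind(row.id1, row.id2)])
}
```

On the 3 x 3 example matrix with `top.k = 1`, this pairs rows 1 and 2, 2 and 3, and 3 and 2, the same pairs as the function in the question. It still builds the full n x n similarity matrix, so whether it is actually faster would need to be measured on realistic data.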
