hi i'm searching way store vector in each cell, here's sample dataset.
dt1 <- data.table(id = rep(1:2, each = 3), set.a = c(5,1,3,10,4,7)) dt1 id set.a 1: 1 5 2: 1 1 3: 1 3 4: 2 10 5: 2 4 6: 2 7
now want method convert dt1
this:
id set.a 1: 1 5,1,3 2: 2 10, 4, 7
this question may elementary did consume me hour. in fact it's part of job. suppose have dataset looks like:
dt2 <- data.table(id = rep(1:2, each = 3), set.b = c(3,5,9,8,10,4)) dt id set.b 1: 1 3 2: 1 5 3: 1 9 4: 2 8 5: 2 10 6: 2 4
what i'm looking how compute overlap of each id, i.e, overlap
variable indicate intersetion of set.a
, set.b
each id looks like
id overlap 1: 1 5,3 2: 2 10, 4
to implement this, plan first aggregate set
variable each id vector , calculate intersection, failed store vector in each cell, mentioned @ beginning. there can me out? in advance.
update
i experimented both akrun's , frank's way , found data.table merge way more efficient. here little benchmark, both of :)
dt1 <- data.table(id = rep(1:10000, each = 10), set1 = sample(letters[1:24], 100000, replace = t)) dt2 <- data.table(id = rep(1:10000, each = 10), set2 = sample(letters[1:24], 100000, replace = t)) system.time({ re1 <- rbindlist(list(dt1, dt2), idcol=true)[, .(overlap=tostring(intersect(set1[.id==1], set1[.id==2]))) , =id] }) # 0.25s system.time({ re2 <- dt1[dt2, on = c(id = "id", set1 = "set2"), nomatch = 0][, .(ovlp = list(unique(set1))), = "id"] }) # 0.07s system.time({ dt3 <- dt1[, .(set1 = list(set1)), = id] dt4 <- dt2[, .(set2 = list(set2)), = id] re3 <- dt3[dt4, nomatch = 0, on = "id"][, .(ov = list(intersect(unlist(set1), unlist(set2)))), = id] }) # 0.21s
i use merge:
res <- merge( dt1[, .(a = list(set.a)), by=id], # <- answer first question dt2[, .(b = list(set.b)), by=id], by="id" )[, overlap := .(map(intersect, a, b)) ] # id b overlap # 1: 1 5,1,3 3,5,9 5,3 # 2: 2 10, 4, 7 8,10, 4 10, 4
if speed concern, suspect faster:
dt1[dt2, on = c(id = "id", set.a = "set.b"), nomatch=0][, .(overlap = list(set.a)) , by=id] # id overlap # 1: 1 3,5 # 2: 2 10, 4
dt1[dt2,...]
kind of merge/join.
Comments
Post a Comment