unix - Count the occurrences of a value in column 1 for each string in column 2 using awk


I have a large file with 7 columns, and I want to compare 2 of them, column 1 and column 7:

    chr_locations(col 1)         gene_name(col 7)
    chr1:66997989-67000678       genea
    chr1:66997824-67000456       genea
    chr2:33544389-33548489       geneb
    chr2:33546285-33547055       geneb
    chr2:44567890-44568980       geneb

I want to count the occurrences of chromosomal locations for a given gene:

    chr1:66997989-67000678    genea     2
    chr1:66997824-67000456    genea     2
    chr2:33544389-33548489    geneb     3
    chr2:33546285-33547055    geneb     3
    chr2:44567890-44568980    geneb     3

I am sure there is an easier way to do this in awk than writing a script in Python. Can anyone help? Thanks.

It is easy with both languages (with any language, really); it just depends on what you know.

awk

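    awk '{
        count[$7]++          # number of rows seen for each gene (column 7)
        memory_1[NR] = $1    # remember column 1 of every row
        memory_7[NR] = $7    # remember column 7 of every row
    }
    END {
        # replay the rows in input order, appending the per-gene count
        for (i = 1; i <= NR; ++i)
            print memory_1[i] OFS memory_7[i] OFS count[memory_7[i]]
    }' file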

python

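    from collections import Counter

    # Split every line of the file into whitespace-separated fields.
    with open("file") as f:
        records = [line.split() for line in f]

    # Count the rows for each gene name (column 7, index 6).
    count = Counter(r[6] for r in records)

    print("\n".join("\t".join((r[0], r[6], str(count[r[6]]))) for r in records))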

Either way, you get:

    chr1:66997989-67000678  genea   2
    chr1:66997824-67000456  genea   2
    chr2:33544389-33548489  geneb   3
    chr2:33546285-33547055  geneb   3
    chr2:44567890-44568980  geneb   3
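If you want to check the counting logic without touching the real file, the same idea can be wrapped in a small function and run on a few in-memory lines. This is only a sketch: the function name `annotate_gene_counts`, its `loc_col` and `gene_col` parameters, and the placeholder dots standing in for columns 2 to 6 are assumptions made for illustration, not something from the question.

```python
from collections import Counter


def annotate_gene_counts(lines, loc_col=0, gene_col=6):
    """Return one (location, gene, count) tuple per non-blank input line.

    loc_col and gene_col are 0-based indices (hypothetical parameters;
    the defaults match column 1 and column 7 from the question).
    """
    # Split each non-blank line on whitespace; blank lines are skipped.
    records = [line.split() for line in lines if line.strip()]
    # Count how many rows mention each gene name.
    counts = Counter(r[gene_col] for r in records)
    # Keep the input order and attach the per-gene row count.
    return [(r[loc_col], r[gene_col], counts[r[gene_col]]) for r in records]


# Made-up 7-column rows: columns 2 to 6 are filled with placeholder dots.
sample = [
    "chr1:66997989-67000678 . . . . . genea",
    "chr1:66997824-67000456 . . . . . genea",
    "chr2:33544389-33548489 . . . . . geneb",
    "chr2:33546285-33547055 . . . . . geneb",
    "chr2:44567890-44568980 . . . . . geneb",
]

for loc, gene, n in annotate_gene_counts(sample):
    print("\t".join((loc, gene, str(n))))
```

Note that this counts rows per gene, exactly like the two snippets above. If the same location can appear more than once for a gene and you need distinct locations, you would have to collect the locations per gene in a set instead.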
