I have a large file with 7 columns, and I want to compare 2 of them, column 1 and column 7:
    chr_locations (col 1)    gene_name (col 7)
    chr1:66997989-67000678   genea
    chr1:66997824-67000456   genea
    chr2:33544389-33548489   geneb
    chr2:33546285-33547055   geneb
    chr2:44567890-44568980   geneb
I want to count the number of chromosomal locations for each gene and append that count to every row:
    chr1:66997989-67000678 genea 2
    chr1:66997824-67000456 genea 2
    chr2:33544389-33548489 geneb 3
    chr2:33546285-33547055 geneb 3
    chr2:44567890-44568980 geneb 3
I'm sure there is an easy way to do this in awk or with a Python script. Can anyone help? Thanks.
It's easy in both languages (in any language, really); it just depends on what you already know.
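awk:

    awk '
        # count rows per gene (column 7) and remember columns 1 and 7 of every row
        { count[$7]++; memory_1[NR] = $1; memory_7[NR] = $7 }
        # after the whole file is read, print each row with its gene count
        END {
            for (i = 1; i <= NR; ++i)
                print memory_1[i] OFS memory_7[i] OFS count[memory_7[i]]
        }' file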
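Python:

    from collections import Counter

    # split every row into whitespace-separated fields
    with open("file") as fh:
        records = [line.split() for line in fh]

    # column 7 (index 6) is the gene name; count rows per gene
    count = Counter(r[6] for r in records)
    print("\n".join("\t".join((r[0], r[6], str(count[r[6]]))) for r in records))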
Either way, you get:
    chr1:66997989-67000678 genea 2
    chr1:66997824-67000456 genea 2
    chr2:33544389-33548489 geneb 3
    chr2:33546285-33547055 geneb 3
    chr2:44567890-44568980 geneb 3
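Both answers above keep every row in memory until the counts are known. Since the file is large, a two-pass variant may be worth considering: read the file once only to count genes, then read it again and stream each row out with its count. This is a minimal sketch, assuming whitespace-separated columns with the location in column 1 and the gene name in column 7; the function name `annotate_gene_counts` is hypothetical.

```python
from collections import Counter


def annotate_gene_counts(path, loc_col=0, gene_col=6):
    """Yield 'location<TAB>gene<TAB>count' for every row of `path`.

    Hypothetical helper: pass 1 only counts genes, pass 2 streams the rows,
    so only the per-gene counts (not the rows) are held in memory.
    """
    counts = Counter()
    # pass 1: count how many rows each gene appears in
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[gene_col]] += 1
    # pass 2: re-read the file and emit each row with its gene's count
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if fields:
                gene = fields[gene_col]
                yield "\t".join((fields[loc_col], gene, str(counts[gene])))
```

Usage: `for row in annotate_gene_counts("file"): print(row)`. The same two-pass idea in awk is the common `NR==FNR` idiom, which reads the file twice: `awk 'NR==FNR { c[$7]++; next } { print $1, $7, c[$7] }' file file`.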