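python - Remove duplicate rows and keep the highest-scoring row
Problem description
When several rows share the same first column (the scaffold column), compare the number xx in their AS column (AS:i:xx) and keep the row with the largest number. If the largest number is tied, keep all of the tied rows.
Example input file: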
scaffold_010679_1AL.2 16 chr1A 429400034 119 3272M * GACACAAGAGACTCTTTG * AS:i:3268 XS:i:2147 XF:i:0 XE:i:29 NM:i:1
scaffold_010679_1AL.2 16 chr1A 429400034 119 3272M * GACACAAGAGACTCTTTG * AS:i:3268 XS:i:2147 XF:i:0 XE:i:29 NM:i:1
scaffold_010679_1AL.2 16 chr1A 429400034 119 3272M * GACACAAGAGACTCTTTG * AS:i:1268 XS:i:2147 XF:i:0 XE:i:29 NM:i:1
scaffold_010679_1AL.3 16 chr1A 429397743 19 599S1730M1I279M * 0 0 TGCCGAGGTTTTTGA * AS:i:1998 XS:i:1877 XF:i:3 XE:i:20 NM:i:2 XN:i:1
scaffold_010679_1AL.3 16 chr1A 429397743 19 599S1730M1I279M * 0 0 TGCCGAGGTTTTTGA * AS:i:1098 XS:i:1877 XF:i:3 XE:i:20 NM:i:2 XN:i:1
Expected result file:
scaffold_010679_1AL.2 16 chr1A 429400034 119 3272M * GACACAAGAGACTCTTTG * AS:i:3268 XS:i:2147 XF:i:0 XE:i:29 NM:i:1
scaffold_010679_1AL.2 16 chr1A 429400034 119 3272M * GACACAAGAGACTCTTTG * AS:i:3268 XS:i:2147 XF:i:0 XE:i:29 NM:i:1
scaffold_010679_1AL.3 16 chr1A 429397743 19 599S1730M1I279M * 0 0 TGCCGAGGTTTTTGA * AS:i:1998 XS:i:1877 XF:i:3 XE:i:20 NM:i:2 XN:i:1
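Answers
Answer 1:
    # coding: utf-8
    from itertools import groupby

    def as_score(line):
        # Column counts differ between lines, so find the tag by its AS:i: prefix
        # and compare it as an integer rather than as text.
        return next(int(f[5:]) for f in line.split() if f.startswith('AS:i:'))

    with open('a.txt') as f:
        data = [line for line in f if line.strip()]

    # groupby only merges adjacent lines, so lines that share a first column
    # must be contiguous in the file (as they are in the example).
    for key, group in groupby(data, key=lambda l: l.split()[0]):
        rows = list(group)
        best = max(as_score(r) for r in rows)
        # Keep every line that ties for the highest score under this key.
        for r in rows:
            if as_score(r) == best:
                print(r, end='')
Answer 2: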
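    awk '{n=gensub(".*AS:i:([0-9]+).*","\\1","g")+0}n>=k[$1]{c[$1]=n==k[$1]?c[$1]"\n"$0:$0;k[$1]=n}END{for(i in c)print c[i]}' file
gensub is specific to GNU awk. The +0 forces the extracted score to be compared as a number rather than as a string, and the order in which the groups are printed is not guaranteed.
Answer 3: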
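    grep "`sort -r -t '*' -k 3 b.txt | head -1 | awk -F '*' '{split($3,a," ");print a[1]}'`" b.txt
Idea: split each line into three columns on the asterisk (*), sort in descending order by the third column, take the first line, extract its AS:i: value (the largest one), and grep the file for it.
I did not read the question carefully, my mistake. The result is wrong: this keeps only the lines matching the single largest AS:i: value in the whole file, not the best lines for each scaffold.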
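The per-key bookkeeping used in Answer 2 can also be written in Python. The following is a minimal sketch rather than any poster's code: the function name keep_best and the regular expression AS_TAG are illustrative, and it assumes that lines without an AS:i: tag should simply be skipped. Unlike the groupby approach, it does not require lines with the same first column to be adjacent.

```python
import re

# Matches the alignment-score tag, e.g. "AS:i:3268" (hypothetical helper name).
AS_TAG = re.compile(r"\bAS:i:(-?\d+)\b")

def keep_best(lines):
    """Return the lines that carry the highest AS score for their first column."""
    best = {}  # first column -> highest score seen so far
    kept = {}  # first column -> lines tied at that score
    for line in lines:
        fields = line.split()
        match = AS_TAG.search(line)
        if not fields or match is None:
            continue  # assumption: skip blank lines and lines without an AS tag
        key, score = fields[0], int(match.group(1))
        if key not in best or score > best[key]:
            # New best score for this key: discard earlier, lower-scoring lines.
            best[key] = score
            kept[key] = [line]
        elif score == best[key]:
            # Tie with the current best: keep this line as well.
            kept[key].append(line)
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return [line for key in best for line in kept[key]]
```

Passing an open file object (for example the a.txt used in Answer 1) to keep_best and writing the returned lines out should reproduce the expected result file, with scaffolds listed in the order they first appear.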
