
Compare entries in columns from CSV files and extract matching rows - python

I have two CSV files. I need to compare the three columns of the first file with the first three columns of the second file (six columns in the example below) and extract the matching rows from the second file. Examples of the files:

File1:

ATGCGCGACAGT, ch3, 123546
ATGCATACAGGATAT, ch2, 5141561615

... and so on, approximately 100 entries

File2:

ATGCGGCGACAGT,ch3, 123456,mi141515, AUCAGCUAUAUAU, UACGCAGAUAUAUA
ATCAGACGATTATGA, ch4, 4564764, mi653453, AUCAGCAAUUUUCG, AUACAGACAAAAA

... and so on, approximately 50000 entries

I need to match columns 1, 2 and 3 of both files, such that all three columns of a row in File1 match those of a row in File2. When they do, I want to extract columns 4, 5 and 6 of that File2 row for further processing.

I was thinking of:

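import csv

# 'file1.csv' and 'file2.csv' are placeholder names for the two input files.
# skipinitialspace=True drops the blank that follows each comma in the data.
with open('file1.csv', newline='') as f1, open('file2.csv', newline='') as f2:
    file1 = list(csv.reader(f1, skipinitialspace=True))
    file2 = list(csv.reader(f2, skipinitialspace=True))

with open('parsed_out', 'w', newline='') as out:
    fhout = csv.writer(out, delimiter=',')
    for row1 in file1:
        a, b, c = row1[0], row1[1], row1[2]
        # Scan every row of file2 for each row of file1 (slow: nested loops)
        for row2 in file2:
            d, e, f = row2[0], row2[1], row2[2]
            g, h, i = row2[3], row2[4], row2[5]
            if a == d and b == e and c == f:
                fhout.writerow([g, h, i])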

But somebody told me that I can use hashing, or some better approach, rather than writing such big loops for 10,000 or more entries in file1.

Please suggest a better way to achieve this. Both file 1 and file 2 are parsed from more complex files.

Bade asked Dec 30 '25 20:12


1 Answer

The code below builds a hash-based lookup with a set comprehension over the first file, as you suggest:

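import csv

# File1 and File2 are assumed to be open file objects for the two CSV files.
# skipinitialspace=True drops the blank after each comma so fields compare equal.
S = {tuple(line[:3]) for line in csv.reader(File1, skipinitialspace=True)}

Then, when reading the second file, each lookup is a set membership test, which takes constant time on average instead of a scan over the first file.

for line in csv.reader(File2, skipinitialspace=True):
    key = tuple(line[:3])  # columns 1, 2 and 3
    if key in S:
        print(line)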
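
The question asks for columns 4, 5 and 6 of each matching row to be written out rather than the whole line printed. A minimal sketch of that variant follows, assuming the comma-plus-space layout shown in the question. The function name `extract_matches` and the sample rows are made up for illustration, and `io.StringIO` stands in for real files opened with `open(..., newline='')`.

```python
import csv
import io


def extract_matches(file1, file2, out):
    """Write columns 4-6 of each file2 row whose first three columns occur in file1."""
    # skipinitialspace=True ignores the blank after each comma in the sample layout.
    keys = {tuple(row[:3])
            for row in csv.reader(file1, skipinitialspace=True) if row}
    writer = csv.writer(out)
    matched = 0
    for row in csv.reader(file2, skipinitialspace=True):
        # Skip blank or short rows, then test membership in the set of keys.
        if len(row) >= 6 and tuple(row[:3]) in keys:
            writer.writerow(row[3:6])
            matched += 1
    return matched


# Made-up rows in the same layout as the question (not real data).
file1 = io.StringIO("AAAA, ch3, 100\nCCCC, ch2, 200\n")
file2 = io.StringIO(
    "AAAA,ch3, 100, mi1, AUCG, UAGC\n"
    "GGGG, ch4, 300, mi2, AUAU, UAUA\n"
)
out = io.StringIO()
count = extract_matches(file1, file2, out)
```

Building the set reads the first file once and the loop reads the second file once, so the work grows with the sum of the two file sizes rather than their product.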
Mark Tolonen answered Jan 01 '26 12:01


