Python: Removing duplicate tuples in 2 csv files

I have two CSV files containing sorted tuples of integers:

old file.csv

"(1, 2, 3)","(1, 2, 4)","(1, 3, 5)"

new file.csv

"(1, 2, 3)","(1, 2, 4)"

I want to remove the tuples that are common to both CSV files and write the remaining ones to final.csv.

Expected Output

"(1,3,5)"

Code Attempt A

import csv

with open('old file.csv', newline ='') as myFile_1:  
    reader = csv.reader(myFile_1)
    list_a = list(reader)
    older = [tuple(map(int, i)) for i in list_a]

with open('new file.csv', newline ='') as myFile_2:  
    reader = csv.reader(myFile_2)
    list_b = list(reader)
    newer = [tuple(map(int, i)) for i in list_b]

final_output = older.difference(newer)

csvData = [final_output]

with open('final.csv', 'w') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(csvData)

csvFile.close()

Error Type

Exception has occurred: ValueError
invalid literal for int() with base 10: '(1, 2, 3)'

Code Attempt B

import csv

with open('old file.csv', newline ='') as myFile_1:  
    reader = csv.reader(myFile_1)
    list_a = list(reader)
    older = [tuple(map(str, i)) for i in list_a]

with open('new file.csv', newline ='') as myFile_2:  
    reader = csv.reader(myFile_2)
    list_b = list(reader)
    newer = [tuple(map(str, i)) for i in list_b]

final_output = older.difference(newer)

csvData = [final_output]

with open('final.csv', 'w') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerows(csvData)

csvFile.close() 

Error Type

Exception has occurred: AttributeError
'list' object has no attribute 'difference'

This issue arose when I started working with CSV files. The comparison worked well while the data now stored in old file.csv and new file.csv was generated at run time and held in variables. Keeping everything in variables is fine for small data sets but extremely problematic for large ones.

user7970547 asked Mar 10 '26 09:03

1 Answer

I would actually recommend changing the data storage strategy rather than saving the string representation of raw data structures in CSV files.
But if you are not in a position to change that, use the following short approach:

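import csv
from ast import literal_eval

# File names match the question ('old file.csv' and 'new file.csv').
with open('old file.csv', newline='') as f1, open('new file.csv', newline='') as f2:
    # Strip the CSV quotes and wrap the contents in braces so that
    # literal_eval parses each file as a set of tuples.
    t1 = literal_eval('{{{}}}'.format(f1.read().replace('"', '')))
    t2 = literal_eval('{{{}}}'.format(f2.read().replace('"', '')))

# Tuples present in the old file but not in the new one.
final_output = t1 - t2

# newline='' prevents the csv module from writing blank lines on Windows.
with open('final.csv', 'w', newline='') as csv_result:
    writer = csv.writer(csv_result, delimiter=',', quotechar='"')
    # sorted() makes the output order deterministic; a set has no fixed order.
    writer.writerow(sorted(final_output))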
  • literal_eval('{{{}}}'.format(...)) wraps the file contents in the set literal braces {}, so all the tuples are parsed into a set of tuples in a single call.
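
To make the brace-wrapping step easier to follow, here is a minimal sketch that runs it on in-memory strings instead of files. The sample strings are copied from the question; the helper name to_tuple_set is hypothetical, and the guard for empty input is my own assumption, added because an empty pair of braces evaluates to a dict rather than a set.

```python
from ast import literal_eval

# Raw file contents as shown in the question (assumed to be a single line each).
old_raw = '"(1, 2, 3)","(1, 2, 4)","(1, 3, 5)"'
new_raw = '"(1, 2, 3)","(1, 2, 4)"'


def to_tuple_set(raw):
    """Hypothetical helper: parse a line of quoted tuples into a set of tuples."""
    stripped = raw.replace('"', '').strip()
    if not stripped:
        # '{}' would be parsed as an empty dict, so return an empty set explicitly.
        return set()
    # '{{{}}}'.format('(1, 2, 3),(1, 2, 4)') produces '{(1, 2, 3),(1, 2, 4)}'.
    return literal_eval('{{{}}}'.format(stripped))


old_set = to_tuple_set(old_raw)
new_set = to_tuple_set(new_raw)

# Set difference keeps the tuples that appear only in the old data.
difference = old_set - new_set
print(difference)  # {(1, 3, 5)}
```

Note that this relies on each file holding all its tuples on one line; several lines without a comma between them would not form a valid set literal.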

The resulting final.csv file contents:

"(1, 3, 5)"
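
As a possible variation, the sketch below lets csv.reader split the cells and applies literal_eval to each cell separately, which should also cope with files that spread the tuples over several rows. It touches on both errors from the question: each cell is the string '(1, 2, 3)', which int() cannot parse, and difference() exists on sets, not lists. The helper names read_tuples and write_difference are hypothetical, and the demo writes the question's sample data to a temporary directory so that it can run on its own. Both sets are still held in memory, so this is not a streaming solution for very large files.

```python
import csv
import os
import tempfile
from ast import literal_eval


def read_tuples(path):
    """Hypothetical helper: return the set of tuples stored in a CSV file."""
    with open(path, newline='') as f:
        # Each cell looks like '(1, 2, 3)'; literal_eval turns it into a tuple.
        return {literal_eval(cell)
                for row in csv.reader(f)
                for cell in row
                if cell.strip()}


def write_difference(old_path, new_path, out_path):
    """Hypothetical helper: write tuples found only in old_path to out_path."""
    remaining = sorted(read_tuples(old_path) - read_tuples(new_path))
    with open(out_path, 'w', newline='') as f:
        # The writer converts each tuple with str() and quotes it
        # because the resulting text contains commas.
        csv.writer(f).writerow(remaining)
    return remaining


# Demo with the sample data from the question, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'old file.csv')
    new_path = os.path.join(tmp, 'new file.csv')
    out_path = os.path.join(tmp, 'final.csv')
    with open(old_path, 'w', newline='') as f:
        f.write('"(1, 2, 3)","(1, 2, 4)","(1, 3, 5)"\n')
    with open(new_path, 'w', newline='') as f:
        f.write('"(1, 2, 3)","(1, 2, 4)"\n')
    remaining = write_difference(old_path, new_path, out_path)
    with open(out_path, newline='') as f:
        written = f.read()

print(remaining)        # [(1, 3, 5)]
print(written.strip())  # "(1, 3, 5)"
```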
RomanPerekhrest answered Mar 12 '26 22:03
