
Python - Display rows with repeated values in csv files

I have a .csv file with several columns, one of them filled with random numbers, and I want to find duplicated values in that column. If there are any (an unlikely case, but that is exactly what I want to check), I would like to display or store the complete rows in which those values appear.

To make it clear, I have something like this:

First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc

And I would like to have:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find duplicated values, I store that column in a dictionary and count every key to discover how many times each one appears.

import csv
from collections import Counter, defaultdict, OrderedDict

# `file` is the path to the .csv file; `col_2` is the index of the numeric column
with open(file, 'rt') as inputfile:
    data = csv.reader(inputfile)

    seen = defaultdict(set)
    counts = Counter(row[col_2] for row in data)

print("Numbers and times they appear: %s" % counts)

And I see

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

The problem starts here, because I can't manage to link each key with its number of repetitions and use that later. If I do

for value in counts:
    if counts > 1:
        print(counts)

I only iterate over the keys, which is not what I want, and every value gets printed (not to mention that I want to print the whole line, not just the number).

Basically I'm looking for a way of doing

if there's a repeated number:
    print rows containing those numbers
else:
    print "No repetitions"

Thanks in advance.

Informatico_Sano asked Jul 11 '14 12:07


1 Answer

Try this; it may work for you.

entries = set()            # values seen so far in the third column
duplicate_entries = set()  # values seen more than once

# First pass: collect the values that appear more than once
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.add(columns[2])
        else:
            duplicate_entries.add(columns[2])

# Second pass: print and save every row that holds a duplicated value
if duplicate_entries:
    with open('in.txt', 'r') as my_file, open('out.txt', 'w') as out_file:
        for line in my_file:
            columns = line.strip().split(',')
            if columns[2] in duplicate_entries:
                print(line.strip())
                out_file.write(line)
else:
    print("No repetitions")
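
If reading the file twice is undesirable, a single pass that groups whole rows by the value in the third column may also do the job. The following is only a sketch under some assumptions: the function name `find_duplicate_rows`, its `column` parameter and the shortened sample data are made up for illustration, and Python 3 is assumed. It uses `csv.reader` with `skipinitialspace=True` so the blank after each comma does not end up in the values.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_rows(lines, column=2):
    """Return the rows whose value in `column` appears more than once.

    `lines` is any iterable of CSV lines, for example an open file object.
    (Hypothetical helper, not part of any library.)
    """
    rows_by_value = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) > column:  # ignore blank or short lines
            rows_by_value[row[column].strip()].append(row)
    # Keep only the values shared by two or more rows (result is grouped by value)
    return [row for rows in rows_by_value.values() if len(rows) > 1 for row in rows]


# Hypothetical sample data, shortened from the question
sample = io.StringIO(
    "Third, Whatever, 46, Whichever, etc\n"
    "Fourth, Whatever, 18, Whichever, etc\n"
    "Fifth, Whatever, 14, Whichever, etc\n"
    "Eighth, Whatever, 18, Whichever, etc\n"
)

duplicates = find_duplicate_rows(sample)
if duplicates:
    for row in duplicates:
        print(", ".join(row))
else:
    print("No repetitions")
```

With a real file, the same function could be called as `find_duplicate_rows(f)` inside `with open('in.txt', newline='') as f:`. The trade-off is memory: every row is held until the end, which should be fine for modest files but not for very large ones.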

Sar009 answered Sep 22 '22 21:09