Python - Display rows with repeated values in csv files

Tags: python, dictionary, csv

I have a .csv file with several columns, one of which is filled with random numbers, and I want to find duplicated values in that column. If there are any (an unlikely case, but it is what I want to check), I would like to display or store the complete rows in which those values appear.

To make it clear, I have something like this:

First, Whatever, 230, Whichever, etc
Second, Whatever, 11, Whichever, etc
Third, Whatever, 46, Whichever, etc
Fourth, Whatever, 18, Whichever, etc
Fifth, Whatever, 14, Whichever, etc
Sixth, Whatever, 48, Whichever, etc
Seventh, Whatever, 91, Whichever, etc
Eighth, Whatever, 18, Whichever, etc
Ninth, Whatever, 67, Whichever, etc

And I would like to have:

Fourth, Whatever, 18, Whichever, etc
Eighth, Whatever, 18, Whichever, etc

To find duplicated values, I store that column in a dictionary and count how many times each key appears.

import csv
from collections import Counter

filename = 'in.csv'  # placeholder: path to the .csv file
col_2 = 2            # index of the column holding the numbers

with open(filename, 'rt') as inputfile:
    data = csv.reader(inputfile)
    counts = Counter(row[col_2] for row in data)

print("Numbers and times they appear: %s" % counts)

And I see

Counter({' 18 ': 2, ' 46 ': 1, ' 67 ': 1, ' 48 ': 1,...})

The problem is that I can't manage to link each key to its number of repetitions and use that later. If I do

for value in counts:
        if counts > 1:
            print counts

I only get the key, which is not what I want, and I get it for every value (not to mention that I want to print the whole line, not just that).
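
For what it's worth, iterating over a Counter directly yields only its keys, while `counts.items()` yields (key, count) pairs, which is what links a number to its repetitions. A minimal sketch, assuming a hypothetical `values` list in place of the column read from the file:

```python
from collections import Counter

# Hypothetical sample standing in for the third column of the file.
values = ['230', '11', '46', '18', '14', '48', '91', '18', '67']

counts = Counter(values)

# items() yields (value, count) pairs, so the count can be tested per key.
repeated = {value: n for value, n in counts.items() if n > 1}

if repeated:
    for value, n in repeated.items():
        print("%s appears %d times" % (value, n))
else:
    print("No repetitions")
```

This only identifies the repeated numbers; printing the full rows still requires keeping the rows themselves or reading the file again.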

Basically I'm looking for a way of doing

If there's a repeated number:
        print rows containing those numbers
else:
        print "No repetitions"

Thanks in advance.


asked Jul 11 '14 12:07

Informatico_Sano

1 Answer

Try this; it may work for you.

entries = set()            # values of the third column seen so far
duplicate_entries = set()  # values that occur more than once

# First pass: collect the values that repeat.
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.add(columns[2])
        else:
            duplicate_entries.add(columns[2])

# Second pass: print and save every row that holds a repeated value.
if len(duplicate_entries) > 0:
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print(line.strip())
                    out_file.write(line)
else:
    print("No repetitions")
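
The same result can probably be had in a single pass by grouping whole rows under their value and keeping only the groups with more than one row. The following is a sketch rather than a drop-in solution: the function name `rows_with_repeats` and the in-memory `sample` are made up for illustration, it targets Python 3, and it assumes the number sits in the third column (index 2).

```python
import csv
import io
from collections import defaultdict


def rows_with_repeats(lines, col=2):
    """Return the rows whose value in column `col` occurs more than once."""
    groups = defaultdict(list)  # value -> rows holding it, in file order
    # skipinitialspace drops the blank that follows each comma in the sample.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) > col:  # skip blank or short lines
            groups[row[col].strip()].append(row)
    return [row for rows in groups.values() if len(rows) > 1 for row in rows]


# Hypothetical sample; an open file object can be passed the same way.
sample = io.StringIO(
    "Third, Whatever, 46, Whichever, etc\n"
    "Fourth, Whatever, 18, Whichever, etc\n"
    "Fifth, Whatever, 14, Whichever, etc\n"
    "Eighth, Whatever, 18, Whichever, etc\n"
)

repeated = rows_with_repeats(sample)
if repeated:
    for row in repeated:
        print(", ".join(row))
else:
    print("No repetitions")
```

Rows come back grouped by repeated value rather than strictly in file order, and every row is held in memory, so for very large files the two-pass approach above may be the better fit.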

answered Sep 22 '22 21:09

Sar009

