How to handle a CSV file with duplicate fieldnames when reading with csv.DictReader?

Tags:

python

csv

I am working with a poorly-formed CSV file; it has duplicate fieldnames.

When two columns share a name, csv.DictReader overwrites the value from the first column with the value from the second. But I need the contents of both columns.

I can't assign the DictReader.fieldnames parameter directly: there are about one hundred columns, and the number of columns differs from file to file, e.g.:

product, price1, price2, price1,...,price100
car, 100, 300, 200,...,350

output: {'product':'car', 'price1': 200, 'price2':300}

I need: {'product':'car', 'price1': 100, 'price2':300, 'price3': 200}

What is the way to do it?

asked Aug 02 '15 11:08

Eugene Alkhouski

1 Answer

Don't use a DictReader() in this case. Stick to a regular reader instead.

You can always map to a dictionary based on a re-mapped list of fieldnames:

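import csv

# Python 3: open in text mode with newline='' (on Python 2, use mode 'rb')
with open(filename, newline='') as csvfile:
    reader = csv.reader(csvfile)
    fieldnames = remap(next(reader))  # header row, with duplicates renamed
    for row in reader:
        row = dict(zip(fieldnames, row))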

where the remap() function could either renumber your numbered columns or append extra information if column names are duplicated.
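For the "append extra information" variant, one possible sketch is shown here. The helper name `remap_with_suffix` and the `_2`, `_3` suffix scheme are assumptions made for illustration, not part of the csv module; the first occurrence of a name is kept as-is and later repeats get a numeric suffix.

```python
from collections import Counter

def remap_with_suffix(fieldnames):
    """Append _2, _3, ... to repeated column names (hypothetical helper)."""
    seen = Counter()
    result = []
    for name in fieldnames:
        name = name.strip()  # tolerate "product, price1" style headers
        seen[name] += 1
        # The first occurrence keeps its name; later ones get a suffix.
        if seen[name] == 1:
            result.append(name)
        else:
            result.append('{}_{}'.format(name, seen[name]))
    return result

print(remap_with_suffix(['product', 'price1', 'price2', 'price1']))
# ['product', 'price1', 'price2', 'price1_2']
```

This does not guard against a generated name such as `price1_2` colliding with a column that already has that name, so it is only safe when the suffix pattern is known not to occur in the header.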

Re-numbering could be as easy as:

from itertools import count

def remap(fieldnames):
    price_count = count(1)
    return ['price{}'.format(next(price_count)) if f.startswith('price') else f
            for f in fieldnames]
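To check the renumbering end to end, here is a small sketch that runs the same remap() against a shortened version of the sample data from the question. The in-memory `io.StringIO` file and the use of `skipinitialspace=True` are assumptions made for the example: the sample puts a space after each comma, and without that option the header names would start with a space rather than with 'price'.

```python
import csv
import io
from itertools import count

def remap(fieldnames):
    # Same renumbering as above: every 'price...' column gets the next number.
    price_count = count(1)
    return ['price{}'.format(next(price_count)) if f.startswith('price') else f
            for f in fieldnames]

# Sample data modelled on the question, shortened to four columns.
sample = io.StringIO("product, price1, price2, price1\ncar, 100, 300, 200\n")

# skipinitialspace=True ignores the blank that follows each delimiter.
reader = csv.reader(sample, skipinitialspace=True)
fieldnames = remap(next(reader))
rows = [dict(zip(fieldnames, row)) for row in reader]

print(rows)
# [{'product': 'car', 'price1': '100', 'price2': '300', 'price3': '200'}]
```

Note that csv.reader yields every value as a string; convert with int() or float() if numeric prices are needed.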

answered Oct 30 '22 12:10

Martijn Pieters
