Most Pythonic way to read CSV values into dict of lists

Tags: python, dictionary, list, csv

I have a CSV file with headers at the top of columns of data as:

a,b,c
1,2,3
4,5,6
7,8,9

and I need to read it in a dict of lists:

desired_result = {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}

When reading this with DictReader I am using a nested loop to append the items to the lists:

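import csv

f = 'path_to_some_csv_file.csv'
dr = csv.DictReader(open(f))
dict_of_lists = next(dr)  # first data row; dr.next() in Python 2
# Wrap each value of the first row in a one-element list
for k in dict_of_lists.keys():
    dict_of_lists[k] = [dict_of_lists[k]]
# Append the values of every remaining row to the matching list
for line in dr:
    for k in dict_of_lists.keys():
        dict_of_lists[k].append(line[k])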

The first loop wraps each value from the first row in a one-element list. The second loops over every remaining line read from the CSV file, each of which DictReader turns into a dict of key-value pairs. The inner loop appends each value to the list for the corresponding key, so I wind up with the desired dict of lists. I end up having to write this fairly often.

My question is: is there a more Pythonic way of doing this using built-in functions without the nested loop, or a better idiom, or an alternative way to store this data structure such that I can return an indexable list by querying with a key? If so, is there also a way to format the data being ingested by column upfront?


asked May 05 '14 14:05

mlh3789

2 Answers

Depending on what type of data you're storing and if you're ok with using numpy, a good way to do this can be with numpy.genfromtxt:

import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', names=True)

This creates a NumPy structured array, which provides a nice interface for querying the data by header name (make sure to use names=True if you have a header row).

For example, given data.csv containing:

a,b,c
1,2,3
4,5,6
7,8,9

You can then access elements with:

>>> data['a']        # Column with header 'a'
array([ 1.,  4.,  7.])
>>> data[0]          # First row
(1.0, 2.0, 3.0)
>>> data['c'][2]     # Specific element
9.0
>>> data[['a', 'c']] # Two columns
array([(1.0, 3.0), (4.0, 6.0), (7.0, 9.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])

genfromtxt also provides a way, as you requested, to "format the data being ingested by column upfront" through its converters parameter, which the documentation describes as follows:

converters : variable, optional

The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data: converters = {3: lambda s: float(s or 0)}.

answered Sep 24 '22 03:09

ford

If you're willing to use a third-party library, then the merge_with function from Toolz makes this whole operation a one-liner:

import csv
from toolz import merge_with

dict_of_lists = merge_with(list, *csv.DictReader(open(f)))

Using only the stdlib, a defaultdict makes the code less repetitive:

from collections import defaultdict
import csv

f = 'test.csv'

dict_of_lists = defaultdict(list)
for record in csv.DictReader(open(f)):
    for key, val in record.items():    # or iteritems in Python 2
        dict_of_lists[key].append(val)

If you need to do this often, factor it out into a function, e.g. transpose_csv.
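
One possible shape for such a function is sketched below, using only the stdlib. The name transpose_csv comes from the suggestion above; the converters parameter (a dict mapping a header name to a callable) is a hypothetical addition to cover the "format by column upfront" part of the question, not something the csv module provides. The sample data is read from an in-memory io.StringIO so the snippet runs without a file on disk.

```python
import csv
import io
from collections import defaultdict


def transpose_csv(lines, converters=None):
    """Read CSV data with a header row into a dict of header -> list of values.

    `lines` is any iterable of CSV lines (an open file, io.StringIO, ...).
    `converters` is a hypothetical optional dict of header -> callable that is
    applied to every value in that column; other columns are left as strings.
    """
    converters = converters or {}
    columns = defaultdict(list)
    for record in csv.DictReader(lines):
        for key, val in record.items():
            convert = converters.get(key)
            columns[key].append(convert(val) if convert else val)
    return dict(columns)


# Same data as in the question, held in memory instead of a file
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
result = transpose_csv(sample, converters={"a": int, "b": int, "c": int})
print(result)  # {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}
```

With a real file you would pass the open file object instead, e.g. inside a with open(f, newline='') block. Note that a file containing only a header row yields an empty dict here, because keys are created only when a data row is seen.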

answered Sep 25 '22 03:09

Fred Foo
