Most Pythonic way to read CSV values into dict of lists

I have a CSV file with headers at the top of columns of data as:

a,b,c
1,2,3
4,5,6
7,8,9

and I need to read it into a dict of lists:

desired_result = {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}

When reading this with DictReader I am using a nested loop to append the items to the lists:

import csv

f = 'path_to_some_csv_file.csv'
dr = csv.DictReader(open(f))
dict_of_lists = next(dr)  # first data row (dr.next() only works in Python 2)
for k in dict_of_lists.keys():
    dict_of_lists[k] = [dict_of_lists[k]]  # wrap each first-row value in a list
for line in dr:
    for k in dict_of_lists.keys():
        dict_of_lists[k].append(line[k])

The first loop wraps each value from the first data row in a single-element list. The next one loops over every remaining line read in from the CSV file, from which DictReader creates a dict of key-values. The inner loop appends each value to the list matching the corresponding key, so I wind up with the desired dict of lists. I end up having to write this fairly often.

Is there a more Pythonic way of doing this using built-in functions without the nested loop, or a better idiom, or an alternative way to store this data structure such that I can return an indexable list by querying with a key? If so, is there also a way to format the data being ingested by column up front?

mlh3789 asked May 05 '14 14:05




2 Answers

Depending on what type of data you're storing and if you're ok with using numpy, a good way to do this can be with numpy.genfromtxt:

import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', names=True)

What this will do is create a numpy Structured Array, which provides a nice interface for querying the data by header name (make sure to use names=True if you have a header row).

For example, given data.csv containing:

a,b,c
1,2,3
4,5,6
7,8,9

You can then access elements with:

>>> data['a']        # Column with header 'a'
array([ 1.,  4.,  7.])
>>> data[0]          # First row
(1.0, 2.0, 3.0)
>>> data['c'][2]     # Specific element
9.0
>>> data[['a', 'c']] # Two columns
array([(1.0, 3.0), (4.0, 6.0), (7.0, 9.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])

genfromtxt also provides a way, as you requested, to "format the data being ingested by column up front" through its converters argument, documented as:

converters : variable, optional

The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data: converters = {3: lambda s: float(s or 0)}.
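As a rough sketch of how converters might be applied to the sample data, the snippet below leaves one value in column b blank so the default kicks in, then turns the structured array back into the dict of lists from the question. It assumes NumPy 1.14 or later, so that encoding can be passed and the converter receives str rather than bytes. The in-memory csv_text and the variable names are illustrative and not part of the original answer.

```python
import io

import numpy as np

# Illustrative stand-in for data.csv; the middle value of column b is blank.
csv_text = "a,b,c\n1,2,3\n4,,6\n7,8,9\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,        # take field names from the header row
    encoding="utf-8",  # so the converter is passed str, not bytes
    converters={1: lambda s: float(s or 0)},  # column index 1 is 'b'
)

# Turn the structured array into a plain dict of lists keyed by header name.
dict_of_lists = {name: data[name].tolist() for name in data.dtype.names}
print(dict_of_lists)
```

Because genfromtxt defaults to float, the lists hold floats; passing dtype=int instead of a converter is one way to get integers when no values are missing.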

ford answered Sep 24 '22 03:09



If you're willing to use a third-party library, then the merge_with function from Toolz makes this whole operation a one-liner:

from toolz import merge_with
dict_of_lists = merge_with(list, *csv.DictReader(open(f)))
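If adding a dependency is not an option, a similar result without a nested loop can be sketched with the standard library by transposing the rows with zip. This is a different technique from merge_with, not a description of what Toolz does internally. The in-memory sample stands in for the file, and the sketch assumes every row has as many fields as the header, since zip silently truncates to the shortest row.

```python
import csv
import io

# Illustrative stand-in for open(f).
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")

rows = csv.reader(sample)
header = next(rows)  # ['a', 'b', 'c']
# zip(*rows) regroups the remaining rows column by column.
dict_of_lists = dict(zip(header, map(list, zip(*rows))))
print(dict_of_lists)
# {'a': ['1', '4', '7'], 'b': ['2', '5', '8'], 'c': ['3', '6', '9']}
```

The values stay as strings here, just as they do with DictReader. A file with a header but no data rows yields an empty dict rather than empty lists.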

Using only the stdlib, a defaultdict makes the code less repetitive:

from collections import defaultdict
import csv

f = 'test.csv'

dict_of_lists = defaultdict(list)
for record in csv.DictReader(open(f)):
    for key, val in record.items():    # or iteritems in Python 2
        dict_of_lists[key].append(val)

If you need to do this often, factor it out into a function, e.g. transpose_csv.
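One possible shape for such a function is sketched below. Only the name transpose_csv comes from the suggestion above. The optional converters argument, a mapping from column name to a callable, is a hypothetical addition aimed at the "format by column up front" part of the question, and the in-memory sample stands in for an opened file.

```python
import csv
import io
from collections import defaultdict


def transpose_csv(lines, converters=None):
    """Read CSV lines with a header row into {column name: list of values}.

    `lines` is any iterable of text lines, such as an open file.
    `converters` is a hypothetical optional mapping {column name: callable}
    applied to each raw string in that column; other columns stay strings.
    """
    converters = converters or {}
    columns = defaultdict(list)
    for record in csv.DictReader(lines):
        for key, val in record.items():
            convert = converters.get(key)
            columns[key].append(convert(val) if convert else val)
    return dict(columns)


# Illustrative stand-in for open('test.csv').
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
result = transpose_csv(sample, converters={"a": int, "b": int, "c": int})
print(result)  # {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]}
```

With a real file, something like `with open(f, newline='') as fh: transpose_csv(fh)` would close the file afterwards, which the bare open(f) calls above do not.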

Fred Foo answered Sep 25 '22 03:09
