Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the pythonic way to read CSV file data as rows of namedtuples?

What is the best way to take a data file that contains a header row and read this row into a named tuple so that the data rows can be accessed by header name?

I was attempting something like this:

import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)

The reader object is not subscriptable, so the above code throws a TypeError. What is the pythonic way to reader a file header into a namedtuple?

like image 733
drbunsen Avatar asked Jan 25 '12 17:01

drbunsen


People also ask

How do I read a csv file in a row wise in Python?

Step 1: In order to read rows in Python, First, we need to load the CSV file in one object. So to load the csv file into an object use open() method. Step 2: Create a reader object by passing the above-created file object to the reader function. Step 3: Use for loop on reader object to get each row.

What is the best way to read a csv file in Python?

Read A CSV File Using Python There are two common ways to read a . csv file when using Python. The first by using the csv library, and the second by using the pandas library.

How do I find out how many rows a csv file has?

Using len() function Under this method, we need to read the CSV file using pandas library and then use the len() function with the imported CSV file, which will return an int value of a number of lines/rows present in the CSV file.


2 Answers

Use:

Data = namedtuple("Data", next(reader))

and omit the line:

next(reader)

Combining this with an iterative version based on martineau's comment below, the example becomes for Python 2

import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...

and for Python 3

import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
like image 113
Sven Marnach Avatar answered Oct 26 '22 19:10

Sven Marnach


Please have a look at csv.DictReader. Basically, it provides the ability to get the column names from the first row as you're looking for and, after that, lets you access to each column in a row by name using a dictionary.

If for some reason you still need to access the rows as a collections.namedtuple, it should be easy to transform the dictionaries to named tuples as follows:

with open('data_file.txt') as infile:
    reader = csv.DictReader(infile)
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
like image 25
jcollado Avatar answered Oct 26 '22 19:10

jcollado