
How can I read only the header column of a CSV file using Python?

Tags:

python

pandas

csv

I am looking for a way to read just the header row of a large number of large CSV files.

Using Pandas, I have this method available, for each csv file:

>>> df = pd.read_csv(PATH_TO_CSV)
>>> df.columns

I could do this with just the csv module:

>>> reader = csv.DictReader(open(PATH_TO_CSV))
>>> reader.fieldnames

The problem with these is that each CSV file is 500MB+ in size, and it seems to be a gigantic waste to read in the entire file of each just to pull the header lines.

My end goal of all of this is to pull out unique column names. I can do that once I have a list of column headers that are in each of these files.

How can I extract only the header row of a CSV file, quickly?

Andy asked Jul 25 '14 19:07


People also ask

How do I read a specific column in a CSV file in Python?

This can be done with the pandas.read_csv() method. Pass the CSV file as the first argument and a list of the desired columns through the usecols keyword. It returns the data for only those columns.
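A minimal sketch of the usecols approach, using a small in-memory CSV (the sample data and column names are made up for illustration) in place of a real file:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# usecols limits parsing to the listed columns; the rest are skipped.
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "c"])
print(df.columns.tolist())
```

With a real file, the path would simply replace the StringIO object.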

How do I read a CSV file in Python without column names?

To read a CSV file without a header, set the header parameter to None in the read_csv() method.
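As a rough illustration, assuming a small headerless CSV held in memory (the sample data is invented), passing header=None makes pandas treat the first line as data and label the columns with integers:

```python
import io

import pandas as pd

# Hypothetical headerless CSV content.
csv_text = "1,2,3\n4,5,6\n"

# header=None is the Python None object, not the string "None".
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.columns.tolist())  # integer labels assigned by pandas
```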


1 Answer

Expanding on the answer given by Jeff, it is now possible to use pandas without actually reading any rows.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.DataFrame(np.random.randn(10, 4), columns=list('abcd')).to_csv('test.csv', mode='w')

In [4]: pd.read_csv('test.csv', index_col=0, nrows=0).columns.tolist()
Out[4]: ['a', 'b', 'c', 'd']

pandas can have the advantage that it deals more gracefully with CSV encodings.
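For the stated end goal of collecting unique column names across many files, the nrows=0 call could be wrapped in a loop. This is only a sketch: the helper name unique_columns and its paths argument are hypothetical, and it assumes every file has a header row on its first line.

```python
import pandas as pd


def unique_columns(paths):
    """Return the sorted set of column names found across CSV files.

    `paths` is a hypothetical iterable of CSV file paths.
    """
    seen = set()
    for path in paths:
        # nrows=0 asks pandas to parse the header line and no data rows.
        header = pd.read_csv(path, nrows=0).columns
        seen.update(header)
    return sorted(seen)
```

Unlike the session above, this sketch does not pass index_col=0, so a file written with an unnamed index column would contribute a placeholder name such as 'Unnamed: 0'; add index_col=0 if the files were saved that way.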

Jarno answered Oct 14 '22 01:10