How can I read only the header column of a CSV file using Python?

I am looking for a way to read just the header row of a large number of large CSV files.

Using Pandas, I have this method available for each CSV file:

>>> df = pd.read_csv(PATH_TO_CSV)
>>> df.columns

I could do this with just the csv module:

>>> reader = csv.DictReader(open(PATH_TO_CSV))
>>> reader.fieldnames

The problem is that each CSV file is 500 MB or larger, and it seems a gigantic waste to read in each entire file just to pull the header line.

My end goal of all of this is to pull out unique column names. I can do that once I have a list of column headers that are in each of these files.

How can I extract only the header row of a CSV file, quickly?

asked Jul 25 '14 19:07

Andy

1 Answer

Expanding on the answer given by Jeff, it is now possible to use pandas without actually reading any rows.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.DataFrame(np.random.randn(10, 4), columns=list('abcd')).to_csv('test.csv', mode='w')
In [4]: pd.read_csv('test.csv', index_col=0, nrows=0).columns.tolist()
Out[4]: ['a', 'b', 'c', 'd']

pandas can have the advantage that it deals more gracefully with CSV encodings.
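
If pandas is not needed, the standard library can do the same thing. The following is only a minimal sketch, assuming each file's header is its first record and that the files are UTF-8 encoded. The names read_header and unique_columns are hypothetical helpers, not part of any library. Because csv.reader is lazy, calling next() on it consumes only the first record, so the size of the rest of the file should not matter.

```python
import csv


def read_header(path, encoding="utf-8"):
    """Return the first record of a CSV file as a list of column names."""
    # newline="" is what the csv module documentation recommends for file objects.
    # The encoding default is an assumption; change it to match your files.
    with open(path, newline="", encoding=encoding) as f:
        # csv.reader reads lazily, so next() pulls just the header record.
        # An empty file yields an empty list instead of raising StopIteration.
        return next(csv.reader(f), [])


def unique_columns(paths):
    """Collect the distinct column names found across several CSV files."""
    names = set()
    for path in paths:
        names.update(read_header(path))
    return sorted(names)
```

For the stated goal of finding unique column names, unique_columns can be called with any iterable of file paths, for example the result of glob.glob on a directory of CSV files.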

answered Oct 14 '22 01:10

Jarno

Tags: python, pandas, csv
