
How to load only specific columns from csv file into a DataFrame

Tags: python, pandas, csv

Suppose I have a CSV file with 400 columns. I cannot load the entire file into a DataFrame (it won't fit in memory). However, I only really want 50 columns, and those will fit in memory. I don't see any built-in pandas way to do this. What do you suggest? I'm open to using the PyTables interface, or pandas.io.sql.

The best-case scenario would be a function like: pandas.read_csv(...., columns=['name', 'age',...,'income']). I.e. we pass a list of column names (or numbers) that will be loaded.

Ian Langmore asked Nov 05 '12 16:11


People also ask

How do I make a data frame with only certain columns?

You can create a new DataFrame containing only certain columns by indexing with a list of column names, for example df[['name', 'age']]. The DataFrame.assign() method does something different: it adds new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.
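For a DataFrame that is already in memory, the difference between selecting columns and adding them can be sketched as follows. The sample data and the `senior` column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [34, 45], "income": [52000, 61000]}
)

# Indexing with a list of names returns a new DataFrame with only those columns.
subset = df[["name", "age"]]

# assign() goes the other way: it returns a copy with an extra column added.
with_flag = df.assign(senior=df["age"] > 40)

print(list(subset.columns))     # ['name', 'age']
print(list(with_flag.columns))  # ['name', 'age', 'income', 'senior']
```

Note that both calls leave the original df unchanged.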

How do I read a specific column in a CSV file in pandas?

Use pandas.read_csv() with the usecols parameter to read specific columns from a CSV file. Call pd.read_csv(file_name, usecols=cols_list), with file_name as the name of the CSV file and cols_list as the list of columns to read.

How do I extract a specific data from a CSV file?

To read data from a CSV file with Python's csv module, use the csv.reader function to create a reader object. The reader iterates over the file and returns each row as a list of column values. You then pick the value at the index of the column you want from each row.
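A minimal sketch of that approach, assuming the "reader function" means csv.reader from Python's standard library. The sample data is made up, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical sample data; io.StringIO stands in for an open file.
csv_text = "name,age,income\nAnn,34,52000\nBob,45,61000\n"

reader = csv.reader(io.StringIO(csv_text))

# The first row is the header; use it to find the position of the column.
header = next(reader)
age_index = header.index("age")

# Each remaining row is a list of strings, so pick the value by position.
ages = [row[age_index] for row in reader]

print(ages)  # ['34', '45']
```

The values come back as strings, so any numeric conversion has to be done by hand.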


1 Answer

Ian, I implemented a usecols option which does exactly what you describe. It will be in upcoming pandas 0.10; development version will be available soon.


Since pandas 0.10, you can use usecols like this:

df = pd.read_csv(...., usecols=['name', 'age',..., 'income'])
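As a runnable sketch of the same idea, the example below reads a small in-memory CSV instead of a 400-column file. The column names and data are hypothetical. It selects columns both by name and by zero-based position, which is what the question asks for.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the wide CSV file.
csv_text = "name,age,city,income\nAnn,34,Oslo,52000\nBob,45,Rome,61000\n"

# Select columns by name; the 'city' column is never loaded into the frame.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age", "income"])

# Select the same columns by zero-based position.
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1, 3])

print(list(by_name.columns))  # ['name', 'age', 'income']
```

With a real file, the first argument would be the file path rather than a StringIO object.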
Wes McKinney answered Sep 28 '22 02:09