
Pandas - Usecols when columns exist in csv

Tags:

python

pandas

csv

The columns in my CSV and the names in my usecols list differ, so read_csv raises this error:

ValueError: Usecols do not match names.

How can I apply usecols only to the columns that actually exist in the CSV?

csv sample:

df.csv

AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5

reading csv:

import pandas as pd

df = pd.read_csv('df.csv', header=0, usecols=["AB", "CD", "IJ"])

This is what I'd like to get:

df

date       AB   CD
2016-01-01  a    1
2016-01-02  a    3
2016-01-03  a    5

The missing column "IJ" should simply be ignored.

Lcy asked Oct 22 '25 15:10


1 Answer

Pass a callable (for example a lambda) as usecols to skip columns that are not in the CSV:

import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""

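# 'IJ' is not in the CSV header; it should be skipped rather than raise
usecols = {'AB', 'CD', 'IJ'}

# the callable is evaluated per column name; columns returning True are kept
df = pd.read_csv(StringIO(txt), usecols=lambda c: c in usecols)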

print(df)

    AB        CD
0  foo  20160101
1  foo  20160102
2  foo  20160103
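If you also want to know which requested columns were absent, one possible alternative is to read only the header first and intersect it with the wanted names. This is a minimal sketch, not a pandas feature: the helper name `read_existing_cols` and its `make_buffer` argument are hypothetical, and it assumes the source can be opened twice.

```python
import pandas as pd
from io import StringIO

txt = """AB,CD,EF,GH
foo,20160101,a,1
foo,20160102,a,3
foo,20160103,a,5"""


def read_existing_cols(make_buffer, wanted):
    """Hypothetical helper: load only the wanted columns that exist.

    make_buffer is a zero-argument callable returning a fresh file-like
    object (or a path), because the source is read twice.
    """
    # nrows=0 parses just the header row, so this first pass is cheap
    header = pd.read_csv(make_buffer(), nrows=0).columns
    present = [c for c in wanted if c in header]
    missing = [c for c in wanted if c not in header]
    # second pass: every name in `present` exists, so no ValueError
    df = pd.read_csv(make_buffer(), usecols=present)
    return df, missing


df, missing = read_existing_cols(lambda: StringIO(txt), ['AB', 'CD', 'IJ'])
print(df.columns.tolist())
print(missing)
```

Compared with the lambda, this costs an extra read of the header but lets you log or warn about the skipped names.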

An explanation can be found in the pandas docs:

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.
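The `x.upper()` example from the docs can be adapted to match column names case-insensitively. The sketch below assumes a CSV whose header uses mixed case (`ab`, `Cd`); that header is made up for illustration and is not the asker's file.

```python
import pandas as pd
from io import StringIO

# Assumed sample with inconsistent header casing (illustrative only)
txt = """ab,Cd,EF,GH
foo,20160101,a,1
foo,20160102,a,3"""

# Names to keep, written in upper case; 'IJ' does not exist in the data
wanted = {'AB', 'CD', 'IJ'}

# Each header name is upper-cased before the membership test,
# so 'ab' and 'Cd' match while the absent 'IJ' is silently ignored
df = pd.read_csv(StringIO(txt), usecols=lambda name: name.upper() in wanted)

print(df.columns.tolist())
```

The resulting frame keeps the original spelling of the column names; only the comparison is case-insensitive.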

Alex answered Oct 25 '25 11:10