Consider this DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 3)), columns=list('abc'))
col_list = list('bcd')
df[col_list]
raises a KeyError, because 'd' is not one of the columns:
KeyError: "['d'] not in index"
How do I select as many of these columns as possible, skipping the ones that don't exist?
What about using Index.intersection()?
In [69]: df[df.columns.intersection(col_list)]
Out[69]:
     b    c
0  1.0  1.0
1  1.0  1.0
In [70]: df.columns
Out[70]: Index(['a', 'b', 'c'], dtype='object') # <---------- Index
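For reference, here is the same selection as a plain, self-contained script rather than an IPython session (a minimal sketch; the commented output is what pandas produces for this frame):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 3)), columns=list('abc'))
col_list = list('bcd')

# Keep only the labels from col_list that are actually present in df.columns.
present = df.columns.intersection(col_list)

print(df[present])
#      b    c
# 0  1.0  1.0
# 1  1.0  1.0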
Timing:
In [21]: df_ = pd.concat([df] * 10**5, ignore_index=True)
In [22]: df_.shape
Out[22]: (200000, 3)
In [23]: df.columns
Out[23]: Index(['a', 'b', 'c'], dtype='object')
In [24]: col_list = list('bcd')
In [28]: %timeit df_[df_.columns.intersection(col_list)]
100 loops, best of 3: 6.24 ms per loop
In [29]: %timeit df_[[col for col in col_list if col in df_.columns]]
100 loops, best of 3: 5.69 ms per loop
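Outside IPython, the same comparison can be reproduced with the standard timeit module; a rough sketch (absolute numbers will differ across pandas versions and machines):

import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 3)), columns=list('abc'))
df_ = pd.concat([df] * 10**5, ignore_index=True)  # 200000 rows x 3 columns
col_list = list('bcd')

# Index.intersection vs. a plain list comprehension over the requested labels.
t_inter = timeit.timeit(lambda: df_[df_.columns.intersection(col_list)], number=100)
t_comp = timeit.timeit(lambda: df_[[c for c in col_list if c in df_.columns]], number=100)

print('intersection: %.2f ms per call' % (t_inter / 100 * 1000))
print('list comp:    %.2f ms per call' % (t_comp / 100 * 1000))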
Let's test it on a transposed DataFrame (3 rows, 200K columns):
In [30]: t = df_.T
In [31]: t.shape
Out[31]: (3, 200000)
In [32]: t
Out[32]:
       0    1    2    3    4  ...  199995  199996  199997  199998  199999
a    1.0  1.0  1.0  1.0  1.0  ...     1.0     1.0     1.0     1.0     1.0
b    1.0  1.0  1.0  1.0  1.0  ...     1.0     1.0     1.0     1.0     1.0
c    1.0  1.0  1.0  1.0  1.0  ...     1.0     1.0     1.0     1.0     1.0
[3 rows x 200000 columns]
In [33]: col_list=[-10, -20, 10, 20, 100]
In [34]: %timeit t[t.columns.intersection(col_list)]
10 loops, best of 3: 52.8 ms per loop
In [35]: %timeit t[[col for col in col_list if col in t.columns]]
10 loops, best of 3: 103 ms per loop
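To make explicit what the intersection returns in this case: the labels -10 and -20 are simply dropped, and only the existing integer columns are selected. A small sketch of the same setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((2, 3)), columns=list('abc'))
t = pd.concat([df] * 10**5, ignore_index=True).T  # 3 rows x 200000 integer columns

col_list = [-10, -20, 10, 20, 100]
present = t.columns.intersection(col_list)

print(list(present))     # [10, 20, 100] -- the missing labels are silently dropped
print(t[present].shape)  # (3, 3)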
Conclusion: the list comprehension wins (slightly) for DataFrames with few columns, while the Pandas approach (Index.intersection) wins for bigger data sets with many columns.