Is there a way to check if a column exists in a Pandas DataFrame?
Suppose that I have the following DataFrame:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  3  40  100
1  6  30  200
2  7  70  800
3  3  50  200
4  7  50  400
5  4  10  400
6  3  70  500
7  8  30  200
8  3  40  800
9  6  60  200
and I want to calculate df['sum'] = df['A'] + df['C'].
But first I want to check whether df['A'] exists, and if not, I want to calculate df['sum'] = df['B'] + df['C'] instead.
You can use the loc and iloc indexers to access columns in a Pandas DataFrame. For example, to retrieve a column by name, such as a Grades column, use loc with the column label: df.loc[:, 'Grades']. Note that this raises a KeyError if the column does not exist, so it is not by itself a check for existence.
To check if a value exists in the Index of a Pandas DataFrame, use the in keyword on the index property.
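The difference between checking the index and checking the columns is easy to mix up, so here is a minimal sketch. The small frame, its row labels 'x' and 'y', and the variable names are made up for illustration only.

```python
import pandas as pd

# Illustrative frame with string row labels (not the data from the question)
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])

in_columns = 'A' in df         # `in` on the frame itself tests column labels
in_index = 'x' in df.index     # `in` on df.index tests row labels
row_label_in_df = 'x' in df    # a row label is not found among the columns

print(in_columns, in_index, row_label_in_df)
```

So `in df` (or `in df.columns`) answers "does this column exist?", while `in df.index` answers the same question for rows.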
You can get the column names of a Pandas DataFrame with df.columns. This is useful when you want to show all columns of a DataFrame in the output console (e.g. in a Jupyter notebook).
This will work:
if 'A' in df:
But for clarity, I'd probably write it as:
if 'A' in df.columns:
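Applied to the case in the question, a minimal sketch might look like this. The fixed sample values and the helper variable `first` are assumptions chosen for illustration; the frame deliberately has no 'A' column so the fallback branch is taken.

```python
import pandas as pd

# Small deterministic frame without column 'A' (illustrative values only)
df = pd.DataFrame({'B': [10, 20, 30], 'C': [100, 200, 300]})

# Use 'A' if that column exists, otherwise fall back to 'B'
first = 'A' if 'A' in df.columns else 'B'
df['sum'] = df[first] + df['C']

print(df)
```

With an 'A' column present, the same code would add 'A' and 'C' instead.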
To check if one or more columns all exist, you can use set.issubset, as in:
if set(['A','C']).issubset(df.columns):
    df['sum'] = df['A'] + df['C']
As @brianpck points out in a comment, set([]) can alternatively be constructed with curly braces:
if {'A', 'C'}.issubset(df.columns):
See this question for a discussion of the curly-braces syntax.
Or, you can use a generator expression, as in:
if all(item in df.columns for item in ['A','C']):
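As a rough side-by-side of the two multi-column checks, here is a small sketch. The sample frame and the names `wanted`, `via_set`, `via_all`, and `missing` are hypothetical; the list comprehension that reports absent columns is an extra convenience, not part of either approach above.

```python
import pandas as pd

# Illustrative frame that has 'A' and 'C' but no 'B'
df = pd.DataFrame({'A': [1, 2], 'C': [100, 200]})
wanted = ['A', 'B', 'C']

# Both checks ask the same question: are all wanted columns present?
via_set = set(wanted).issubset(df.columns)
via_all = all(col in df.columns for col in wanted)

# Optionally list which of the wanted columns are absent
missing = [col for col in wanted if col not in df.columns]

print(via_set, via_all, missing)
```

Both checks give the same answer; the set version is compact, while the `all(...)` version stops at the first missing column.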