In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:
if df1:
    # do something
However, that code fails in this way:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.
Here is one way this can work:
if not isinstance(df1, type(None)):
    # do something
However, testing for type is really slow.
import timeit
t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4
Ouch. Along with being slow, testing for NoneType isn't very flexible, either.
A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
Another solution would be to have an indicator variable, df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.
Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?
You can check whether a column contains a particular value (string/int), or any of a list of values, in a pandas DataFrame by using pd.Series(), the in operator, pandas.Series.isin(), str.
Use the DataFrame.isnull().values.any() method to check whether there is any missing data in a pandas DataFrame; missing data is represented as NaN or None values.
Option 1 (my preferred option)
Please select his answer if you like this approach
It is very idiomatic Python to initialize a variable with None, then check for None prior to doing something with that variable.
df1 = None
if df1 is not None:
    print(df1.head())
Option 2
However, setting up an empty dataframe isn't at all a bad idea.
df1 = pd.DataFrame()
if not df1.empty:
    print(df1.head())
Option 3
Just try it.
try:
    print(df1.head())
# catch when df1 is None
except AttributeError:
    pass
# catch when it hasn't even been defined
except NameError:
    pass
When df1 is in its initialized state or doesn't exist at all
When df1 is a dataframe with something in it
import numpy as np
import pandas as pd
# 5x5 frame with row labels A-E and column labels a-e
df1 = pd.DataFrame(np.arange(25).reshape(5, 5), list('ABCDE'), list('abcde'))
df1
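If you want "exists and has something in it" as a single test, Options 1 and 2 can be combined. This is only a sketch: has_rows is a name I made up for illustration, not anything pandas provides. It relies on DataFrame.empty, which is True when any axis has length 0.

```python
import pandas as pd


def has_rows(df):
    """Hypothetical helper: True only if df is not None and not empty."""
    # `is not None` is evaluated first, so .empty is never touched on None.
    return df is not None and not df.empty


print(has_rows(None))                           # False
print(has_rows(pd.DataFrame()))                 # False
print(has_rows(pd.DataFrame({'a': [1, 2]})))    # True
```

The order of the two checks matters: swapping them would raise AttributeError whenever df is None.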
In my code, I have several variables which can either contain a pandas DataFrame or nothing at all
The Pythonic way of indicating "nothing" is via None, and for checking "not nothing" via
if df1 is not None:
    ...
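To see why the identity check sidesteps the ValueError from the question, here is a small sketch that doesn't need pandas at all. AmbiguousFrame and is_present are hypothetical names; the class is only a stand-in that, like a DataFrame, refuses to be used as a boolean.

```python
class AmbiguousFrame:
    """Hypothetical stand-in for a DataFrame: raises when used as a boolean."""

    def __bool__(self):
        raise ValueError("The truth value of this object is ambiguous.")


def is_present(obj):
    # Identity comparison never calls __bool__, so it is safe for any object.
    return obj is not None


frame = AmbiguousFrame()

try:
    bool(frame)  # this is what `if frame:` evaluates
    truthiness_failed = False
except ValueError:
    truthiness_failed = True

print(truthiness_failed)    # True
print(is_present(frame))    # True
print(is_present(None))     # False
```

In other words, `if df1:` asks the object for its truth value, while `if df1 is not None:` only asks whether the name is bound to the None singleton.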
I am not sure how critical time is here, but since you measured things:
In [82]: t = timeit.Timer('if x is not None: pass', setup='x=None')
In [83]: t.timeit()
Out[83]: 0.022536039352416992
In [84]: t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
In [85]: t.timeit()
Out[85]: 0.11571192741394043
So checking that something is not None is also faster than the isinstance alternative.
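If you want to repeat the comparison outside IPython, a plain script along these lines should do. The absolute numbers depend on your machine and Python version, so treat the ones above as indicative only; the variable names here are my own.

```python
import timeit

# Time one million evaluations of each check against x = None.
identity = timeit.timeit('x is not None', setup='x = None', number=1_000_000)
type_check = timeit.timeit('isinstance(x, type(None))', setup='x = None', number=1_000_000)

print(f'is not None: {identity:.4f} s')
print(f'isinstance:  {type_check:.4f} s')
```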
If the dataframe is stored as a dictionary value, you could test for its existence this way:
import pandas as pd
d = dict()
df = pd.DataFrame()
d['df'] = df
## the 'None' is default but including it for the example
if d.get('df', None) is not None:
    ## get df shape
    print(df.shape)
else:
    print('no df here')
Did you try %who_ls DataFrame? It is an IPython magic that outputs a list of the names of all the defined DataFrames. Then you can check whether it contains the name of the df you are looking for.
listdf=%who_ls DataFrame
if 'df1' in listdf: print("df1 exists!")
This still will not tell you if it's empty or not, only that it exists.
You can use %who_ls for other types of elements, too.
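%who_ls only works inside IPython/Jupyter. In a plain script, a rough equivalent is to scan a namespace dict yourself. This is a sketch under assumptions: names_of_type is a made-up helper, and the demo uses a plain dict with list values in place of globals() and pd.DataFrame so that it runs without pandas.

```python
def names_of_type(namespace, cls):
    """Hypothetical helper: sorted names in `namespace` bound to instances of `cls`."""
    return sorted(name for name, value in namespace.items() if isinstance(value, cls))


# A plain dict stands in for globals(); list stands in for pd.DataFrame.
demo_namespace = {'df1': [1, 2, 3], 'df2': [], 'count': 7, 'label': 'x'}
found = names_of_type(demo_namespace, list)

print(found)             # ['df1', 'df2']
print('df1' in found)    # True
```

In real code you would call something like names_of_type(globals(), pd.DataFrame). As with %who_ls, this only tells you the name exists, not whether the frame is empty.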