
Testing if a pandas DataFrame exists

In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:

if df1:
    # do something

However, that code fails in this way:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.

Here is one way this can work:

if not isinstance(df1, type(None)):
    # do something

However, testing the type is about ten times slower than a plain truth test:

import timeit

t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4

Ouch. Along with being slow, testing for NoneType isn't very flexible, either.

A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.

Another solution would be to have an indicator variable: df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.

Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?

J Jones asked Sep 05 '16 20:09




4 Answers

Option 1 (my preferred option)

This is @Ami Tavory's approach; please select his answer if you like it.

It is very idiomatic Python to initialize a variable with None and then check for None before doing something with that variable.

df1 = None

if df1 is not None:
    print(df1.head())
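A minimal sketch of this pattern wrapped in a function. The helper name `describe` and the column name `a` are hypothetical and only serve the illustration:

```python
from typing import Optional

import pandas as pd


def describe(df: Optional[pd.DataFrame]) -> str:
    """Summarize a value that is either a DataFrame or None (hypothetical helper)."""
    # Comparing identity with None never asks the DataFrame for its truth
    # value, so the "truth value is ambiguous" ValueError cannot occur here.
    if df is not None:
        return f"DataFrame with {len(df)} rows"
    return "no DataFrame yet"


df1 = None
print(describe(df1))

df1 = pd.DataFrame({"a": [1, 2, 3]})
print(describe(df1))
```

Note that an empty DataFrame still counts as "present" under this test; only None means "not created".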

Option 2

However, setting up an empty dataframe isn't at all a bad idea.

df1 = pd.DataFrame()

if not df1.empty:
    print(df1.head())
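One caveat with this option: `.empty` is true whenever the frame has no rows or no columns, so "not created yet" and "created, but holding no data" look identical. A short sketch of that behavior (the variable and column names are only illustrative):

```python
import pandas as pd

placeholder = pd.DataFrame()              # stands in for "nothing loaded yet"
no_rows = pd.DataFrame({"a": []})         # has a column but zero rows
populated = pd.DataFrame({"a": [1, 2, 3]})

for name, frame in [("placeholder", placeholder),
                    ("no_rows", no_rows),
                    ("populated", populated)]:
    # .empty is a property, not a method, so there are no parentheses
    print(name, frame.empty)
```

If the code needs to tell those two situations apart, the None check from Option 1 is probably the safer choice.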

Option 3

Just try it.

try:
    print(df1.head())
# catch when df1 is None
except AttributeError:
    pass
# catch when it hasn't even been defined
except NameError:
    pass
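Since both handlers do the same thing, they can be merged into a single except clause. The sketch below is one possible variant, not part of the original answer; it keeps only the attribute access inside the try block so that unrelated errors are less likely to be swallowed. The name `preview` is an assumption:

```python
df1 = None  # delete this line to exercise the NameError path instead

try:
    preview = df1.head()
except (AttributeError, NameError):
    # AttributeError: df1 is None; NameError: df1 was never defined
    preview = None

if preview is not None:
    print(preview)
```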

Timing

When df1 is in its initialized state or doesn't exist at all

[image: timing results]

When df1 is a dataframe with something in it

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(25).reshape(5, 5), list('ABCDE'), list('abcde'))
df1

[image: the resulting DataFrame]

[image: timing results]

piRSquared answered Oct 08 '22 13:10


In my code, I have several variables which can either contain a pandas DataFrame or nothing at all

The Pythonic way of indicating "nothing" is None, and the way to check for "not nothing" is

if df1 is not None:
    ...

I am not sure how critical time is here, but since you measured things:

In [82]: t = timeit.Timer('if x is not None: pass', setup='x=None')

In [83]: t.timeit()
Out[83]: 0.022536039352416992

In [84]: t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')

In [85]: t.timeit()
Out[85]: 0.11571192741394043

So checking that something is not None is also faster than the isinstance alternative.
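To repeat the comparison outside IPython, a standalone script along these lines should work. The iteration count of 100,000 is an arbitrary choice (the session above used the default of one million), and absolute numbers will vary by machine and Python version:

```python
import timeit

N = 100_000  # arbitrary; smaller than timeit's default of 1,000,000

identity_check = timeit.timeit("x is not None", setup="x = None", number=N)
type_check = timeit.timeit("isinstance(x, type(None))", setup="x = None", number=N)

print(f"x is not None:             {identity_check:.4f} s")
print(f"isinstance(x, type(None)): {type_check:.4f} s")
```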

Ami Tavory answered Oct 08 '22 14:10


If the dataframe is stored as a dictionary value, you could test for its existence this way:

import pandas as pd

d = dict()
df = pd.DataFrame()

d['df'] = df

## the 'None' is default but including it for the example
if d.get('df', None) is not None:
    ## get df shape
    print(df.shape)
else:
    print('no df here')
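A variant of the same idea that reads the frame back out of the dictionary instead of relying on a separate variable. The names `frames` and `get_frame` are hypothetical:

```python
import pandas as pd

frames = {}  # hypothetical registry mapping a name to its DataFrame


def get_frame(name):
    """Return the DataFrame stored under `name`, or None if there is none."""
    return frames.get(name)


print(get_frame("df1") is not None)   # nothing stored yet

frames["df1"] = pd.DataFrame()
stored = get_frame("df1")
if stored is not None:
    # an empty DataFrame still counts as existing
    print(stored.shape)
```

If None is never stored as a value, a plain membership test such as `"df1" in frames` answers the same question.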

chiceman answered Oct 08 '22 12:10


Did you try %who_ls DataFrame? It is an IPython/Jupyter magic command that returns a list of all the defined DataFrames. You can then check whether it contains the name of the df you are looking for.

listdf=%who_ls DataFrame
if 'df1' in listdf: print("df1 exists!")

This still will not tell you if it's empty or not, only that it exists.

You can use %who_ls for other types of elements, too.
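Because %who_ls only exists inside IPython, a plain Python script needs something else. The following is a rough approximation rather than an exact equivalent of the magic; `dataframe_names` is a hypothetical helper that scans a namespace dictionary:

```python
import pandas as pd


def dataframe_names(namespace):
    """Return the sorted names in `namespace` that are bound to a DataFrame."""
    return sorted(
        name for name, value in namespace.items()
        if isinstance(value, pd.DataFrame)
    )


example_namespace = {"df1": pd.DataFrame(), "n": 3, "label": "x"}
print("df1" in dataframe_names(example_namespace))
```

In a script, passing `globals()` as the namespace would list the DataFrames defined at module level.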

Fede Rico answered Oct 08 '22 13:10