
calling apply() on an empty pandas DataFrame

Tags: python, pandas

I'm having a problem with the apply() method of the pandas DataFrame. My issue is that apply() can return either a Series or a DataFrame, depending on the return type of the input function; however, when the frame is empty, apply() (almost) always returns a DataFrame. So I can't write code that expects a Series. Here's an example:

import pandas as pd

def area_from_row(row):
    return row['width'] * row['height']

def add_area_column(frame):
    # I know I can multiply the columns directly, but my actual function is
    # more complicated.
    frame['area'] = frame.apply(area_from_row, axis=1)

# This works as expected.
non_empty_frame = pd.DataFrame(data=[[2, 3]], columns=['width', 'height'])
add_area_column(non_empty_frame)

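# This fails: on an empty frame, apply() returns a DataFrame, which cannot
# be assigned to the single 'area' column.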
empty_frame = pd.DataFrame(data=None, columns=['width', 'height'])
add_area_column(empty_frame)

Is there a standard way of dealing with this? I can do the following, but it's silly:

def area_from_row(row):
    # The way we respond to an empty row tells pandas whether we're a
    # reduction or not.
    if not len(row):
        return None
    return row['width'] * row['height']

(I'm using pandas 0.11.0, but I checked this on 0.12.0-1100-g0c30665 as well.)

asked Nov 14 '13 at 22:11 by traversable




1 Answer

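You can set the result_type parameter of apply() to 'reduce'. Note that result_type was added in pandas 0.23.0, so it is not available in the 0.11/0.12 versions mentioned in the question.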

From the documentation,

By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

And then,

‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
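To make the difference between 'expand' and 'reduce' concrete, here is a minimal sketch on a non-empty frame. The helper name sides is hypothetical and only serves to return a list-like value per row; the sketch assumes pandas 0.23.0 or later.

```python
import pandas as pd

frame = pd.DataFrame(data=[[2, 3], [4, 5]], columns=['width', 'height'])

def sides(row):
    # Hypothetical helper: returns a list-like result for each row.
    return [row['width'], row['height']]

# 'expand' turns each list-like result into columns of a DataFrame.
expanded = frame.apply(sides, axis=1, result_type='expand')

# 'reduce' keeps one value per row, so the result is a Series of lists.
reduced = frame.apply(sides, axis=1, result_type='reduce')

print(type(expanded).__name__)
print(type(reduced).__name__)
```

With 'reduce', the caller can rely on getting a Series back even when the applied function returns something list-like.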

In your code, pass it to the apply() call inside add_area_column:

def add_area_column(frame):
    # I know I can multiply the columns directly, but my actual function is
    # more complicated.
    frame['area'] = frame.apply(area_from_row, axis=1, result_type='reduce')
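As a quick check, the following sketch reruns the question's example with that change and inspects what apply() returns for the empty frame. It assumes pandas 0.23.0 or later; the variable name result is introduced here only for illustration.

```python
import pandas as pd

def area_from_row(row):
    return row['width'] * row['height']

def add_area_column(frame):
    # result_type='reduce' asks apply() for a Series, including on empty frames.
    frame['area'] = frame.apply(area_from_row, axis=1, result_type='reduce')

non_empty_frame = pd.DataFrame(data=[[2, 3]], columns=['width', 'height'])
add_area_column(non_empty_frame)

empty_frame = pd.DataFrame(data=None, columns=['width', 'height'])
# Inspect the return value directly before assigning it to a column.
result = empty_frame.apply(area_from_row, axis=1, result_type='reduce')
add_area_column(empty_frame)

print(type(result).__name__, len(result))
print(list(empty_frame.columns))
```

Both frames end up with an 'area' column, and the empty frame simply stays empty.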

answered Sep 25 '22 at 06:09 by Ian