Why am I getting an empty row in my dataframe after using pandas apply?

I'm fairly new to Python and Pandas and am trying to figure out how to do a simple split-apply-combine. The problem is that I get a blank row at the top of every DataFrame returned by Pandas' apply function, and I'm not sure why. Can anyone explain?

The following is a minimal example that demonstrates the problem, not my actual code:

import pandas as pd

sorbet = pd.DataFrame({
  'flavour': ['orange', 'orange', 'lemon', 'lemon'],
  'niceosity' : [4, 5, 7, 8]})

# Summarise one group: count and mean of the target column
def calc_vals(df, target):
    return pd.Series({'total' : df[target].count(), 'mean' : df[target].mean()})

sorbet_grouped = sorbet.groupby('flavour')
sorbet_vals = sorbet_grouped.apply(calc_vals, target='niceosity')

If I then do print(sorbet_vals), I get this output:

         mean  total
flavour                 <--- Why are there spaces here?
lemon     7.5      2
orange    4.5      2

[2 rows x 2 columns]

Compare this with print(sorbet):

  flavour  niceosity     <--- Note how column names line up
0  orange          4
1  orange          5
2   lemon          7
3   lemon          8

[4 rows x 2 columns]

What is causing this discrepancy and how can I fix it?

asked Mar 27 '14 at 16:03 by Jack Aidley



1 Answer

The groupby/apply operation returns a new DataFrame with a named index. The name is the name of the column by which the original DataFrame was grouped.
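A quick way to see this is to compare the index names of the two frames. The sketch below rebuilds the question's data; selecting `[['niceosity']]` before `apply` is my own addition (not in the question), used so that recent pandas releases do not warn about the grouping column being passed to `calc_vals`.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

# Selecting [['niceosity']] keeps the grouping column out of calc_vals's input.
sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(calc_vals, target='niceosity')

# The original frame has a default RangeIndex with no name...
print(sorbet.index.name)       # None
# ...while the grouped result's index is named after the grouping column.
print(sorbet_vals.index.name)  # flavour
```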

When the DataFrame is printed, that name appears on its own line above the index labels, which is the extra row you are seeing. If you set it to None, the row disappears:

In [155]: sorbet_vals.index.name = None

In [156]: sorbet_vals
Out[156]: 
        mean  total
lemon    7.5      2
orange   4.5      2

[2 rows x 2 columns]
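If you only want the name gone for display, a non-mutating alternative (a sketch, not part of the original answer) is `DataFrame.rename_axis(None)`, which returns a copy whose index name is cleared and leaves `sorbet_vals` itself unchanged. The data setup repeats the question's example, with the `[['niceosity']]` selection as an assumption to keep newer pandas quiet.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(calc_vals, target='niceosity')

# rename_axis(None) returns a new frame whose index has no name;
# the original keeps its 'flavour' name for later use.
display_vals = sorbet_vals.rename_axis(None)
print(display_vals)
print(sorbet_vals.index.name)  # flavour
```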

Note that the name is useful, so I don't really recommend removing it: it lets you refer to that index by name rather than merely by position.
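To illustrate what keeping the name buys you, here is a small sketch (my own examples, not from the answer): with the name intact, `DataFrame.query` can filter on the index by writing `flavour`, and `Index.get_level_values` accepts the name instead of the position `0`.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(calc_vals, target='niceosity')

# query() can refer to a named index just like a column.
lemon_only = sorbet_vals.query("flavour == 'lemon'")
print(lemon_only)

# The index level can be addressed by name instead of by position.
by_name = sorbet_vals.index.get_level_values('flavour')
by_position = sorbet_vals.index.get_level_values(0)
print(list(by_name) == list(by_position))  # True
```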


If you wish the index to be a column, use reset_index (on a sorbet_vals whose index is still named flavour, i.e. without the renaming above):

In [209]: sorbet_vals.reset_index(inplace=True); sorbet_vals
Out[209]: 
  flavour  mean  total
0   lemon   7.5      2
1  orange   4.5      2

[2 rows x 3 columns]
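One detail worth knowing (a sketch, not from the original answer): `reset_index` takes the new column's name from the index name, so if the name has already been cleared, the column comes out as `index` rather than `flavour`. The non-inplace form used here also leaves `sorbet_vals` untouched; clearing the name with `rename_axis(None)` is my choice for the comparison.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(calc_vals, target='niceosity')

# Returns a new frame: 'flavour' becomes an ordinary column and the
# rows get a fresh 0..n-1 RangeIndex.
flat = sorbet_vals.reset_index()
print(flat.columns[0])          # flavour

# With the index name cleared first, pandas falls back to 'index'.
flat_unnamed = sorbet_vals.rename_axis(None).reset_index()
print(flat_unnamed.columns[0])  # index
```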
answered Sep 27 '22 at 20:09 by unutbu