Expressing pandas subset using pipe

I have the dataframe below, which I filter and then aggregate with the two statements that follow it:

   a  b   x  y
0  1  2   3 -1
1  2  4   6 -2
2  3  6   6 -3
3  4  8   3 -4

df = df[(df.a >= 2) & (df.b <= 8)]
df = df.groupby(df.x).mean()

How do I express this using the pandas pipe operator?

df = (df
      .pipe((x.a > 2) & (x.b < 6)
      .groupby(df.x)
      .apply(lambda x: x.mean())

user308827 asked Feb 28 '16 19:02



1 Answer

As long as a step takes a DataFrame (possibly with additional arguments) and returns a DataFrame, you can run it through pipe. Whether there is any advantage in doing so is another question.

Here, for example, you can use

df\
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)\
    .pipe(lambda df_: df_.groupby(df_.x))\
    .mean()

Notice that the first stage is a lambda taking three arguments: pipe passes the DataFrame as the first one and forwards 2 and 8 as x and y. That is not the only way to write it; it is equivalent to

    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])\
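To check that a piped version matches the original two-step code, here is a minimal, self-contained sketch. It rebuilds the question's dataframe and uses a named function instead of a lambda; `keep_in_range`, `a_min` and `b_max` are hypothetical names chosen for illustration, and the grouping uses the column name "x" rather than `df.x` to keep the comparison simple.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})


def keep_in_range(df_, a_min, b_max):
    """Hypothetical helper: keep rows with a >= a_min and b <= b_max."""
    return df_[(df_.a >= a_min) & (df_.b <= b_max)]


# Two-step version, as in the question (grouping by column name).
baseline = df[(df.a >= 2) & (df.b <= 8)].groupby("x").mean()

# Piped version: extra positional and keyword arguments given to pipe
# are forwarded to the function after the DataFrame itself.
piped = (
    df
    .pipe(keep_in_range, 2, b_max=8)
    .pipe(lambda df_: df_.groupby("x"))
    .mean()
)

print(piped)
print(piped.equals(baseline))
```

A named function like this can be reused and tested on its own, which is one case where pipe may pay off over an inline lambda.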

Also note that you can use

df\
    .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)\
    .groupby('x')\
    .mean()

Here the lambda takes df_, but operates on df, and the second pipe has been replaced with a groupby.

  • The first change works here, but is fragile. It happens to work only because this is the first pipe stage. If it were a later stage, it might receive a DataFrame of one shape and attempt to filter it with a mask of another shape, for example.

  • The second change is fine. In fact, I think it is more readable. Basically, anything that takes a DataFrame and returns one can either be called directly or through pipe.
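One way the fragility of the first change can show up is sketched below, under the assumption that an extra filtering stage (the hypothetical `keep_x6`) runs before the range filter. The lambda that refers to the outer `df` ignores whatever the earlier stage produced, while the one that uses its own argument `df_` does not.

```python
import pandas as pd

# The dataframe from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "x": [3, 6, 6, 3],
    "y": [-1, -2, -3, -4],
})


def keep_x6(df_):
    """Hypothetical earlier stage: keep only rows where x == 6."""
    return df_[df_.x == 6]


# Fragile: the lambda receives df_ but filters the outer df,
# so the result of keep_x6 is silently discarded.
fragile = df.pipe(keep_x6).pipe(lambda df_: df[(df.a >= 2) & (df.b <= 8)])

# Robust: the lambda only touches the frame it was handed.
robust = df.pipe(keep_x6).pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])

print(fragile.index.tolist())  # [1, 2, 3]
print(robust.index.tolist())   # [1, 2]
```

The fragile version brings back row 3, which the earlier stage had already removed, and it does so without raising any error.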

Ami Tavory answered Sep 20 '22 13:09