
"Piping" output from one function to another using Python infix syntax

I'm trying to replicate, roughly, the dplyr package from R using Python/Pandas (as a learning exercise). Something I'm stuck on is the "piping" functionality.

In R/dplyr, this is done using the pipe-operator %>%, where x %>% f(y) is equivalent to f(x, y). If possible, I would like to replicate this using infix syntax (see here).

To illustrate, consider the two functions below.

    import pandas as pd

    def select(df, *args):
        cols = [x for x in args]
        df = df[cols]
        return df

    def rename(df, **kwargs):
        for name, value in kwargs.items():
            df = df.rename(columns={'%s' % name: '%s' % value})
        return df

The first function takes a dataframe and returns only the given columns. The second takes a dataframe, and renames the given columns. For example:

    d = {'one' : [1., 2., 3., 4., 4.],
         'two' : [4., 3., 2., 1., 3.]}

    df = pd.DataFrame(d)

    # Keep only the 'one' column.
    df = select(df, 'one')

    # Rename the 'one' column to 'new_one'.
    df = rename(df, one = 'new_one')

To achieve the same using pipe/infix syntax, the code would be:

    df = df | select('one') \
            | rename(one = 'new_one')

So the output from the left-hand side of | gets passed as the first argument to the function on the right-hand side. Whenever I see something like this done (here, for example) it involves lambda functions. Is it possible to pipe a pandas DataFrame between functions in the same manner?

I know Pandas has the .pipe method, but what's important to me is the syntax of the example I provided. Any help would be appreciated.

Malthus asked Nov 11 '15 19:11



1 Answer

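This is hard to implement with the bitwise-or operator because pandas.DataFrame already implements it: Python tries the left operand's __or__ first, so the DataFrame's own element-wise | takes precedence over any __ror__ defined on the right-hand object. If you don't mind replacing | with >>, which DataFrame does not define, you can try this: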

    import pandas as pd

    def select(df, *args):
        cols = [x for x in args]
        return df[cols]


    def rename(df, **kwargs):
        for name, value in kwargs.items():
            df = df.rename(columns={'%s' % name: '%s' % value})
        return df


    class SinkInto(object):
        def __init__(self, function, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs
            self.function = function

        def __rrshift__(self, other):
            return self.function(other, *self.args, **self.kwargs)

        def __repr__(self):
            return "<SinkInto {} args={} kwargs={}>".format(
                self.function,
                self.args,
                self.kwargs
            )

    df = pd.DataFrame({'one' : [1., 2., 3., 4., 4.],
                       'two' : [4., 3., 2., 1., 3.]})

Then you can do:

    >>> df
       one  two
    0    1    4
    1    2    3
    2    3    2
    3    4    1
    4    4    3

    >>> df = df >> SinkInto(select, 'one') \
                >> SinkInto(rename, one='new_one')
    >>> df
       new_one
    0        1
    1        2
    2        3
    3        4
    4        4
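The reason >> works where | does not can be seen without pandas. Below is a minimal sketch of Python's operator dispatch; Table and Sink are hypothetical stand-ins (Table plays the role of a DataFrame that defines | but not >>), not part of any library.

```python
class Table:
    """Hypothetical stand-in for a DataFrame: defines | but not >>."""

    def __init__(self, rows):
        self.rows = rows

    def __or__(self, other):
        # The left operand's method is tried first, so it wins for `table | x`.
        return "Table.__or__ handled it"


class Sink:
    """Right-hand operand that implements both reflected operators."""

    def __ror__(self, other):
        return "Sink.__ror__ handled it"

    def __rrshift__(self, other):
        return "Sink.__rrshift__ handled it"


table = Table([1, 2, 3])

# Table.__or__ runs; Sink.__ror__ is not consulted.
or_result = table | Sink()

# Table has no __rshift__, so Python falls back to Sink.__rrshift__.
shift_result = table >> Sink()

print(or_result)
print(shift_result)
```

Because the left operand has no method for >>, the right-hand object gets full control of the expression, which is what SinkInto relies on.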

In Python 3 you can abuse unicode:

    >>> print('\u01c1')
    ǁ
    >>> ǁ = SinkInto
    >>> df >> ǁ(select, 'one') >> ǁ(rename, one='new_one')
       new_one
    0        1
    1        2
    2        3
    3        4
    4        4

[update]

Thanks for your response. Would it be possible to make a separate class (like SinkInto) for each function to avoid having to pass the functions as an argument?

How about a decorator?

    def pipe(original):
        class PipeInto(object):
            # Shared by all instances: only the wrapped function lives here.
            data = {'function': original}

            def __init__(self, *args, **kwargs):
                # Keep the arguments on the instance. Writing them into the
                # shared class-level dict would let a later call such as
                # select('two') overwrite the arguments of an earlier stage.
                self.args = args
                self.kwargs = kwargs

            def __rrshift__(self, other):
                return self.data['function'](
                    other,
                    *self.args,
                    **self.kwargs
                )

        return PipeInto


    @pipe
    def select(df, *args):
        cols = [x for x in args]
        return df[cols]


    @pipe
    def rename(df, **kwargs):
        for name, value in kwargs.items():
            df = df.rename(columns={'%s' % name: '%s' % value})
        return df

Now you can decorate any function that takes a DataFrame as the first argument:

    >>> df >> select('one') >> rename(one='first')
       first
    0      1
    1      2
    2      3
    3      4
    4      4
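Nothing in this pattern is specific to pandas, and a stage can be built once and reused as long as each stage keeps its own arguments. A minimal sketch with plain lists, assuming a hypothetical take function that is not part of the original answer:

```python
def pipe(original):
    class PipeInto:
        def __init__(self, *args, **kwargs):
            # Per-instance storage: each stage remembers only its own arguments.
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            # Called for `other >> stage` when `other` does not define >> itself.
            return original(other, *self.args, **self.kwargs)

    return PipeInto


@pipe
def take(items, n):
    """Hypothetical helper: keep the first n items."""
    return items[:n]


first_two = take(2)     # built first
first_three = take(3)   # built later; does not affect first_two

data = [10, 20, 30, 40]
two = data >> first_two
three = data >> first_three

print(two)
print(three)
```

Lists do not define >>, so Python hands the expression to the stage's __rrshift__, exactly as it does for a DataFrame.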

Python is awesome!

I know that languages like Ruby are "so expressive" that they encourage people to write every program as a new DSL, but this is kind of frowned upon in Python. Many Pythonistas consider overloading operators for a different purpose sinful blasphemy.

[update]

User OHLÁLÁ is not impressed:

The problem with this solution is when you are trying to call the function instead of piping. – OHLÁLÁ

You can add a __call__ method to the PipeInto class:

    def __call__(self, df):
        return df >> self

And then:

    >>> select('one')(df)
       one
    0  1.0
    1  2.0
    2  3.0
    3  4.0
    4  4.0

Looks like it is not easy to please OHLÁLÁ:

In that case you need to call the object explicitly:
select('one')(df) Is there a way to avoid that? – OHLÁLÁ

Well, I can think of a solution, but there is a caveat: your original function must not take a second positional argument that is a pandas DataFrame (keyword arguments are OK). Let's add a __new__ method to our PipeInto class inside the decorator that tests whether the first argument is a DataFrame; if it is, we just call the original function with the arguments:

    def __new__(cls, *args, **kwargs):
        if args and isinstance(args[0], pd.DataFrame):
            return cls.data['function'](*args, **kwargs)
        return super().__new__(cls)

It seems to work but probably there is some downside I was unable to spot.

    >>> select(df, 'one')
       one
    0  1.0
    1  2.0
    2  3.0
    3  4.0
    4  4.0

    >>> df >> select('one')
       one
    0  1.0
    1  2.0
    2  3.0
    3  4.0
    4  4.0
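For reference, here is a sketch of how the pieces from this answer (__rrshift__, __call__ and __new__) might fit together in one class. To keep it runnable without pandas, it assumes a hypothetical Table class as a stand-in for pandas.DataFrame, and the take and add functions are made up for illustration.

```python
class Table(list):
    """Hypothetical stand-in for pandas.DataFrame (an assumption, not pandas)."""


def pipe(original):
    class PipeInto:
        def __new__(cls, *args, **kwargs):
            # Direct call such as take(table, 2): run the function right away.
            if args and isinstance(args[0], Table):
                return original(*args, **kwargs)
            # Otherwise build a pipe stage; __init__ receives the same arguments.
            return super().__new__(cls)

        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs

        def __rrshift__(self, other):
            # table >> stage
            return original(other, *self.args, **self.kwargs)

        def __call__(self, table):
            # stage(table) behaves like table >> stage
            return table >> self

    return PipeInto


@pipe
def take(table, n):
    return Table(table[:n])


@pipe
def add(table, amount):
    return Table(x + amount for x in table)


t = Table([1, 2, 3, 4])

piped = t >> take(2) >> add(10)   # pipe syntax
direct = take(t, 2)               # plain call, handled by __new__
deferred = take(2)(t)             # explicit call on a stage, handled by __call__

print(piped, direct, deferred)
```

One downside that follows from this design: the decorated name is now bound to a class rather than a function, so isinstance(take, type) is true and the original function's name and docstring are no longer visible on it.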
Paulo Scardine answered Sep 22 '22 21:09