
Perform function on pairs of rows in Pandas dataframe

Tags: python, pandas

Say I have the following dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame(data=['A','B','C','D','E'], columns=['Name'])
>>> df
  Name
0    A
1    B
2    C
3    D
4    E

I want to collect the values of each pair of adjacent rows into a list. If I assign an index that labels the rows in pairs, I can get that result with groupby:

>>> df.index=[0,0,1,1,2]
>>> df.groupby(level=0).agg(lambda x: list(x))
     Name
0  [A, B]
1  [C, D]
2     [E]

What is the most efficient way of doing this?

AJG519 asked Nov 21 '15 00:11



1 Answer

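You can group by "adjacency" in one go, without mutating the DataFrame. With the original default index 0..4, integer-dividing the index by 2 gives both rows of each adjacent pair the same key: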

In [11]: g = df.groupby(df.index // 2)

and then do whatever it is you need to do:

In [12]: g.get_group(0)
Out[12]:
  Name
0    A
1    B

In [13]: g.sum()
Out[13]:
  Name
0   AB
1   CD
2    E
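
If what you need is the list-per-pair output from the question, here is a minimal sketch of one way to get it. It assumes NumPy is available alongside pandas, and it groups by row position instead of index label so it should not depend on the index being the default range. The names `pair_id` and `pairs` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=['A', 'B', 'C', 'D', 'E'], columns=['Name'])

# Key each row by its position divided by 2, so rows 0-1 share key 0,
# rows 2-3 share key 1, and so on. Using positions rather than index
# labels is an assumption made here to cope with non-default indexes.
pair_id = np.arange(len(df)) // 2

# Collect the values of each pair into a Python list.
pairs = df.groupby(pair_id)['Name'].agg(list)
print(pairs.tolist())
```

Any other aggregation can be passed to `agg` in place of `list` if the pairs need to be combined differently.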
Andy Hayden answered Oct 28 '22 11:10