Say I have the following dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame(data=['A','B','C','D','E'], columns=['Name'])
>>> df
  Name
0    A
1    B
2    C
3    D
4    E
>>> 
I want to create a list of values for adjacent rows in the dataframe. If I create an index of pairs I can get that result by using groupby:
>>> df.index=[0,0,1,1,2]
>>> df.groupby(level=0).agg(lambda x: list(x))
     Name
0  [A, B]
1  [C, D]
2     [E]
What is the most efficient way of doing this?
Use the apply() function when you want to update every row in a pandas DataFrame by calling a custom function. To apply a function to every row, pass axis=1 to apply(). The function receives each row in turn, so you can use the row's values to create a new column, update the row, and so on.
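As a rough illustration of row-wise apply(), here is a minimal sketch. The Count column and the label function are made up for the example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frame: 'Count' is an illustrative column, not from the question.
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Count': [1, 2, 3]})

def label(row):
    # With axis=1, each call receives one row as a Series keyed by column name.
    return f"{row['Name']}-{row['Count']}"

# Build a new column from the values in each row.
df['Label'] = df.apply(label, axis=1)
```

Row-wise apply() calls the function once per row in Python, so for large frames a vectorized expression is usually faster when one exists.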
The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
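The difference is easiest to see when the index labels are not 0, 1, 2, .... A small sketch, using an illustrative index of 10, 20, 30 that is not from the question:

```python
import pandas as pd

# Illustrative labels chosen so that label and position differ.
df = pd.DataFrame({'Name': ['A', 'B', 'C']}, index=[10, 20, 30])

by_label = df.loc[20, 'Name']   # row labelled 20, column labelled 'Name'
by_position = df.iloc[1, 0]     # second row, first column (0-based positions)

label_slice = df.loc[10:20]     # label slice: both endpoints are included
first_two = df.iloc[0:2]        # position slice: end position is excluded
```

Here both slices select the same two rows, but for different reasons: loc matches the labels 10 through 20, while iloc takes positions 0 and 1.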
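You can group by "adjacency" in one go, without mutating the DataFrame, by integer-dividing the index by 2 (this assumes the DataFrame still has its default 0, 1, 2, ... index):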
In [11]: g = df.groupby(df.index // 2)
and then do whatever it is you need to do:
In [12]: g.get_group(0)
Out[12]:
  Name
0    A
1    B
In [13]: g.sum()
Out[13]:
  Name
0   AB
1   CD
2    E
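To get lists rather than concatenated strings, and to avoid depending on the index labels at all, one option is to build the grouping key from row positions. This is a minimal sketch, not necessarily the fastest approach; the names pair_key and pairs are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=['A', 'B', 'C', 'D', 'E'], columns=['Name'])

# Positional key 0, 0, 1, 1, 2, ...: works whatever the index labels are.
pair_key = np.arange(len(df)) // 2

# Collect each pair of adjacent rows into a list; df itself is not modified.
pairs = df.groupby(pair_key)['Name'].agg(list)
```

pairs is a Series with one list per group, and pairs.tolist() gives a plain list of lists. An odd number of rows leaves a final group with a single element, as in the question's output.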