Say I have the following dataframe:
>>> df=pd.DataFrame(data=['A','B','C','D','E'], columns=['Name'])
>>> df
  Name
0    A
1    B
2    C
3    D
4    E
I want to create a list of values for adjacent rows in the dataframe. If I create an index of pairs I can get that result by using groupby:
>>> df.index=[0,0,1,1,2]
>>> df.groupby(level=0).agg(lambda x: list(x))
     Name
0  [A, B]
1  [C, D]
2     [E]
What is the most efficient way of doing this?
Use the apply() function when you want to run a custom function on every row of a pandas DataFrame. To apply the function to each row rather than each column, pass axis=1 to apply(). The function receives one row at a time, so you can, for example, build a new column from the values in that row.
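A minimal sketch of row-wise apply(), assuming a small made-up frame; the Count column and the label() helper are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical two-column frame for illustration only.
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Count': [1, 2, 3]})

def label(row):
    # With axis=1, `row` is a Series holding one row, keyed by column name.
    return f"{row['Name']}-{row['Count']}"

# apply() returns a new Series; assigning it creates the new column.
df['Label'] = df.apply(label, axis=1)
print(df)
```

Note that apply() does not modify the frame in place; the new column only exists because the result is assigned back.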
The main distinction between loc and iloc is: loc is label-based, which means that you have to specify rows and columns based on their row and column labels. iloc is integer position-based, so you have to specify rows and columns by their integer position values (0-based integer position).
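To make the difference concrete, here is a small sketch on a frame whose labels deliberately differ from its positions (the 10/20/30 index is an assumption chosen for the example):

```python
import pandas as pd

# Labels 10, 20, 30 sit at positions 0, 1, 2.
df = pd.DataFrame({'Name': ['A', 'B', 'C']}, index=[10, 20, 30])

by_label = df.loc[10, 'Name']      # row labelled 10, column labelled 'Name'
by_position = df.iloc[0, 0]        # first row, first column

# Label slices include the end label; position slices exclude the end.
label_slice = df.loc[10:20, 'Name'].tolist()
position_slice = df.iloc[0:1, 0].tolist()

print(by_label, by_position, label_slice, position_slice)
```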
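You can group by "adjacency" in one go, without mutating the DataFrame, by integer-dividing the index by 2. This assumes df still has its original default index (0 to 4), not the reassigned one: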
In [11]: g = df.groupby(df.index // 2)
and then do whatever it is you need to do:
In [12]: g.get_group(0)
Out[12]:
  Name
0    A
1    B
In [13]: g.sum()
Out[13]:
  Name
0   AB
1   CD
2    E
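If you want the lists from the question rather than concatenated strings, one possible variation is sketched below. It builds the group keys from row positions with np.arange, which is my own substitution for df.index // 2, so that it also works when the index is not the default 0..n-1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=['A', 'B', 'C', 'D', 'E'], columns=['Name'])

# Positional group keys 0, 0, 1, 1, 2: derived from row positions,
# so they do not depend on the index labels.
keys = np.arange(len(df)) // 2

# Collect each group's values into a Python list.
pairs = df.groupby(keys)['Name'].agg(list)
print(pairs)
```

Changing the divisor from 2 to n groups every n adjacent rows, with the last group holding whatever is left over.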