I'm running into a very strange issue ever since I ported my code from one computer to another. I'm using pandas version 0.25.1 on this system, but I'm unsure which pandas version I was using previously.
The issue is as follows:
I create a simple, unsorted (mock) dataframe that I want to sort by "group" and "count" and then forward-fill the NaN values within each group.
In [1]: import pandas as pd
...: import numpy as np
In [2]: test = pd.DataFrame({"group" : ["A", "A", "A", "B", "B", "B", "C", "C"],
...: "count" : [2, 3, 1, 2, 1, 3, 1, 2],
...: "value" : [10, np.nan, 30, np.nan, 19, np.nan, 25, np.nan]})
In [3]: test
Out[3]:
group count value
0 A 2 10.0
1 A 3 NaN
2 A 1 30.0
3 B 2 NaN
4 B 1 19.0
5 B 3 NaN
6 C 1 25.0
7 C 2 NaN
However, when I sort and then forward-fill within each group, I lose the entire "group" column, and it does not reappear in my index either:
In [4]: test.sort_values(["group", "count"]).groupby("group").ffill()
Out[4]:
count value
2 1 30.0
0 2 10.0
1 3 10.0
4 1 19.0
3 2 19.0
5 3 19.0
6 1 25.0
7 2 25.0
I've also tried fillna with method="ffill" instead, but that gives me the same result:
In [5]: test.sort_values(["group", "count"]).groupby("group").fillna(method = "ffill")
Out[5]:
count value
2 1 30.0
0 2 10.0
1 3 10.0
4 1 19.0
3 2 19.0
5 3 19.0
6 1 25.0
7 2 25.0
Does anyone know what I am doing wrong? The issue seems to be with the ffill method, since I CAN use .mean() on the groupby and retain my groupings.
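A minimal sketch of why this happens, assuming a recent pandas where groupby().ffill() leaves the grouping column out of its result: .mean() is an aggregation, so it returns one row per group with the group keys as the index, while .ffill() returns one row per input row under the original row labels. Because those labels line up with the sorted frame, the "group" column can be joined back on. The names sorted_df, means, filled and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Same mock data as in the question
test = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B", "C", "C"],
                     "count": [2, 3, 1, 2, 1, 3, 1, 2],
                     "value": [10, np.nan, 30, np.nan, 19, np.nan, 25, np.nan]})

sorted_df = test.sort_values(["group", "count"])

# Aggregation: one row per group, and the group keys become the index
means = sorted_df.groupby("group").mean()

# ffill: one row per input row, kept under the original row labels,
# with the grouping column left out of the result
filled = sorted_df.groupby("group").ffill()

# The row labels match sorted_df, so the grouping column can be joined back
result = sorted_df[["group"]].join(filled)
print(result)
```

The joined result keeps the sorted row order; the update approach below instead writes the values back into the frame in its original order.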
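IIUC, you have to use `update` to write the forward-filled results back into the original dataframe. This works because the groupby ffill result keeps the original row labels, so `update` can align it with `test` on the index and columns: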
test.update(test.sort_values(["group", "count"]).groupby("group").ffill())
print(test)
Output
group count value
0 A 2 10.0
1 A 3 10.0
2 A 1 30.0
3 B 2 19.0
4 B 1 19.0
5 B 3 19.0
6 C 1 25.0
7 C 2 25.0
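An alternative to update, not part of the answer above: forward-fill only the "value" column and assign it straight back. This sketch assumes that a groupby column ffill returns a Series indexed by the original row labels, so the column assignment aligns on the index and restores the original row order.

```python
import numpy as np
import pandas as pd

# Same mock data as in the question
test = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B", "C", "C"],
                     "count": [2, 3, 1, 2, 1, 3, 1, 2],
                     "value": [10, np.nan, 30, np.nan, 19, np.nan, 25, np.nan]})

# Sort by count within each group, forward-fill "value" per group, and
# assign back; pandas aligns the Series on the index, not on position
test["value"] = (test.sort_values(["group", "count"])
                     .groupby("group")["value"]
                     .ffill())
print(test)
```

Unlike update, which modifies the frame in place and skips NaN values in the incoming data, this replaces the whole "value" column; for this data both give the same result.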