Filling empty "B column" cells with last known value if rows have same "A column" value

Question

I'm really new to Python and data science.

I have a large dataset (100K+ rows) with two columns, A and B. A is a datetime column and B is a string column.

Column B has some NaN values. I want to fill each NaN with the latest known B value, provided that the empty row and the filled row fall on the same day, month and year in column A.

Let me explain with an example.

Let's say this is my input:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ["2019-03-13 08:12:23", "2019-03-13 07:10:18", "2019-03-20 08:12:23", "2019-03-13 08:12:23", "2019-03-15 10:35:53", "2019-03-20 11:12:23"], 'B': ["B1", "B0", "B13", np.nan, "B10", "B12"]})
                     A    B
0  2019-03-13 08:12:23   B1
1  2019-03-13 07:10:18   B0
2  2019-03-20 08:12:23  B13
3  2019-03-13 08:12:23  NaN
4  2019-03-15 10:35:53  B10
5  2019-03-20 11:12:23  B12

I want to fill the NaN value with B1: the B value from the same day with the latest time that is not later than the NaN row's own column A value.

So my output should look like this:

                     A    B
0  2019-03-13 08:12:23   B1
1  2019-03-13 07:10:18   B0
2  2019-03-20 08:12:23  B13
3  2019-03-13 08:12:23   B1
4  2019-03-15 10:35:53  B10
5  2019-03-20 11:12:23  B12

I tried to achieve this without success. The best I could do was turn the NaN value into B13 using this:

df['B']=df['B'].replace({'B': {0: np.nan}}).ffill()

Can you please tell me the fastest and most economical way to achieve this?

rafaelc · Accepted Answer

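Use groupby + ffill: order the rows by time, forward-fill B within each calendar day, then restore the original row order.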

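# Parse column A so rows can be ordered and grouped by calendar day
as_date = pd.to_datetime(df.A)
# Row positions in chronological order (used as labels below, so this assumes the default 0..n-1 index)
s = np.argsort(as_date)

# Forward-fill B within each day in time order, then put the rows back in their original order
df['B'] = df.B.loc[s].groupby(as_date.loc[s].dt.date).ffill().loc[df.index]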

                     A    B
0  2019-03-13 08:12:23   B1
1  2019-03-13 07:10:18   B0
2  2019-03-20 08:12:23  B13
3  2019-03-13 08:12:23   B1
4  2019-03-15 10:35:53  B10
5  2019-03-20 11:12:23  B12
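
For anyone who wants the same idea in a reusable form, here is a minimal sketch rather than a definitive implementation. The helper name ffill_within_day and its parameters are hypothetical, not part of pandas. It assumes the index labels are unique (they do not have to be 0..n-1), and it uses a stable sort (mergesort) so that rows with identical timestamps, like rows 0 and 3 in the example, keep their original relative order. The default sort kind does not guarantee that.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["2019-03-13 08:12:23", "2019-03-13 07:10:18", "2019-03-20 08:12:23",
          "2019-03-13 08:12:23", "2019-03-15 10:35:53", "2019-03-20 11:12:23"],
    "B": ["B1", "B0", "B13", np.nan, "B10", "B12"],
})


def ffill_within_day(frame, time_col="A", value_col="B"):
    """Hypothetical helper: forward-fill value_col in time order, per calendar day.

    Assumes frame has unique index labels.
    """
    times = pd.to_datetime(frame[time_col])
    # Mergesort is stable: rows with identical timestamps stay in their
    # original order, so an earlier filled row can feed a tied NaN row.
    ordered = times.sort_values(kind="mergesort")
    filled = (
        frame.loc[ordered.index, value_col]  # values in chronological order
        .groupby(ordered.dt.date)            # one group per calendar day
        .ffill()                             # carry the last known value forward
    )
    # assign aligns on index labels, which restores the original row order.
    return frame.assign(**{value_col: filled})


result = ffill_within_day(df)
print(result)
```

A NaN that is the earliest row of its day has nothing before it to copy from, so it stays NaN.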

Tags: python, pandas, dataset, data-science

Arès
