I have a pandas data frame that looks like this (index, Place):
0 California*
1 LA
2 San Diego
3 Oakland
4 Washington*
5 Seattle
6 Tacoma
The values in Place that are states are marked with a trailing asterisk. I'd like to create a new column and fill it with the state name: whenever a row has an asterisk, take its value (without the asterisk) as the state, then carry that value forward to every subsequent row until the next row with an asterisk. In SAS this would be done with a RETAIN statement, but I don't know the equivalent in pandas.
The output should be (index, Place, State):
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
etc.
What's a simple and elegant way of going about this?
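For reference, the row-by-row "retain" logic described above can be written as a plain Python loop. This is only a minimal illustrative sketch, assuming the frame from the question with a single Place column. It is not vectorized, so it is mainly useful as a baseline to check the pandas one-liners against.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# Retain-style loop: remember the most recent state and carry it forward.
states = []
current = None  # no state seen yet
for place in df["Place"]:
    if place.endswith("*"):
        current = place[:-1]  # drop the trailing asterisk
    states.append(current)

df["State"] = states
print(df)
```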
You can use Series.where with a boolean mask (built by indexing the strings with .str and checking whether the last character is '*'), then fill forward with ffill:
print (df.col.str[-1] == '*')
0 True
1 False
2 False
3 False
4 True
5 False
6 False
Name: col, dtype: bool
df['new'] = df.col.str[:-1].where(df.col.str[-1] == '*').ffill()
print (df)
col new
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
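Here is the same approach applied to the question's own column names (Place and State instead of col and new), with the intermediate steps split out. This shows what where() produces before ffill() fills the gaps. It is a minimal sketch, and the intermediate variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

is_state = df["Place"].str[-1] == "*"   # True on rows that name a state
stripped = df["Place"].str[:-1]         # every value minus its last character
only_states = stripped.where(is_state)  # keep state names, NaN on all other rows
df["State"] = only_states.ffill()       # carry each state down to the next one

print(only_states)
print(df)
```

Any rows that appear before the first starred value stay NaN, because ffill has nothing to carry forward yet.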
Alternatively, you can use the Series.str.extract() method together with Series.ffill():
In [236]: df['State'] = df.Place.str.extract(r'([^\*]+)\*', expand=False).ffill()
In [237]: df
Out[237]:
Place State
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
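The extract version can also be run stand-alone and compared with the Series.where version above. This is a minimal sketch assuming the question's sample data. With expand=False, str.extract returns a Series holding the captured group (or NaN on rows without an asterisk) rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# Capture everything before the '*'; rows without '*' give NaN, then ffill.
via_extract = df["Place"].str.extract(r"([^\*]+)\*", expand=False).ffill()

# The Series.where approach, for comparison.
via_where = df["Place"].str[:-1].where(df["Place"].str[-1] == "*").ffill()

print(via_extract.tolist() == via_where.tolist())
df["State"] = via_extract
```

One difference to keep in mind: the regex is not anchored to the end of the string, so a value with an asterisk in the middle (for example "New*York") would still be treated as a state, whereas the str[-1] check only looks at the last character.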