I have multiple simple functions that need to be applied to every row of certain columns of my dataframe. The dataframe is very large (10 million+ rows). My dataframe looks something like this:
    Date       location  city         number  value
    12/3/2018  NY        New York     2       500
    12/1/2018  MN        Minneapolis  3       600
    12/2/2018  NY        Rochester    1       800
    12/3/2018  WA        Seattle      2       400
I have functions like these:
    def normalized_location(row):
        # Compare against the exact city name (no leading space).
        if row['city'] == "Minneapolis":
            return "FCM"
        elif row['city'] == "Seattle":
            return "FCS"
        else:
            return "Other"
And then I use:
    df['Normalized Location'] = df.apply(normalized_location, axis=1)
This is extremely slow. How can I make it more efficient?
The method argument of fillna() can be used to replace missing values with the previous or next valid value. If method is set to 'ffill' or 'pad', missing values are replaced with the previous valid value (forward fill); if it is set to 'bfill' or 'backfill', they are replaced with the next valid value (backward fill).
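A minimal sketch of forward and backward fill on a small, made-up Series. It calls the ffill() and bfill() methods, which perform the same fills as fillna(method='ffill') and fillna(method='bfill'); newer pandas versions deprecate fillna's method argument in favour of these two methods.

```python
import numpy as np
import pandas as pd

# A series with gaps; NaN marks the missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill ('ffill' / 'pad'): each NaN takes the previous valid value.
forward = s.ffill()

# Backward fill ('bfill' / 'backfill'): each NaN takes the next valid value.
# The trailing NaN has no later value, so it stays NaN.
backward = s.bfill()

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]
```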
Pandas DataFrame methods such as fillna() can be used to replace missing values, and methods such as mean(), median(), and mode() can be used on a DataFrame to compute those statistics, for example to use as fill values.
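A short sketch of filling a gap with a column statistic; the single-column data is invented for illustration. Passing the per-column Series returned by mean(), median(), or mode().iloc[0] to fillna() fills each column with its own value.

```python
import numpy as np
import pandas as pd

# Made-up column with one missing value.
df = pd.DataFrame({"value": [400.0, 400.0, np.nan, 1000.0]})

# mean() and median() skip NaN by default and return one value per column;
# fillna() with that Series fills each column with its own statistic.
mean_filled = df.fillna(df.mean())      # NaN -> (400 + 400 + 1000) / 3 = 600.0
median_filled = df.fillna(df.median())  # NaN -> 400.0

# mode() returns a DataFrame (ties give several rows), so take its first row.
mode_filled = df.fillna(df.mode().iloc[0])  # NaN -> 400.0

print(mean_filled["value"].tolist())  # [400.0, 400.0, 600.0, 1000.0]
```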
You can replace values in all or selected columns of a pandas DataFrame based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can both read and modify the values of a DataFrame.
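A minimal sketch of the loc[] approach applied to the question's normalized_location logic, using the four sample rows from the question. Each assignment uses a boolean mask computed over the whole column, so no Python function is called per row. Setting the default first and then overriding the matching rows is a choice made for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["12/3/2018", "12/1/2018", "12/2/2018", "12/3/2018"],
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
    "number": [2, 3, 1, 2],
    "value": [500, 600, 800, 400],
})

# Start every row at the default label ...
df["Normalized Location"] = "Other"
# ... then overwrite the rows selected by each boolean mask.
df.loc[df["city"] == "Minneapolis", "Normalized Location"] = "FCM"
df.loc[df["city"] == "Seattle", "Normalized Location"] = "FCS"

print(df["Normalized Location"].tolist())  # ['Other', 'FCM', 'Other', 'FCS']
```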
While NaN is the default missing value marker for reasons of computational speed and convenience, we need to be able to easily detect this value with data of different types: floating point, integer, boolean, and general object.
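A small sketch of detecting missing values across column types with isna(); the data is made up. Plain NumPy integer and boolean columns cannot hold NaN, so the sketch uses pandas' nullable "Int64" and "boolean" extension dtypes, which mark gaps with pd.NA. The last part shows where NaN typically appears in the question's setting: Series.map with a plain dict returns NaN for keys it cannot find.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "floats": [1.5, np.nan, 3.0],                              # NaN
    "ints": pd.array([1, None, 3], dtype="Int64"),             # pd.NA
    "bools": pd.array([True, None, False], dtype="boolean"),   # pd.NA
    "objects": ["a", None, "c"],                               # None
})

# isna() recognises every one of these markers.
missing_per_column = df.isna().sum().tolist()
print(missing_per_column)  # [1, 1, 1, 1]

# A plain dict leaves unmatched keys as NaN after map().
cities = pd.Series(["New York", "Minneapolis", "Seattle"])
mapped = cities.map({"Minneapolis": "FCM", "Seattle": "FCS"})
print(mapped.isna().tolist())  # [True, False, False]
```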
We can make this BLAZING fast using map with a defaultdict.
    from collections import defaultdict
    d = defaultdict(lambda: 'Other')
    d.update({"Minneapolis": "FCM", "Seattle": "FCS"})
    df['normalized_location'] = df['city'].map(d)
    print(df)
            Date  location         city  number  value  normalized_location
    0  12/3/2018        NY     New York       2    500                Other
    1  12/1/2018        MN  Minneapolis       3    600                  FCM
    2  12/2/2018        NY    Rochester       1    800                Other
    3  12/3/2018        WA      Seattle       2    400                  FCS
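We use a defaultdict here to circumvent a fillna call, for performance reasons: Series.map turns any key that is missing from a plain dict into NaN, but when the dictionary is a dict subclass that defines __missing__ (as defaultdict does), map uses that default instead. This approach generalises to multiple replacements quite easily; just add more keys to the dictionary.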
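A sketch, on a cut-down copy of the sample data, comparing the plain-dict-plus-fillna route with the defaultdict route, then extending the defaultdict idea to several columns. The rules dictionary, the per-column loop, and the 'East'/'West'/'Central' labels are hypothetical illustrations only. No timings are claimed here; benchmark on your own data if speed is the deciding factor.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
})
mapping = {"Minneapolis": "FCM", "Seattle": "FCS"}

# Plain dict: cities missing from the dict come back as NaN,
# so a separate fillna() pass supplies the default label.
with_fillna = df["city"].map(mapping).fillna("Other")

# defaultdict: map() uses the dict's __missing__ default instead of NaN,
# so no fillna() pass is needed.
d = defaultdict(lambda: "Other")
d.update(mapping)
with_default = df["city"].map(d)

print(with_fillna.tolist())   # ['Other', 'FCM', 'Other', 'FCS']
print(with_default.tolist())  # ['Other', 'FCM', 'Other', 'FCS']

# Generalising to several columns: one lookup table and default per column.
# The 'location' table and its labels are hypothetical, for illustration.
rules = {
    "city": ({"Minneapolis": "FCM", "Seattle": "FCS"}, "Other"),
    "location": ({"NY": "East", "WA": "West"}, "Central"),
}
for col, (table, default) in rules.items():
    # Bind default as a lambda argument so each column keeps its own value.
    lookup = defaultdict(lambda default=default: default)
    lookup.update(table)
    df[f"normalized_{col}"] = df[col].map(lookup)

print(df["normalized_location"].tolist())  # ['East', 'Central', 'East', 'West']
```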