Replace values in pandas column with default value for missing keys

I have multiple simple functions that need to be applied to every row of certain columns of my dataframe. The dataframe is very large, 10 million+ rows. It looks something like this:

Date      location   city        number  value
12/3/2018   NY       New York      2      500
12/1/2018   MN       Minneapolis   3      600
12/2/2018   NY       Rochester     1      800
12/3/2018   WA       Seattle       2      400

I have functions like these:

def normalized_location(row):
    if row['city'] == "Minneapolis":
        return "FCM"
    elif row['city'] == "Seattle":
        return "FCS"
    else:
        return "Other"

and then I use:

df['Normalized Location'] = df.apply(lambda row: normalized_location(row), axis=1)

This is extremely slow. How can I make it more efficient?

681

asked Dec 03 '18 21:12

Nazanin Zinouri

1 Answer

We can make this much faster by using map with a defaultdict, which returns a default value for any key missing from the mapping:

from collections import defaultdict

d = defaultdict(lambda: 'Other')
d.update({"Minneapolis": "FCM", "Seattle": "FCS"})

df['normalized_location'] = df['city'].map(d)

print(df)
        Date location         city  number  value normalized_location
0  12/3/2018       NY     New York       2    500               Other
1  12/1/2018       MN  Minneapolis       3    600                 FCM
2  12/2/2018       NY    Rochester       1    800               Other
3  12/3/2018       WA      Seattle       2    400                 FCS

A defaultdict is used here, rather than a plain dict followed by fillna('Other'), to circumvent a fillna call for performance reasons. This approach generalises to multiple replacements quite easily.
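For comparison, here is a minimal sketch of the two variants side by side: map with a defaultdict, and map with a plain dict followed by fillna. The four-row sample frame and the variable names (mapping, with_default, via_defaultdict, via_fillna) are assumptions made for illustration; only the city values and the replacement codes come from the question.

```python
from collections import defaultdict

import pandas as pd

# Small sample mirroring the 'city' column from the question (illustrative only).
df = pd.DataFrame({"city": ["New York", "Minneapolis", "Rochester", "Seattle"]})

# Explicit replacements; every other city should become "Other".
mapping = {"Minneapolis": "FCM", "Seattle": "FCS"}

# Variant 1: the defaultdict supplies "Other" for any key missing from the mapping.
with_default = defaultdict(lambda: "Other", mapping)
via_defaultdict = df["city"].map(with_default)

# Variant 2: a plain dict leaves unmatched cities as NaN, so fillna patches them.
via_fillna = df["city"].map(mapping).fillna("Other")

print(via_defaultdict.tolist())
print(via_fillna.tolist())
```

Both variants should give the same column on this sample. Handling more cities only requires adding entries to the mapping dict, and which variant is faster on a given dataset is best confirmed by timing it on your own data.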

183

answered Sep 21 '22 00:09

cs95

Tags: python, replace, pandas, dataframe, lambda
