Add new column to dataframe based on dictionary

I have a dataframe and a dictionary. I need to add a new column to the dataframe and calculate its values based on the dictionary.

The context is machine learning: I want to add a new feature based on a lookup table:

    import numpy as np
    import pandas as pd

    score = {(1, 45, 1, 1): 4, (0, 1, 2, 1): 5}

    df = pd.DataFrame(data={
        'gender':      [1,  1,  0, 1,  1,  0,  0,  0,  1,  0],
        'age':         [13, 45, 1, 45, 15, 16, 16, 16, 15, 15],
        'cholesterol': [1,  2,  2, 1,  1,  1,  1,  1,  1,  1],
        'smoke':       [0,  0,  1, 1,  7,  8,  3,  4,  4,  2]},
        dtype=np.int64)

    print(df, '\n')
    df['score'] = 0
    # My attempt: look up each row's (gender, age, cholesterol, smoke) tuple in score
    df.score = score[(df.gender, df.age, df.cholesterol, df.smoke)]
    print(df)

I expect the following output:

       gender  age  cholesterol  smoke  score
    0       1   13            1      0      0
    1       1   45            2      0      0
    2       0    1            2      1      5
    3       1   45            1      1      4
    4       1   15            1      7      0
    5       0   16            1      8      0
    6       0   16            1      3      0
    7       0   16            1      4      0
    8       1   15            1      4      0
    9       0   15            1      2      0

asked Oct 29 '19 16:10

Roman Kazmin

2 Answers

Since score is a dictionary (so its keys are unique), we can use MultiIndex alignment:

    df = df.set_index(['gender', 'age', 'cholesterol', 'smoke'])
    df['score'] = pd.Series(score)  # Assign values based on the tuple
    df = df.fillna(0, downcast='infer').reset_index()  # Back to columns

       gender  age  cholesterol  smoke  score
    0       1   13            1      0      0
    1       1   45            2      0      0
    2       0    1            2      1      5
    3       1   45            1      1      4
    4       1   15            1      7      0
    5       0   16            1      8      0
    6       0   16            1      3      0
    7       0   16            1      4      0
    8       1   15            1      4      0
    9       0   15            1      2      0
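In recent pandas releases the downcast argument of fillna is deprecated, so the last step above may emit a warning or fail depending on the version. A minimal sketch of the same alignment idea that avoids fillna altogether, assuming the key columns are exactly those four: build a MultiIndex from the key columns and reindex the lookup Series with fill_value=0. The names key_cols, keys and lookup are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

score = {(1, 45, 1, 1): 4, (0, 1, 2, 1): 5}

df = pd.DataFrame(data={
    'gender':      [1,  1,  0, 1,  1,  0,  0,  0,  1,  0],
    'age':         [13, 45, 1, 45, 15, 16, 16, 16, 15, 15],
    'cholesterol': [1,  2,  2, 1,  1,  1,  1,  1,  1,  1],
    'smoke':       [0,  0,  1, 1,  7,  8,  3,  4,  4,  2]},
    dtype=np.int64)

# Hypothetical helper names, not part of the original answer.
key_cols = ['gender', 'age', 'cholesterol', 'smoke']

# A dict with tuple keys becomes a Series indexed by a MultiIndex.
lookup = pd.Series(score)

# One MultiIndex entry per dataframe row, built from the key columns.
keys = pd.MultiIndex.from_frame(df[key_cols])

# Reindex the lookup by row keys; rows without a match get 0, so the
# column stays integer and no fillna/downcast step is needed.
df['score'] = lookup.reindex(keys, fill_value=0).to_numpy()

print(df)
```

This leaves the original columns in place instead of moving them into the index and back.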

answered Nov 09 '22 03:11

ALollz

Use assign with a list comprehension that looks up each row, as a tuple of its values, in the score dictionary and defaults to zero when the key is not found.

    >>> df.assign(score=[score.get(tuple(row), 0) for row in df.values])
       gender  age  cholesterol  smoke  score
    0       1   13            1      0      0
    1       1   45            2      0      0
    2       0    1            2      1      5
    3       1   45            1      1      4
    4       1   15            1      7      0
    5       0   16            1      8      0
    6       0   16            1      3      0
    7       0   16            1      4      0
    8       1   15            1      4      0
    9       0   15            1      2      0
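The comprehension above builds each tuple from every column of the dataframe, which works here because the four key columns are the only ones. A sketch of the same lookup restricted to named key columns, assuming the dataframe might carry other columns as well; the patient column and the key_cols name are made up for illustration.

```python
import pandas as pd

score = {(1, 45, 1, 1): 4, (0, 1, 2, 1): 5}

df = pd.DataFrame({
    'patient':     list('abcdefghij'),  # hypothetical extra column, not in the question
    'gender':      [1,  1,  0, 1,  1,  0,  0,  0,  1,  0],
    'age':         [13, 45, 1, 45, 15, 16, 16, 16, 15, 15],
    'cholesterol': [1,  2,  2, 1,  1,  1,  1,  1,  1,  1],
    'smoke':       [0,  0,  1, 1,  7,  8,  3,  4,  4,  2]})

# Only these columns, in this order, form the dictionary key.
key_cols = ['gender', 'age', 'cholesterol', 'smoke']

# zip over the selected columns yields one tuple per row;
# dict.get returns 0 for tuples that are not in score.
row_keys = zip(*(df[c] for c in key_cols))
df = df.assign(score=[score.get(key, 0) for key in row_keys])

print(df)
```

Selecting the columns explicitly also keeps the tuple order tied to the dictionary's key order rather than to the dataframe's column order.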

Timings

Given the variety of approaches, I thought it would be interesting to compare some of the timings.

    # Initial dataframe 100k rows (10 rows of identical data replicated 10k times).
    df = pd.DataFrame(data={
        'gender':      [1,  1,  0, 1,  1,  0,  0,  0,  1,  0] * 10000,
        'age':         [13, 45, 1, 45, 15, 16, 16, 16, 15, 15] * 10000,
        'cholesterol': [1,  2,  2, 1,  1,  1,  1,  1,  1,  1] * 10000,
        'smoke':       [0,  0,  1, 1,  7,  8,  3,  4,  4,  2] * 10000},
        dtype=np.int64)

    %timeit -n 10 df.assign(score=[score.get(tuple(v), 0) for v in df.values])
    # 223 ms ± 9.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    df.assign(score=[score.get(t, 0) for t in zip(*map(df.get, df))])
    # 76.8 ms ± 2.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    df.assign(score=[score.get(v, 0) for v in df.itertuples(index=False)])
    # 113 ms ± 2.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %timeit -n 10 df.assign(score=df.apply(lambda x: score.get(tuple(x), 0), axis=1))
    # 1.84 s ± 77.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    (df
     .set_index(['gender', 'age', 'cholesterol', 'smoke'])
     .assign(score=pd.Series(score))
     .fillna(0, downcast='infer')
     .reset_index()
    )
    # 138 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    s = pd.Series(score)
    s.index.names = ['gender', 'age', 'cholesterol', 'smoke']
    df.merge(s.to_frame('score').reset_index(), how='left').fillna(0).astype(int)
    # 24 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    df.assign(score=pd.Series(zip(df.gender, df.age, df.cholesterol, df.smoke))
                    .map(score)
                    .fillna(0)
                    .astype(int))
    # 191 ms ± 7.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %%timeit -n 10
    df.assign(score=df[['gender', 'age', 'cholesterol', 'smoke']]
                    .apply(tuple, axis=1)
                    .map(score)
                    .fillna(0))
    # 1.95 s ± 134 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
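The left merge is the fastest variant in the timings above but is only shown as a one-liner. A self-contained sketch of that approach on the small 10-row frame, assuming the same four key columns; the names key_cols, lookup and result are illustrative, and the integer cast is applied to the score column only rather than to the whole frame.

```python
import numpy as np
import pandas as pd

score = {(1, 45, 1, 1): 4, (0, 1, 2, 1): 5}

df = pd.DataFrame(data={
    'gender':      [1,  1,  0, 1,  1,  0,  0,  0,  1,  0],
    'age':         [13, 45, 1, 45, 15, 16, 16, 16, 15, 15],
    'cholesterol': [1,  2,  2, 1,  1,  1,  1,  1,  1,  1],
    'smoke':       [0,  0,  1, 1,  7,  8,  3,  4,  4,  2]},
    dtype=np.int64)

key_cols = ['gender', 'age', 'cholesterol', 'smoke']

# Turn the dictionary into a small lookup table with one column per
# key component plus a 'score' column.
lookup = (pd.Series(score, name='score')
            .rename_axis(key_cols)
            .reset_index())

# A left merge keeps every row of df in its original order;
# rows without a match get NaN in 'score'.
result = df.merge(lookup, how='left', on=key_cols)

# Replace the missing scores with 0 and restore an integer dtype.
result['score'] = result['score'].fillna(0).astype(int)

print(result)
```

Timings on other data may differ from the figures above, since the benchmark frame repeats the same 10 rows and the lookup table has only two entries.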

answered Nov 09 '22 05:11

Alexander
