
Pandas: Assigning multiple *new* columns simultaneously

Tags: python, pandas

I have a DataFrame df with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary labeldict with keys equal to the possible labels and values equal to 2-tuples of information related to that label. I'd like to tack two new columns onto my frame, one for each part of the 2-tuple corresponding to the label for each row.

Here is the setup:

    import pandas as pd
    import numpy as np

    np.random.seed(1)
    n = 10

    labels = list('abcdef')
    colors = ['red', 'green', 'blue']
    sizes = ['small', 'medium', 'large']

    labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

    df = pd.DataFrame({'label': np.random.choice(labels, n),
                       'somedata': np.random.randn(n)})

I can get what I want by running:

    df['color'], df['size'] = zip(*df['label'].map(labeldict))
    print(df)

      label  somedata  color    size
    0     b  0.196643    red  medium
    1     c -1.545214  green   small
    2     a -0.088104  green   small
    3     c  0.852239  green   small
    4     b  0.677234    red  medium
    5     c -0.106878  green   small
    6     a  0.725274  green   small
    7     d  0.934889    red  medium
    8     a  1.118297  green   small
    9     c  0.055613  green   small

But how can I do this if I don't want to manually type out the two columns on the left side of the assignment? That is, how can I create multiple new columns on the fly? For example, if I had 10-tuples in labeldict instead of 2-tuples, this would be a real pain as currently written. Here are a couple of things that don't work:

    # set up attrlist for later use
    attrlist = ['color', 'size']

    # non-working idea 1)
    df[attrlist] = zip(*df['label'].map(labeldict))

    # non-working idea 2)
    df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

This does work, but seems like a hack:

    for a in attrlist:
        df[a] = 0
    df[attrlist] = zip(*df['label'].map(labeldict))

Better solutions?

asked Dec 29 '13 20:12 by 8one6


2 Answers

Just pass result_type='expand' to pandas apply:

    df
    Out[78]:
       a  b
    0  0  1
    1  2  3
    2  4  5
    3  6  7
    4  8  9

    df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')

    df
    Out[80]:
       a  b  mean  std  max
    0  0  1   0.5  0.5  1.0
    1  2  3   2.5  0.5  3.0
    2  4  5   4.5  0.5  5.0
    3  6  7   6.5  0.5  7.0
    4  8  9   8.5  0.5  9.0

And here is some copy-paste code:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.arange(10).reshape(5,2), columns=['a','b'])
    print('df',df, sep='\n')
    print()

    def mathOperationsTuple(arr):
        return np.mean(arr), np.std(arr), np.amax(arr)

    df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
    print('df',df, sep='\n')
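
The same idea can probably be carried over to the labeldict setup from the question. The following is only a sketch: the small frame, the dictionary contents, and the lambda are illustrative assumptions rather than part of the answer above, and it assumes a reasonably recent pandas in which a DataFrame can be assigned to several new column names at once.

```python
import pandas as pd

# Illustrative stand-ins for the question's data (hypothetical values).
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('blue', 'large')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'c', 'a', 'c'],
                   'somedata': [0.1, 0.2, 0.3, 0.4]})

# Look up each row's label in labeldict. result_type='expand' turns the
# returned tuple into one column per element; the columns are then assigned
# to the names in attrlist in order.
df[attrlist] = df.apply(lambda row: labeldict[row['label']],
                        axis=1, result_type='expand')
print(df)
```

Because the column names come from attrlist, moving from 2-tuples to 10-tuples should only require a longer attrlist.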

answered Sep 20 '22 23:09 by Markus Dutschke


You can use merge instead:

    >>> ld = pd.DataFrame(labeldict).T
    >>> ld.columns = ['color', 'size']
    >>> ld.index.name = 'label'
    >>> df.merge(ld.reset_index(), on='label')
      label  somedata  color    size
    0     b  1.462108    red  medium
    1     c -2.060141  green   small
    2     c  1.133769  green   small
    3     c  0.042214  green   small
    4     e -0.322417    red  medium
    5     e -1.099891    red  medium
    6     e -0.877858    red  medium
    7     e  0.582815    red  medium
    8     f -0.384054    red   large
    9     d -0.172428    red  medium
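
A possible variant of this lookup-table idea is sketched below. It swaps merge for DataFrame.join, a close relative that joins a column of the caller against the index of the other frame. The sample data and the names lookup and result are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Illustrative stand-ins for the question's data (hypothetical values).
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('blue', 'large')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': ['b', 'c', 'a', 'c'],
                   'somedata': [0.1, 0.2, 0.3, 0.4]})

# Build a lookup table: one row per label, one column per tuple element.
lookup = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# Left join: match df's 'label' column against the lookup table's index.
# The rows of df keep their original order and index.
result = df.join(lookup, on='label')
print(result)
```

Unlike the in-place assignments in the question, this returns a new frame and leaves df unchanged.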

answered Sep 20 '22 23:09 by alko