Pandas: Assigning multiple new columns simultaneously

Tags: python, pandas

I have a DataFrame df with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary labeldict with keys equal to the possible labels and values equal to 2-tuples of information related to that label. I'd like to tack two new columns onto my frame, one for each part of the 2-tuple corresponding to the label for each row.

Here is the setup:

    import pandas as pd
    import numpy as np

    np.random.seed(1)
    n = 10

    labels = list('abcdef')
    colors = ['red', 'green', 'blue']
    sizes = ['small', 'medium', 'large']

    labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

    df = pd.DataFrame({'label': np.random.choice(labels, n),
                       'somedata': np.random.randn(n)})

I can get what I want by running:

    df['color'], df['size'] = zip(*df['label'].map(labeldict))
    print(df)

      label  somedata  color    size
    0     b  0.196643    red  medium
    1     c -1.545214  green   small
    2     a -0.088104  green   small
    3     c  0.852239  green   small
    4     b  0.677234    red  medium
    5     c -0.106878  green   small
    6     a  0.725274  green   small
    7     d  0.934889    red  medium
    8     a  1.118297  green   small
    9     c  0.055613  green   small
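For readers unfamiliar with the idiom, here is a minimal sketch of what zip(*...) does in that line: map produces a Series of 2-tuples, and zip(*...) transposes it into one tuple per new column. The values below are made up for illustration and are not the frame from the setup.

```python
import pandas as pd

# A Series of 2-tuples, similar in shape to what df['label'].map(labeldict) returns.
# The values are hypothetical.
pairs = pd.Series([('red', 'medium'), ('green', 'small'), ('green', 'small')])

# zip(*pairs) regroups the tuples element-wise: all first items, then all second items.
colors, sizes = zip(*pairs)

print(colors)  # ('red', 'green', 'green')
print(sizes)   # ('medium', 'small', 'small')
```

With 10-tuples, the same call would yield ten such tuples, which is why the left side of the assignment needs ten targets.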

But how can I do this if I don't want to manually type out the two columns on the left side of the assignment? That is, how can I create multiple new columns on the fly? For example, if I had 10-tuples in labeldict instead of 2-tuples, this would be a real pain as currently written. Here are a couple of things that don't work:

    # set up attrlist for later use
    attrlist = ['color', 'size']

    # non-working idea 1)
    df[attrlist] = zip(*df['label'].map(labeldict))

    # non-working idea 2)
    df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

This does work, but seems like a hack:

    for a in attrlist:
        df[a] = 0
    df[attrlist] = zip(*df['label'].map(labeldict))

Better solutions?


asked Dec 29 '13 20:12

8one6

2 Answers

Just use result_type='expand' in pandas apply

    df
    Out[78]:
       a  b
    0  0  1
    1  2  3
    2  4  5
    3  6  7
    4  8  9

    df[['mean', 'std', 'max']] = df[['a', 'b']].apply(mathOperationsTuple, axis=1, result_type='expand')

    df
    Out[80]:
       a  b  mean  std  max
    0  0  1   0.5  0.5  1.0
    1  2  3   2.5  0.5  3.0
    2  4  5   4.5  0.5  5.0
    3  6  7   6.5  0.5  7.0
    4  8  9   8.5  0.5  9.0

And here is some copy-paste code:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])
    print('df', df, sep='\n')
    print()

    def mathOperationsTuple(arr):
        return np.mean(arr), np.std(arr), np.amax(arr)

    df[['mean', 'std', 'max']] = df[['a', 'b']].apply(mathOperationsTuple, axis=1, result_type='expand')
    print('df', df, sep='\n')
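Applied to the question's setup, the same idea might look like the sketch below. It assumes a reasonably recent pandas in which assigning a DataFrame to a list of not-yet-existing column names is allowed. The labeldict and df here are small fixed stand-ins rather than the random data from the question.

```python
import pandas as pd

# Hypothetical, fixed stand-ins for the question's labeldict and df.
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('green', 'small')}
attrlist = ['color', 'size']

df = pd.DataFrame({'label': list('bcacb'),
                   'somedata': [0.1, 0.2, 0.3, 0.4, 0.5]})

# The lambda returns one tuple per row; result_type='expand' spreads each
# tuple over separate columns (named 0, 1, ...) of a new DataFrame.
expanded = df.apply(lambda row: labeldict[row['label']], axis=1, result_type='expand')

# Assign all new columns at once; the columns are matched to attrlist by position.
df[attrlist] = expanded
print(df)
```

Because the columns are matched by position, attrlist only needs to have as many names as the tuples have elements, so the same two lines should carry over to 10-tuples.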


answered Sep 20 '22 23:09

Markus Dutschke

You can use merge instead:

    >>> ld = pd.DataFrame(labeldict).T
    >>> ld.columns = ['color', 'size']
    >>> ld.index.name = 'label'
    >>> df.merge(ld.reset_index(), on='label')
      label  somedata  color    size
    0     b  1.462108    red  medium
    1     c -2.060141  green   small
    2     c  1.133769  green   small
    3     c  0.042214  green   small
    4     e -0.322417    red  medium
    5     e -1.099891    red  medium
    6     e -0.877858    red  medium
    7     e  0.582815    red  medium
    8     f -0.384054    red   large
    9     d -0.172428    red  medium
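If keeping df's original row order and index matters, a left join against the lookup table is one possible variant of the same idea. This is only a sketch: the data below are fixed, hypothetical stand-ins, and DataFrame.from_dict with orient='index' is swapped in for the transpose used above.

```python
import pandas as pd

# Hypothetical, fixed stand-ins for the question's labeldict and df.
labeldict = {'a': ('green', 'small'),
             'b': ('red', 'medium'),
             'c': ('green', 'small')}
attrlist = ['color', 'size']

df = pd.DataFrame({'label': list('bcacb'),
                   'somedata': [0.1, 0.2, 0.3, 0.4, 0.5]})

# Lookup table: one row per label, one column per tuple element.
ld = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# join(on=...) matches df['label'] against ld's index; the default left join
# keeps every row of df in its original order with its original index.
result = df.join(ld, on='label')
print(result)
```

Here too the column names come from attrlist, so longer tuples only require a longer list of names.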

answered Sep 20 '22 23:09

alko
