Efficiently creating additional columns in a pandas DataFrame using .map()

Tags: python, pandas, dataframe

I am analyzing a data set that is similar in shape to the following example. I have two different types of data (abc data and xyz data):

   abc1  abc2  abc3  xyz1  xyz2  xyz3
0     1     2     2     2     1     2
1     2     1     1     2     1     1
2     2     2     1     2     2     2
3     1     2     1     1     1     1
4     1     1     2     1     2     1

I want to create a function that adds a categorizing column for each abc column that exists in the dataframe. Using lists of column names and a category mapping dictionary, I was able to get my desired result.

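import pandas as pd

# Sample data matching the table above
df3 = pd.DataFrame({'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
                    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1]})

abc_columns = ['abc1', 'abc2', 'abc3']
xyz_columns = ['xyz1', 'xyz2', 'xyz3']
abc_category_columns = ['abc1_category', 'abc2_category', 'abc3_category']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Map each abc column through the dictionary into its matching category column
for col, cat_col in zip(abc_columns, abc_category_columns):
    df3[cat_col] = df3[col].map(categories)

print(df3)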

The end result:

   abc1  abc2  abc3  xyz1  xyz2  xyz3 abc1_category abc2_category abc3_category
0     1     2     2     2     1     2          Good           Bad           Bad
1     2     1     1     2     1     1           Bad          Good          Good
2     2     2     1     2     2     2           Bad           Bad          Good
3     1     2     1     1     1     1          Good           Bad          Good
4     1     1     2     1     2     1          Good          Good           Bad

While the for loop at the end works fine, I feel like I should be using a Python lambda function, but I can't figure out how.

Is there a more efficient way to map in a dynamic number of abc-type columns?


asked May 15 '13 22:05

Daniel Romero

1 Answer

You can use applymap with the dictionary get method:

In [11]: df[abc_columns].applymap(categories.get)
Out[11]:
   abc1  abc2  abc3
0  Good   Bad   Bad
1   Bad  Good  Good
2   Bad   Bad  Good
3  Good   Bad  Good
4  Good  Good   Bad
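`applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`, which takes the same function. A minimal sketch that runs unchanged on both older and newer pandas instead applies `Series.map` to each column through `DataFrame.apply`, which also uses the lambda the question was after. One difference worth knowing: `Series.map` with a dict turns values missing from the dictionary into NaN, whereas `categories.get` returns None for them.

```python
import pandas as pd

# Sample data: the abc columns from the question
df = pd.DataFrame({'abc1': [1, 2, 2, 1, 1],
                   'abc2': [2, 1, 2, 2, 1],
                   'abc3': [2, 1, 1, 1, 2]})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# apply() passes each column as a Series; Series.map looks every value up in the dict.
# Values not present in the dict become NaN (categories.get would give None instead).
labels = df[abc_columns].apply(lambda s: s.map(categories))
print(labels)
```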

Then assign the result to the new category columns:

In [12]: abc_categories = list(map(lambda x: x + '_category', abc_columns))

In [13]: abc_categories
Out[13]: ['abc1_category', 'abc2_category', 'abc3_category']

In [14]: df[abc_categories] = df[abc_columns].applymap(categories.get)
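An alternative to assigning through a separate list of new column names is to rename the mapped block with `add_suffix` and `join` it back onto the original frame by index, so the new names are derived from the old ones. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1],
                   'abc3': [2, 1, 1, 1, 2], 'xyz1': [2, 2, 2, 1, 1],
                   'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1]})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Map the abc columns, rename abc1 -> abc1_category and so on,
# then join the labelled block back onto the original frame (aligned on the row index).
labels = df[abc_columns].apply(lambda s: s.map(categories)).add_suffix('_category')
df = df.join(labels)
print(df)
```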

Note: rather than hard-coding abc_columns, you can build it from the DataFrame's own columns with a list comprehension, which handles any number of abc columns:

abc_columns = [col for col in df.columns if str(col).startswith('abc')]
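Putting the pieces together, the function the question asks for might look like the sketch below. `add_category_columns` is a hypothetical name, not a pandas API; it finds the columns by prefix with the same list comprehension and builds all the new columns in a single `assign` call, returning a new DataFrame rather than modifying its input. The sample frame is the first two rows of the question's `abc1`, `abc2` and `xyz1` columns.

```python
import pandas as pd

def add_category_columns(df, prefix, mapping):
    """Hypothetical helper: return a copy of df with a '<col>_category'
    column for every column whose name starts with prefix."""
    cols = [col for col in df.columns if str(col).startswith(prefix)]
    # assign() takes keyword arguments, so build them from a dict comprehension
    return df.assign(**{f'{col}_category': df[col].map(mapping) for col in cols})

df = pd.DataFrame({'abc1': [1, 2], 'abc2': [2, 1], 'xyz1': [2, 2]})
out = add_category_columns(df, 'abc', {1: 'Good', 2: 'Bad', 3: 'Ugly'})
print(out)
```

Because the column list is computed from `df.columns`, an extra `abc4` column would be picked up without any code changes.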


answered Sep 26 '22 00:09

Andy Hayden
