 

Efficiently creating additional columns in a pandas DataFrame using .map()

I am analyzing a data set that is similar in shape to the following example. I have two different types of data (abc data and xyz data):

   abc1  abc2  abc3  xyz1  xyz2  xyz3
0     1     2     2     2     1     2
1     2     1     1     2     1     1
2     2     2     1     2     2     2
3     1     2     1     1     1     1
4     1     1     2     1     2     1

I want to create a function that adds a categorizing column for each abc column that exists in the dataframe. Using lists of column names and a category mapping dictionary, I was able to get my desired result.

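import pandas as pd

# Sample data from the table above
df3 = pd.DataFrame({
    'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1],
})

abc_columns = ['abc1', 'abc2', 'abc3']
xyz_columns = ['xyz1', 'xyz2', 'xyz3']
abc_category_columns = ['abc1_category', 'abc2_category', 'abc3_category']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Map each abc column's codes to labels and store them in the matching category column
for abc_col, category_col in zip(abc_columns, abc_category_columns):
    df3[category_col] = df3[abc_col].map(categories)

print(df3)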

The end result:

   abc1  abc2  abc3  xyz1  xyz2  xyz3 abc1_category abc2_category abc3_category
0     1     2     2     2     1     2          Good           Bad           Bad
1     2     1     1     2     1     1           Bad          Good          Good
2     2     2     1     2     2     2           Bad           Bad          Good
3     1     2     1     1     1     1          Good           Bad          Good
4     1     1     2     1     2     1          Good          Good           Bad

While the for loop at the end works fine, I feel like I should be using a lambda function, but I can't seem to figure it out.

Is there a more efficient way to map a dynamic number of abc-type columns?

Daniel Romero asked May 15 '13 at 22:05


1 Answer

You can use applymap with the dictionary get method:

In [11]: df[abc_columns].applymap(categories.get)
Out[11]:
   abc1  abc2  abc3
0  Good   Bad   Bad
1   Bad  Good  Good
2   Bad   Bad  Good
3  Good   Bad  Good
4  Good  Good   Bad
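Since pandas 2.1, DataFrame.applymap has been deprecated (it emits a FutureWarning) in favour of DataFrame.map, which does the same element-wise call. The sketch below works on either side of that change. It assumes the example data from the question, and the names abc and labels are only illustrative. Note that categories.get returns None for any code that is missing from the dictionary.

```python
import pandas as pd

df = pd.DataFrame({
    'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1],
})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

abc = df[abc_columns]
# DataFrame.map exists from pandas 2.1 on; older releases only have applymap.
if hasattr(pd.DataFrame, 'map'):
    labels = abc.map(categories.get)
else:
    labels = abc.applymap(categories.get)

print(labels)
```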

Then assign the result to the new category columns:

In [12]: abc_categories = list(map(lambda x: x + '_category', abc_columns))

In [13]: abc_categories
Out[13]: ['abc1_category', 'abc2_category', 'abc3_category']

In [14]: df[abc_categories] = df[abc_columns].applymap(categories.get)
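If you want the lambda the question mentions, one option is to keep Series.map from the original loop and apply it column by column. Series.map with a dict looks up a whole column at a time rather than calling a Python function for every cell. Unmatched codes become NaN instead of None. The following is a minimal sketch assuming the question's example data. The names labels and result are illustrative, and add_suffix takes care of the "_category" names.

```python
import pandas as pd

df = pd.DataFrame({
    'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1],
})
abc_columns = ['abc1', 'abc2', 'abc3']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# apply() passes each column to the lambda as a Series; Series.map then
# looks every value up in the dict (codes not in the dict become NaN).
labels = df[abc_columns].apply(lambda s: s.map(categories))

# add_suffix renames abc1 -> abc1_category, etc.; join aligns on the row index.
result = df.join(labels.add_suffix('_category'))
print(result)
```

Because join returns a new DataFrame, the original df is left unchanged. Reassign (df = result) if you want the columns added in place.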

Note: you can construct abc_columns relatively efficiently using a list comprehension:

abc_columns = [col for col in df.columns if str(col).startswith('abc')]
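To make the whole thing independent of how many abc columns exist, the selection and the assignment can both be derived from the frame. This sketch uses DataFrame.filter with a regular expression, which is one alternative to the list comprehension. It assumes the question's example data and column naming. The pattern r'^abc\d+$' is an assumption chosen so that columns like abc1_category are not picked up again on a second run.

```python
import pandas as pd

df = pd.DataFrame({
    'abc1': [1, 2, 2, 1, 1], 'abc2': [2, 1, 2, 2, 1], 'abc3': [2, 1, 1, 1, 2],
    'xyz1': [2, 2, 2, 1, 1], 'xyz2': [1, 1, 2, 1, 2], 'xyz3': [2, 1, 2, 1, 1],
})
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# filter(regex=...) keeps the columns whose label matches the pattern,
# so any number of abcN columns is picked up automatically.
abc_columns = df.filter(regex=r'^abc\d+$').columns.tolist()

# assign() returns a new DataFrame; the dict comprehension builds one
# '<name>_category' column per abc column.
df = df.assign(**{f'{col}_category': df[col].map(categories) for col in abc_columns})
print(df)
```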
Andy Hayden answered Sep 26 '22 at 00:09