
Mapping values from one DataFrame to another

Tags: python, pandas

I am trying to figure out a fast and clean way to map values from one DataFrame to another. Let's say I have a DataFrame A like this one:

    C1  C2  C3  C4  C5
1   a   b   c   a
2   d   a   e   b   a
3   a   c
4   b   e   e

Now I want to change those letter codes to actual values. My DataFrame B with the explanations looks like this:

    Code    Value
1   a       'House'
2   b       'Bike'
3   c       'Lamp'
4   d       'Window'
5   e       'Car'

So far my brute-force approach has been to go through every element in A and look it up in B with isin(). I know that I could also use a Series (or a simple dictionary) for B instead of a DataFrame, with the Code column as the index. But I would still need multiple loops to map everything.

Is there any other nice way to achieve my goal?

asked Jun 07 '16 17:06 by sebap123


2 Answers

Another alternative is map. Although it requires looping over columns, if I didn't mess up the tests, it is still faster than replace:

import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.choice(list("abcdef"), (1000, 1000)))
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
B = B.set_index("Code")["Value"]

%timeit A.replace(B)
1 loop, best of 3: 970 ms per loop

C = pd.DataFrame()

%%timeit
for col in A:
    C[col] = A[col].map(B).fillna(A[col])
1 loop, best of 3: 586 ms per loop
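
As a small illustration of the map-then-fillna idea on a frame shaped like the one in the question, here is a minimal sketch. It uses `DataFrame.apply` instead of an explicit column loop, which is a stylistic swap and not the code that was timed; the names `codes` and `mapped` are made up for the example, and the values are written without the extra quotes.

```python
import pandas as pd

# Small frame in the spirit of the question; '' stands for an empty cell.
A = pd.DataFrame({'C1': ['a', 'd', 'a', 'b'],
                  'C2': ['b', 'a', 'c', 'e'],
                  'C3': ['c', 'e', '', 'e']})

# Lookup Series indexed by code (hypothetical name `codes`).
codes = pd.Series(['House', 'Bike', 'Lamp', 'Window', 'Car'],
                  index=['a', 'b', 'c', 'd', 'e'])

# Map each column through the lookup. Cells with no matching code
# (here the empty strings) become NaN, so fill them from the original column.
mapped = A.apply(lambda col: col.map(codes).fillna(col))
print(mapped)
```

Whether `apply` is as fast as the explicit loop is not something this sketch measures; it only shows that unmatched cells survive the round trip.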
answered Oct 12 '22 08:10 by ayhan


You could use replace:

A.replace(B.set_index('Code')['Value'])

import pandas as pd
A = pd.DataFrame(
    {'C1': ['a', 'd', 'a', 'b'],
     'C2': ['b', 'a', 'c', 'e'],
     'C3': ['c', 'e', '', 'e'],
     'C4': ['a', 'b', '', ''],
     'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
print(A.replace(B.set_index('Code')['Value']))

yields

         C1       C2      C3       C4       C5
0   'House'   'Bike'  'Lamp'  'House'         
1  'Window'  'House'   'Car'   'Bike'  'House'
2   'House'   'Lamp'                          
3    'Bike'    'Car'   'Car'                  
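
The question also mentions using a simple dictionary as the lookup. A minimal sketch of that variant follows, assuming a plain dict built from B's two columns; the name `lookup` and the unmatched code 'z' are invented for the example. It is meant to show that `replace` leaves values without a matching key unchanged.

```python
import pandas as pd

A = pd.DataFrame({'C1': ['a', 'd', 'a', 'b'],
                  'C2': ['b', 'a', 'z', 'e']})   # 'z' has no entry in B
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ['House', 'Bike', 'Lamp', 'Window', 'Car']})

# Build a code -> value dictionary from the two columns of B.
lookup = dict(zip(B['Code'], B['Value']))

# A flat dict replaces matching values in every column;
# values that are not keys of the dict are kept as they are.
result = A.replace(lookup)
print(result)
```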
answered Oct 12 '22 07:10 by unutbu