I am trying to figure out a fast and clean way to map values from one DataFrame to another. Let's say I have DataFrame A like this one:
   C1  C2  C3  C4  C5
1   a   b   c   a
2   d   a   e   b   a
3   a   c
4   b   e   e
Now I want to replace those letter codes with the actual values. My DataFrame B with the explanations looks like this:
   Code     Value
1     a   'House'
2     b    'Bike'
3     c    'Lamp'
4     d  'Window'
5     e     'Car'
So far my brute-force approach has been to go through every element of A and check its value in B with isin(). I know that I could also use a Series (or a simple dictionary) as B instead of a DataFrame, with the Code column as the index. But I would still need multiple loops to map everything.
Is there a nicer way to achieve my goal?
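As a sketch of the Series/dictionary idea mentioned in the question (assuming B has exactly the Code and Value columns shown above), the lookup table can be built in one step with set_index, or as a plain dict with zip. The names lookup and lookup_dict are illustrative only:

```python
import pandas as pd

# B as shown in the question; the Value strings include the literal quotes
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})

# Series indexed by Code, so lookup['a'] gives "'House'"
lookup = B.set_index('Code')['Value']

# Equivalent plain dictionary {code: value}
lookup_dict = dict(zip(B['Code'], B['Value']))

print(lookup['d'])        # 'Window'
print(lookup_dict['e'])   # 'Car'
```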
Mapping external values onto a DataFrame means looking up the values of one of its columns in an external dictionary whose keys match those values, and using the corresponding dictionary values instead.
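A minimal illustration for a single column, assuming a plain dict built from B is the external mapping: Series.map substitutes every value found among the dict keys and yields NaN for anything not found (here the empty string), so fillna puts the original value back in those positions.

```python
import pandas as pd

codes = {'a': "'House'", 'b': "'Bike'", 'c': "'Lamp'", 'd': "'Window'", 'e': "'Car'"}
col = pd.Series(['c', 'e', '', 'e'], name='C3')  # column C3 from the question

mapped = col.map(codes)       # '' is not a key, so it becomes NaN
result = mapped.fillna(col)   # restore the unmatched original values
print(result.tolist())        # ["'Lamp'", "'Car'", '', "'Car'"]
```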
You can use the pandas merge function to get values and columns from another DataFrame. For this you need a reference column shared by both DataFrames, or you can merge on the index.
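For instance, a sketch of the merge approach applied to one column of A (choosing C1 and a left join is an assumption made for illustration): left_on/right_on pair A's column with B's Code column, and how='left' keeps every row of A in its original order.

```python
import pandas as pd

A = pd.DataFrame({'C1': ['a', 'd', 'a', 'b']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})

# Left join: each C1 code picks up the matching Value from B
merged = A.merge(B, left_on='C1', right_on='Code', how='left')

# merged gets a fresh 0..n-1 index, which matches A's default index here
A['C1'] = merged['Value']
print(A['C1'].tolist())  # ["'House'", "'Window'", "'House'", "'Bike'"]
```

With several columns to translate, this join has to be repeated per column, which is why the map and replace answers are usually more convenient for this particular question.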
So, to replace values from another DataFrame when the indices differ, we can use merge on the shared reference column, after which the values are correctly set.
This is possible because both DataFrames have identical indices and shapes; otherwise, errors or unexpected results may occur. If you need to replace values in multiple columns from another DataFrame, this is the syntax:
Another alternative is map. Although it requires looping over columns, if I didn't mess up the tests, it is still faster than replace:
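import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.choice(list("abcdef"), (1000, 1000)))
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
B = B.set_index("Code")["Value"]  # lookup Series indexed by Code
%timeit A.replace(B)
1 loop, best of 3: 970 ms per loop
C = pd.DataFrame()  # the %%timeit cell magic below must start its own cell
%%timeit
for col in A:
    # map codes through B; 'f' has no entry, so fall back to the original value
    C[col] = A[col].map(B).fillna(A[col])
1 loop, best of 3: 586 ms per loop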
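The column loop can also be written without an explicit for statement via DataFrame.apply, which calls the function once per column. This is a sketch of the same map/fillna technique on a small made-up frame, not a re-run of the timings above; whether it is any faster than the loop is not measured here.

```python
import pandas as pd

A = pd.DataFrame({'C1': ['a', 'f', 'b'], 'C2': ['e', 'a', 'f']})
B = pd.Series({'a': "'House'", 'b': "'Bike'", 'e': "'Car'"})  # Code -> Value

# apply passes each column as a Series; codes missing from B (here 'f') are kept
C = A.apply(lambda col: col.map(B).fillna(col))
print(C)
```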
You could use replace:
A.replace(B.set_index('Code')['Value'])
import pandas as pd
A = pd.DataFrame(
{'C1': ['a', 'd', 'a', 'b'],
'C2': ['b', 'a', 'c', 'e'],
'C3': ['c', 'e', '', 'e'],
'C4': ['a', 'b', '', ''],
'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
print(A.replace(B.set_index('Code')['Value']))
yields
         C1       C2      C3       C4       C5
0   'House'   'Bike'  'Lamp'  'House'
1  'Window'  'House'   'Car'   'Bike'  'House'
2   'House'   'Lamp'
3    'Bike'    'Car'   'Car'
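replace also accepts a plain dict, so a variant that skips the intermediate Series could look like the sketch below (A and B are rebuilt so the snippet runs on its own; the name mapping is illustrative). Cells that match no key, such as the empty strings, are left untouched.

```python
import pandas as pd

A = pd.DataFrame({'C1': ['a', 'd', 'a', 'b'],
                  'C2': ['b', 'a', 'c', 'e'],
                  'C3': ['c', 'e', '', 'e'],
                  'C4': ['a', 'b', '', ''],
                  'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})

# {code: value} dict; every cell equal to a key is replaced, others are kept
mapping = dict(zip(B['Code'], B['Value']))
out = A.replace(mapping)
print(out)
```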