I wanted to merge two datasets on their key value and got strange results. I made a simple version to reproduce that problem.
import pandas as pd

df = pd.DataFrame({'key':[1, 2, 3]})
other = pd.DataFrame({'key':[1, 2, 3]})
df.join(
other,
on='key',
lsuffix='_caller'
)
I got this output:
   key_caller  key
0           1  2.0
1           2  3.0
2           3  NaN
I thought this was strange, so I decided to try this one:
df = pd.DataFrame({'key':[i for i in range(3)]})
other = pd.DataFrame({'key':[i for i in range(3)]})
df.join(
other,
on='key',
lsuffix='_caller'
)
And got the result I expected:
   key_caller  key
0           0    0
1           1    1
2           2    2
If the keys don't include zero, the join comes out misaligned, but if they start at zero, everything works fine.
So can someone explain what's going on?
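The values in your two examples are different. In the first, they are 1, 2, and 3; in the second, they are 0, 1, and 2.
With on='key', join matches the values of the key column in the left dataframe against the index of the right dataframe, not against the right dataframe's key column. Neither of your other dataframes has an explicit index, so both get the default index 0, 1, 2. In the second example, the key values happen to be exactly 0, 1, 2, so they line up with that index and the match looks perfect. In the first example, key value 1 finds row 1 of other (whose key is 2), 2 finds row 2 (key 3), and 3 finds no row at all, so you get NaN. Because an integer column cannot hold NaN, pandas converts the column to float, which is why you see 2.0 and 3.0.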
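A minimal sketch of the lookup described above, assuming a reasonably recent pandas version. Series.reindex reproduces what join(on='key') does on the right side (look up each left key in the right index), and two common fixes follow. The other_labeled frame and its label column are hypothetical additions, not part of the original question; they only give the fixes a visible column to carry across.

```python
import pandas as pd

df = pd.DataFrame({'key': [1, 2, 3]})
other = pd.DataFrame({'key': [1, 2, 3]})

# What join(on='key') effectively does: look up each value of df['key']
# in other's *index*, which is the default 0, 1, 2.
# 1 -> row 1 (key 2), 2 -> row 2 (key 3), 3 -> no such row -> NaN.
looked_up = other['key'].reindex(df['key'].tolist())
print(looked_up.tolist())  # [2.0, 3.0, nan]  (NaN forces int64 -> float64)

# Hypothetical frame with an extra column, used only to show the fixes.
other_labeled = pd.DataFrame({'key': [1, 2, 3], 'label': ['a', 'b', 'c']})

# Fix 1: make 'key' the right frame's index, then join on the left column.
fixed = df.join(other_labeled.set_index('key'), on='key')

# Fix 2: merge matches a column against a column directly.
merged = df.merge(other_labeled, on='key', how='left')

print(fixed)
print(merged)
```

merge is usually the more direct choice when both frames hold the key as a regular column; join is convenient when the right frame is already indexed by the key.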