I have a dataframe which I merge with another dataframe by the column EQ_NR.
Here is the Structure of the first dataframe: (Rows: 320816)
Here ist the second one: (Rows: 135818)
After merging I have one dataframe with 'TA' added from the second dataframe to the first one by the value of 'EQ_NR'.
The problem is that I have 320816 rows BEFORE merging and 320871 AFTER merging the two dataframes. What could happen that there are 55 more rows than in the basic data?
I need the Data to do some calculations and the 55 more rows distort the results of the calculations...
There is problem with duplicates in joining column NR.
In sample there are duplicated values 100363 and 100594.
Sample:
import pandas as pd
df1 = pd.DataFrame({'NR':[100363,100363,100363, 100365],
'B':[4,5,6,7],
'C':[7,8,9,7]})
print (df1)
B C NR
0 4 7 100363
1 5 8 100363
2 6 9 100363
3 7 7 100365
df2 = pd.DataFrame({'NR':[1001380363,1001380363,1001380363,1001380363],
'B':[4,5,6,8],
'C':[7,8,9,3]})
print (df2)
B C EQ_NR
0 4 7 1001380363
1 5 8 1001380363
2 6 9 1001380363
3 8 3 1001380363
print (pd.merge(df1, df2, on=['NR']))
B_x C_x NR B_y C_y
0 4 7 100363 4 7
1 4 7 100363 5 8
2 4 7 100363 6 9
3 4 7 100363 8 3
4 5 8 100363 4 7
5 5 8 100363 5 8
6 5 8 100363 6 9
7 5 8 100363 8 3
8 6 9 100363 4 7
9 6 9 100363 5 8
10 6 9 100363 6 9
11 6 9 100363 8 3
EDIT1:
If dataframe df2 have no duplicates data in NR, use:
print (df1)
ART DAT LEIST_DAT KD BW NR MATERIAL \
0 AF 2015-12-10 2015-12-31 P B 100363 S
1 AF 2015-12-10 2015-12-31 P B 100363 S
2 AF 2015-12-10 2015-12-31 P B 100363 S
3 AF 2015-12-10 2015-12-31 P B 100363 S
4 AF 2015-12-10 2015-12-31 P B 100363 S
5 AF 2015-12-10 2015-12-31 P B 100363 S
6 AF 2015-12-10 2015-12-31 P B 100363 S
7 AF 2015-12-10 2015-12-31 P E 100594 S
8 AF 2015-12-10 2015-12-31 P B 100594 S
KW_WERT NETTO_EURO TA
0 0.150 18.90 H
1 0.145 18.27 H
2 0.145 18.27 NaN
3 0.150 18.90 H
4 0.150 18.90 NaN
5 0.145 18.27 H
6 0.150 18.90 H
7 3.011 252.92 AN
8 3.412 429.91 AN
print (df2)
NR TA
0 100363 Sonstiges
1 100704 Sonstiges
2 100823 Sonstiges
3 100985 Sonstiges
4 100774 D
5 100790 D
6 100097 NaN
print (pd.merge(df1, df2, on=['NR'], how='left', suffixes=('','_new')))
ART DAT LEIST_DAT KD MW_BW NR MATERIAL \
0 F 2015-12-10 2015-12-31 P B 100363 S
1 F 2015-12-10 2015-12-31 P B 100363 S
2 F 2015-12-10 2015-12-31 P B 100363 S
3 F 2015-12-10 2015-12-31 P B 100363 S
4 F 2015-12-10 2015-12-31 P B 100363 S
5 F 2015-12-10 2015-12-31 P B 100363 S
6 F 2015-12-10 2015-12-31 P B 100363 S
7 F 2015-12-10 2015-12-31 P E 100594 S
8 F 2015-12-10 2015-12-31 P B 100594 S
KW_WERT NETTO_EURO TA TA_new
0 0.150 18.90 H Sonstiges
1 0.145 18.27 H Sonstiges
2 0.145 18.27 NaN Sonstiges
3 0.150 18.90 H Sonstiges
4 0.150 18.90 NaN Sonstiges
5 0.145 18.27 H Sonstiges
6 0.150 18.90 H Sonstiges
7 3.011 252.92 AN NaN
8 3.412 429.91 AN NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With