I have started learning pandas and got stumbled at the below problem:
Following is a table which has data like:
Book:
B_IDX B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID
1 ABC aaa 12.21 SCI 182
2 BCD bbb 98 ECN 920
3 CDE ccc 22.34 SCI 228
4 DEF ddd 44.11 LIT 761
5 EFG eee 0.99 MAT 10242
6 FGH fff 4.99 MAT 77721
UCODE:
U_ID U_CD
182 9982825
950 9992822
228 9999983
776 9912876
332 9003931
The requirement is to use the if..else logic to pull the data from the above mentioned tables.
Req.:
if B_UTYPE == 'SCI':
pull the record from 'UCODE'
elif B_UTYPE == 'MAT':
split the B_ID in 4 and 1 digits i.e. B_UTYPE.split[:2] and B_UTYPE.split[3:5]
else:
keep the data as it is.
Excepted O/P:
B_ID B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID U_ID U_CD N_COL1 N_COL2
1 ABC aaa 12.21 SCI 182 182 9982825 NA NA
2 BCD bbb 98 ECN 920 NA NA NA NA
3 CDE ccc 22.34 SCI 228 228 9999983 NA NA
4 DEF ddd 44.11 LIT 761 NA NA NA NA
5 EFG eee 0.99 MAT 10242 NA NA 102 42
6 FGH fff 4.99 MAT 77721 NA NA 777 21
Any help/tutorial where I can get some insight to achieve the expected output by meeting the above conditions?
(1) IF condition – Set of numbers Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions: If the number is equal or lower than 4, then assign the value of 'True' Otherwise, if the number is greater than 4, then assign the value of 'False'
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
For readability sake, build each result separately and then concatenate the pieces together.
u_id = df.B_ID.astype(str).where(df.B_UTYPE.eq('SCI'))
u_cd = df.B_ID.map(ucode.set_index('U_ID').U_CD.astype(str))
ncol = (df.B_ID.astype(str)
.str.extract(r'(\d{3})(\d+)')
.where(df.B_UTYPE.eq('MAT'))
.rename(columns=lambda x: f'N_COL{x+1}'))
df = pd.concat([df, u_id, u_cd, ncol], axis=1)
print(df)
B_IDX B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID B_ID B_ID N_COL1 N_COL2
0 1 ABC aaa 12.21 SCI 182 182 9982825 NaN NaN
1 2 BCD bbb 98.00 ECN 920 NaN NaN NaN NaN
2 3 CDE ccc 22.34 SCI 228 228 9999983 NaN NaN
3 4 DEF ddd 44.11 LIT 761 NaN NaN NaN NaN
4 5 EFG eee 0.99 MAT 10242 NaN NaN 102 42
5 6 FGH fff 4.99 MAT 77721 NaN NaN 777 21
This is a two step approach. First, you need to figure out which rows match which condition. Then once you have your conditions and outputs, you can use masking and assign
to add the series to your DataFrame.
c1 = book.B_UTYPE.eq("SCI")
c2 = book.B_UTYPE.eq("MAT")
s1 = book.B_ID.map(ucode.set_index('U_ID').U_CD)
s2 = book.B_ID.astype(str)
Now for the fun part:
parts = {
'U_ID': book.B_ID.mask(~c1),
'U_CD': pd.Series(s1).mask(~c1),
'N_COL1': s2.str[:3].mask(~c2),
'N_COL2': s2.str[3:].mask(~c2)
}
book.assign(**parts)
ID B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID U_ID U_CD N_COL1 N_COL2
0 1 ABC aaa 12.21 SCI 182 182.0 9982825.0 NaN NaN
1 2 BCD bbb 98.00 ECN 920 NaN NaN NaN NaN
2 3 CDE ccc 22.34 SCI 228 228.0 9999983.0 NaN NaN
3 4 DEF ddd 44.11 LIT 761 NaN NaN NaN NaN
4 5 EFG eee 0.99 MAT 10242 NaN NaN 102 42
5 6 FGH fff 4.99 MAT 77721 NaN NaN 777 21
Setup so you can reproduce:
book = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
'B_NAME': {0: 'ABC', 1: 'BCD', 2: 'CDE', 3: 'DEF', 4: 'EFG', 5: 'FGH'},
'B_AUTHOR': {0: 'aaa', 1: 'bbb', 2: 'ccc', 3: 'ddd', 4: 'eee', 5: 'fff'},
'B_PRICE': {0: 12.21, 1: 98.0, 2: 22.34, 3: 44.11, 4: 0.99, 5: 4.99},
'B_UTYPE': {0: 'SCI', 1: 'ECN', 2: 'SCI', 3: 'LIT', 4: 'MAT', 5: 'MAT'},
'B_ID': {0: 182, 1: 920, 2: 228, 3: 761, 4: 10242, 5: 77721}})
ucode = pd.DataFrame({'U_ID': {0: 182, 1: 950, 2: 228, 3: 776, 4: 332},
'U_CD': {0: 9982825, 1: 9992822, 2: 9999983, 3: 9912876, 4: 9003931}})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With