Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use if-else in pandas dataframes

I have started learning pandas and got stumbled at the below problem:

Following is a table which has data like:

Book:

B_IDX B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID
1     ABC    aaa      12.21   SCI     182
2     BCD    bbb      98      ECN     920
3     CDE    ccc      22.34   SCI     228
4     DEF    ddd      44.11   LIT     761
5     EFG    eee      0.99    MAT     10242
6     FGH    fff      4.99    MAT     77721

UCODE:

U_ID  U_CD
182   9982825
950   9992822
228   9999983
776   9912876
332   9003931

The requirement is to use the if..else logic to pull the data from the above mentioned tables.

Req.:

if B_UTYPE == 'SCI':
     pull the record from 'UCODE'
elif  B_UTYPE == 'MAT':
     split the  B_ID in 4 and 1 digits i.e. B_UTYPE.split[:2] and B_UTYPE.split[3:5]
else:
    keep the data as it is.

Excepted O/P:

B_ID B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID   U_ID    U_CD      N_COL1  N_COL2
1    ABC    aaa      12.21   SCI     182    182     9982825   NA      NA
2    BCD    bbb      98      ECN     920    NA      NA        NA      NA
3    CDE    ccc      22.34   SCI     228    228     9999983   NA      NA
4    DEF    ddd      44.11   LIT     761    NA      NA        NA      NA 
5    EFG    eee      0.99    MAT     10242  NA      NA        102     42
6    FGH    fff      4.99    MAT     77721  NA      NA        777     21 

Any help/tutorial where I can get some insight to achieve the expected output by meeting the above conditions?

like image 380
Alpha001 Avatar asked Oct 23 '18 19:10

Alpha001


People also ask

How do you add a condition to a DataFrame in Python?

(1) IF condition – Set of numbers Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions: If the number is equal or lower than 4, then assign the value of 'True' Otherwise, if the number is greater than 4, then assign the value of 'False'

How do I compare two DataFrames in pandas?

The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.


2 Answers

For readability sake, build each result separately and then concatenate the pieces together.

u_id = df.B_ID.astype(str).where(df.B_UTYPE.eq('SCI'))
u_cd = df.B_ID.map(ucode.set_index('U_ID').U_CD.astype(str))
ncol = (df.B_ID.astype(str)
          .str.extract(r'(\d{3})(\d+)')
          .where(df.B_UTYPE.eq('MAT'))
          .rename(columns=lambda x: f'N_COL{x+1}'))

df = pd.concat([df, u_id, u_cd, ncol], axis=1)
print(df)

   B_IDX B_NAME B_AUTHOR  B_PRICE B_UTYPE   B_ID B_ID     B_ID N_COL1 N_COL2
0      1    ABC      aaa    12.21     SCI    182  182  9982825    NaN    NaN
1      2    BCD      bbb    98.00     ECN    920  NaN      NaN    NaN    NaN
2      3    CDE      ccc    22.34     SCI    228  228  9999983    NaN    NaN
3      4    DEF      ddd    44.11     LIT    761  NaN      NaN    NaN    NaN
4      5    EFG      eee     0.99     MAT  10242  NaN      NaN    102     42
5      6    FGH      fff     4.99     MAT  77721  NaN      NaN    777     21
like image 149
cs95 Avatar answered Oct 22 '22 06:10

cs95


This is a two step approach. First, you need to figure out which rows match which condition. Then once you have your conditions and outputs, you can use masking and assign to add the series to your DataFrame.

c1 = book.B_UTYPE.eq("SCI")
c2 = book.B_UTYPE.eq("MAT")

s1 = book.B_ID.map(ucode.set_index('U_ID').U_CD)
s2 = book.B_ID.astype(str)

Now for the fun part:

parts = {
    'U_ID': book.B_ID.mask(~c1),
    'U_CD': pd.Series(s1).mask(~c1),
    'N_COL1': s2.str[:3].mask(~c2),
    'N_COL2': s2.str[3:].mask(~c2)      
}

book.assign(**parts)

   ID B_NAME B_AUTHOR  B_PRICE B_UTYPE   B_ID   U_ID       U_CD N_COL1 N_COL2
0   1    ABC      aaa    12.21     SCI    182  182.0  9982825.0    NaN    NaN
1   2    BCD      bbb    98.00     ECN    920    NaN        NaN    NaN    NaN
2   3    CDE      ccc    22.34     SCI    228  228.0  9999983.0    NaN    NaN
3   4    DEF      ddd    44.11     LIT    761    NaN        NaN    NaN    NaN
4   5    EFG      eee     0.99     MAT  10242    NaN        NaN    102     42
5   6    FGH      fff     4.99     MAT  77721    NaN        NaN    777     21

Setup so you can reproduce:

book = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
 'B_NAME': {0: 'ABC', 1: 'BCD', 2: 'CDE', 3: 'DEF', 4: 'EFG', 5: 'FGH'},
 'B_AUTHOR': {0: 'aaa', 1: 'bbb', 2: 'ccc', 3: 'ddd', 4: 'eee', 5: 'fff'},
 'B_PRICE': {0: 12.21, 1: 98.0, 2: 22.34, 3: 44.11, 4: 0.99, 5: 4.99},
 'B_UTYPE': {0: 'SCI', 1: 'ECN', 2: 'SCI', 3: 'LIT', 4: 'MAT', 5: 'MAT'},
 'B_ID': {0: 182, 1: 920, 2: 228, 3: 761, 4: 10242, 5: 77721}})

ucode = pd.DataFrame({'U_ID': {0: 182, 1: 950, 2: 228, 3: 776, 4: 332},
 'U_CD': {0: 9982825, 1: 9992822, 2: 9999983, 3: 9912876, 4: 9003931}})
like image 44
user3483203 Avatar answered Oct 22 '22 04:10

user3483203