
Pandas, fillna/bfill to concat and coalesce fields

Tags:

python

pandas

I am trying to build a join_key with the following logic: date + book + bdr + COALESCE(cusip, isin, deal, id)

+------------+------+------+-----------+--------------+------+------------+----------------------------+
|  endOfDay  | book | bdr  |   cusip   |     isin     | Deal |     Id     |          join_key          |
+------------+------+------+-----------+--------------+------+------------+----------------------------+
| 31/10/2019 |   15 | ITOR | 371494AM7 | US371494AM77 |  161 | 8013210731 | 20191031|15|ITOR|371494AM7 |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011898573 | 20191031|15|ITOR|          |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011898742 | 20191031|15|ITOR|          |
| 31/10/2019 |   15 | ITOR |           |              |      | 8011899418 | 20191031|15|ITOR|          |
+------------+------+------+-----------+--------------+------+------------+----------------------------+

I am trying to use:

df['join_key'] = ("20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df[['cusip', 'isin', 'Deal', 'Id']].bfill(1)['cusip'].astype(str))

Also tried:

df['position_join_key'] = "20191031|" + df['book'].astype('str') + "|" + df['bdr'] + "|" + df['cusip'].fillna(df['isin']).fillna(df['Deal']).fillna(df['Id']).astype('str') 

For some reason this code isn't picking up Id as part of the key.

For example in the 2nd row I should get 20191031|15|ITOR|8011898573.

Also, if it helps, the data comes from a CSV that I read with na_filter=False.

Sample Input:

+------------+------+------+-----------+-------------+------+------------+
|  endOfDay  | book | bdr  |   cusip   |    isin     | Deal |     Id     |
+------------+------+------+-----------+-------------+------+------------+
| 31/10/2019 |   15 | ITOR | 371494AM7 |             |  161 | 8013210731 |
| 31/10/2019 |   15 | ITOR |           | 3.16248E+11 |      | 8011898573 |
| 31/10/2019 |   15 | ITOR |           |             |  352 | 8011898742 |
| 31/10/2019 |   15 | ITOR |           |             |      | 8011899418 |
+------------+------+------+-----------+-------------+------+------------+

Sample output:

+----------------------------+
|          join_key          |
+----------------------------+
| 43769|15|ITOR|371494AM7    |
| 43769|15|ITOR|316247735264 |
| 43769|15|ITOR|352          |
| 43769|15|ITOR|8011899418   |
+----------------------------+
excelguy asked Oct 16 '22 09:10


1 Answer

One general way to approach this:

  1. Create a temporary column called temp that holds the first non-blank value of cusip, isin, Deal and Id (blanks are converted to NaN, then backfilled across each row).
  2. Insert that column right after your bdr column.
  3. Convert your date column to datetime and format it as YYYYMMDD.
  4. '|'.join the first 4 columns to create join_key.

Note: I added step 3 to keep the code general, so 20191031 is not hardcoded as it was in your attempts.
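Step 1 only works if the blanks are real missing values. Here is a minimal sketch of why the attempts in the question returned an empty key, assuming the CSV was read with na_filter=False so blanks arrive as empty strings. The two-row frame `ids` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: blanks are empty strings, as read_csv(..., na_filter=False) leaves them.
ids = pd.DataFrame({
    "cusip": ["371494AM7", ""],
    "Id": ["8013210731", "8011898573"],
})

# An empty string is a value, not a missing one, so fillna leaves it alone.
naive = ids["cusip"].fillna(ids["Id"])

# Converting blanks to NaN first lets fillna (or bfill) see the gaps.
fixed = ids["cusip"].replace("", np.nan).fillna(ids["Id"])

print(naive.tolist())  # ['371494AM7', '']
print(fixed.tolist())  # ['371494AM7', '8011898573']
```

The same reasoning applies to bfill: it only fills NaN, so the replace has to come first.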

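import numpy as np
import pandas as pd

# Step 1: turn blanks into NaN, backfill across each row, keep the first column
s = df[['cusip', 'isin', 'Deal', 'Id']].replace('', np.nan).bfill(axis=1).iloc[:, 0]
# Step 2: insert the result right after bdr
df.insert(3, 'temp', s)

# Step 3: parse the day-first dates and format them as YYYYMMDD
df['endOfDay'] = pd.to_datetime(df['endOfDay'], format='%d/%m/%Y').dt.strftime('%Y%m%d')

# Step 4: join the first four columns with '|', then drop the helper column
df['join_key'] = df.iloc[:, :4].apply(lambda x: '|'.join(x.astype(str).to_numpy()), axis=1)
df = df.drop(columns='temp')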

Output:

   endOfDay  book   bdr      cusip          isin Deal          Id                     join_key
0  20191031    15  ITOR  371494AM7  US371494AM77  161  8013210731   20191031|15|ITOR|371494AM7
1  20191031    15  ITOR                                8011898573  20191031|15|ITOR|8011898573
2  20191031    15  ITOR                                8011898742  20191031|15|ITOR|8011898742
3  20191031    15  ITOR                                8011899418  20191031|15|ITOR|8011899418
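
For reference, here is a self-contained sketch of the same idea run on the sample input from the question. It uses plain string concatenation instead of a row-wise apply. The helper name `build_join_key` is hypothetical, and the frame assumes every column was read as text, with blanks as empty strings.

```python
import numpy as np
import pandas as pd

# Sample input from the question. Assumption: all columns are text and
# blanks are empty strings, as na_filter=False would leave them.
df = pd.DataFrame({
    "endOfDay": ["31/10/2019"] * 4,
    "book": ["15"] * 4,
    "bdr": ["ITOR"] * 4,
    "cusip": ["371494AM7", "", "", ""],
    "isin": ["", "3.16248E+11", "", ""],
    "Deal": ["161", "", "352", ""],
    "Id": ["8013210731", "8011898573", "8011898742", "8011899418"],
})


def build_join_key(frame, id_cols=("cusip", "isin", "Deal", "Id")):
    """Hypothetical helper: date|book|bdr|first non-blank identifier."""
    # Blanks become NaN so that bfill can treat them as gaps.
    ids = frame[list(id_cols)].replace("", np.nan)
    # After a row-wise backfill, the first column holds the coalesced value.
    first_id = ids.bfill(axis=1).iloc[:, 0]
    # Dates in the sample are day-first, so the format is given explicitly.
    date = pd.to_datetime(frame["endOfDay"], format="%d/%m/%Y").dt.strftime("%Y%m%d")
    return date + "|" + frame["book"].astype(str) + "|" + frame["bdr"] + "|" + first_id.astype(str)


df["join_key"] = build_join_key(df)
print(df["join_key"].tolist())
```

On this input the second row picks up the isin text exactly as stored (3.16248E+11), the third picks up Deal (352), and the fourth falls through to Id. The full 316247735264 shown in the question's sample output cannot be recovered from the scientific-notation text, so that would have to be fixed where the CSV is produced.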
Erfan answered Oct 20 '22 11:10