Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update target column with optional source columns

I am trying to load on prem transaction log table and update a databricks table.In Source table,the column reqdetails table hold all the information and expect ProductID remaining columns are dynamic(not all columns exists in the request) all time the optional/nullable.Could you please let me the process using python in databricks ? Explored StructType but it needs mandatory specific columns.

Source Table

Req ID  Type    Req Details Status
1   Update  ProductID=234;ProductName=LawnMover;Price=58    True
2   Update  ProductID=874;Price=478 True
3   Update  ProductID=678;ProductParentgroup=Watersuppuly;Price=1.6 True

Target table before Update

ProductID   ProductParentgroup  ProductName    Price
234         Utility             Mover          86
874         HOA                 Sink           450
678         Water               Filters        1.2

Target table after Update

ProductID   ProductParentgroup  ProductName        Price
234         Utility             LawnMover          58
874         HOA                 Sink               478
678         Watersupply         Filters            1.6
like image 242
RUC Avatar asked Dec 09 '25 16:12

RUC


1 Answers

Split ReqDetails from the source table using regexp_extract, join source and target table and then merge the respective columns using coalesce.

If regexp_extract does not match a value in ReqDetails, it returns an empty string.

from pyspark.sql import functions as F

srcDf = ...
targetDf = ...

src2Df = srcDf.withColumn('ProductID', F.regexp_extract('ReqDetails', 'ProductID=(.*?);', 1)) \
     .withColumn('SrcProductParentgroup', F.regexp_extract('ReqDetails', 'ProductParentgroup=(.*?);', 1)) \
     .withColumn('SrcProductName', F.regexp_extract('ReqDetails', 'ProductName=(.*?);', 1)) \
     .withColumn('SrcPrice', F.regexp_extract('ReqDetails', 'Price=(.*)', 1)) \
     .na.replace('', None)


targetDf.join(src2Df, on='ProductID', how='left') \
    .withColumn('ProductParentgroup', F.coalesce('SrcProductParentgroup', 'ProductParentgroup')) \
    .withColumn('ProductName', F.coalesce('SrcProductName', 'ProductName')) \
    .withColumn('Price', F.coalesce('SrcPrice', 'Price')) \
    .select('ProductID', 'ProductParentgroup', 'ProductName', 'Price') \
    .show()

Output:

+---------+------------------+-----------+-----+
|ProductID|ProductParentgroup|ProductName|Price|
+---------+------------------+-----------+-----+
|      234|           Utility|  LawnMover|   58|
|      678|      Watersuppuly|    Filters|  1.6|
|      874|               HOA|       Sink|  478|
+---------+------------------+-----------+-----+
like image 100
werner Avatar answered Dec 11 '25 04:12

werner



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!