I'm working on a data cleaning task and could use some help. I have two CSV files with thousands of rows each:
File A contains product shipment records. File B contains product descriptions and categories. Here’s a simplified example:
File A (shipments.csv):
shipment_id,product_code,quantity,date
S001,P123,10,2025-07-01
S002,P456,5,2025-07-02
S003,P789,8,2025-07-03 
File B (products.csv):
product_code,description,category
P123,Widget A,Tools
P456,Widget B,Hardware
I want to create a merged file where each row from File A is enriched with the matching product description and category from File B (based on product_code). If there's no match, I’d like to keep the row from File A and fill the missing columns with "N/A".
Expected Output:
shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A
I tried using pandas.merge() in Python but it drops unmatched rows unless I use how='left', and I’m not sure how to fill missing values properly.
Any help? Thanks in advance!
As tagged, using awk (note the order of the files):
awk -F , -v OFS=, '
  NR == FNR {a[$1] = $2 OFS $3; next}
  {print $0, ($2 in a? a[$2]: "N/A" OFS "N/A")}
' products.csv shipments.csv 
shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With