I'm trying to map a dataset onto a blank CSV file that has different headers. In other words, I want to copy data from one CSV file into a new CSV that has a different number of columns, and those columns have different names. This question differs from similar ones because the column names aren't the same, and there are no overlapping columns either. I can't simply overwrite the headers in the data file, because it also contains columns of irrelevant data. I'm certain I'm overcomplicating this.
I've seen the example code below, but it relies on a common header to join the data. How do I change it for my case?
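import pandas as pd
# a.csv headers: ID TITLE
# b.csv headers: ID NAME
a = pd.read_csv("a.csv")
b = pd.read_csv("b.csv")
# Drop every column of b that contains missing values
b = b.dropna(axis=1)
# Join on the header the two files have in common
merged = a.merge(b, on='ID')
merged.to_csv("output.csv", index=False)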
Sample Data
a.csv (blank format file, the format must match this file):
Headers: TOWN NAME LOCATION HEIGHT STAR
b.csv:
Headers: COUNTRY WEIGHT NAME AGE MEASUREMENT
Data: UK, 150lbs, John, 6, 6ft
Expected output file:
Headers: TOWN NAME LOCATION HEIGHT STAR
Data: (Blank), John, UK, 6ft, (Blank)
To merge all the CSV files in a folder, use the glob module: build the search pattern with os.path.join(), read each matching file, and pass the resulting DataFrames to pd.concat() to combine them.
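As a rough sketch of that glob approach: the folder, the file names (part1.csv, part2.csv) and their contents are made up for illustration, and this assumes every file shares the same headers. It stacks rows from several files; it does not rename or map columns.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical input files, created here only so the sketch runs on its own
    pd.DataFrame({"ID": [1], "NAME": ["John"]}).to_csv(
        os.path.join(folder, "part1.csv"), index=False)
    pd.DataFrame({"ID": [2], "NAME": ["Jane"]}).to_csv(
        os.path.join(folder, "part2.csv"), index=False)

    # Collect every CSV in the folder (sorted for a predictable row order)
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))

    # Read each file and stack the rows into one DataFrame
    combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

print(combined)
```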
From your example, it looks like you need to do some column renaming in addition to the merge. This is easiest done before the merge itself.
import pandas as pd
# Read the csv files
dfA = pd.read_csv("a.csv")
dfB = pd.read_csv("b.csv")
# Rename the columns of b.csv that should match the ones in a.csv
dfB = dfB.rename(columns={'MEASUREMENT': 'HEIGHT', 'COUNTRY': 'LOCATION'})
# Merge on all common columns
df = pd.merge(dfA, dfB, on=list(set(dfA.columns) & set(dfB.columns)), how='outer')
# Only keep the columns that exist in a.csv
df = df[dfA.columns]
# Save to a new csv
df.to_csv("output.csv", index=False)
This should give you what you are after.
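Since a.csv is blank, the merge may not be strictly needed. As a minimal sketch of an alternative (not the code above), you could rename the columns and then reindex against the headers of a.csv. The io.StringIO objects stand in for the two files and hold the sample data from the question; the names template, data and column_map are illustrative assumptions.

```python
import io

import pandas as pd

# Stand-ins for a.csv (headers only) and b.csv, using the question's sample data
a_csv = io.StringIO("TOWN,NAME,LOCATION,HEIGHT,STAR\n")
b_csv = io.StringIO("COUNTRY,WEIGHT,NAME,AGE,MEASUREMENT\nUK,150lbs,John,6,6ft\n")

template = pd.read_csv(a_csv)  # empty frame that only carries the target headers
data = pd.read_csv(b_csv)

# Assumed mapping from b.csv column names to a.csv column names
column_map = {"MEASUREMENT": "HEIGHT", "COUNTRY": "LOCATION"}

# Rename, then keep exactly the template's columns in the template's order.
# Columns with no source data (TOWN, STAR) are created and left empty.
out = data.rename(columns=column_map).reindex(columns=template.columns)

csv_text = out.to_csv(index=False)
print(csv_text)
```

With real files, you would pass "a.csv" and "b.csv" to pd.read_csv and a path such as "output.csv" to to_csv. Columns of b.csv that are neither mapped nor present in a.csv (WEIGHT and AGE here) are dropped by the reindex.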