Merging csv files with different headers with Pandas in Python

I'm trying to map a dataset onto a blank CSV file that has different headers. In other words, I want to move data from one CSV file into a new CSV that has a different number of columns, with different names. This question differs from similar ones because the column names aren't the same and there are no overlapping columns to join on. I also can't just overwrite the headers in the data file, since it has other columns with irrelevant data. I'm certain I'm overcomplicating this.

I've seen this example code, but how do I adapt it, given that it uses a common header to join the data?

import pandas as pd

a = pd.read_csv("a.csv")
b = pd.read_csv("b.csv")
# a.csv = ID TITLE
# b.csv = ID NAME
b = b.dropna(axis=1)
# Join on the header the two files share
merged = a.merge(b, on='ID')
merged.to_csv("output.csv", index=False)

Sample Data

a.csv (blank format file, the format must match this file):

Headers: TOWN NAME LOCATION HEIGHT STAR

b.csv:

Headers: COUNTRY WEIGHT  NAME  AGE MEASUREMENT
Data:    UK,     150lbs, John, 6,  6ft

Expected output file:

Headers: TOWN     NAME   LOCATION  HEIGHT  STAR
Data:    (Blank), John,  UK,       6ft,    (Blank)

asked Mar 12 '20 08:03

MF DOOM

1 Answer

From your example, it looks like you need to rename some columns in addition to doing the merge. The renaming is easiest to do before the merge itself.

import pandas as pd

# Read the csv files
dfA = pd.read_csv("a.csv")
dfB = pd.read_csv("b.csv")

# Rename the columns of b.csv that should match the ones in a.csv
dfB = dfB.rename(columns={'MEASUREMENT': 'HEIGHT', 'COUNTRY': 'LOCATION'})

# Merge on all common columns
df = pd.merge(dfA, dfB, on=list(set(dfA.columns) & set(dfB.columns)), how='outer')

# Only keep the columns that exist in a.csv
df = df[dfA.columns]

# Save to a new csv
df.to_csv("output.csv", index=False)

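This should give you what you are after: an output file with the headers of a.csv, where the columns that have no match in b.csv (TOWN and STAR in your sample) are left blank.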
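Since a.csv in the question holds only headers, the merge mostly serves to pick up the column layout. As a rough alternative sketch (not the approach above, and assuming the blank file really has no data rows), the same result can be reached with `rename` followed by `reindex`. The in-memory `a_csv` and `b_csv` buffers and the names `template`, `data`, and `column_map` are stand-ins invented for illustration; in practice you would read the real files instead.

```python
import io

import pandas as pd

# In-memory stand-ins for a.csv (header-only template) and b.csv (data),
# built from the sample in the question. Replace with real file paths.
a_csv = io.StringIO("TOWN,NAME,LOCATION,HEIGHT,STAR\n")
b_csv = io.StringIO("COUNTRY,WEIGHT,NAME,AGE,MEASUREMENT\nUK,150lbs,John,6,6ft\n")

template = pd.read_csv(a_csv)
data = pd.read_csv(b_csv)

# Map b.csv column names onto the template's names (same mapping as above).
column_map = {"MEASUREMENT": "HEIGHT", "COUNTRY": "LOCATION"}

# reindex keeps only the template's columns, in the template's order,
# and fills columns with no source data (TOWN, STAR) with NaN.
result = data.rename(columns=column_map).reindex(columns=template.columns)

# NaN is written as an empty field by default, so the blanks carry over.
csv_text = result.to_csv(index=False)
print(csv_text)
```

With the sample data, the single output row should read `,John,UK,6ft,`. To write a file instead of a string, pass a path such as "output.csv" to `to_csv`.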

answered Nov 14 '22 23:11

Shaido

Tags: python, pandas, dataframe, csv
