 

Changing Headers in .csv files

Tags:

python

pandas

csv

I am trying to read in data that is provided in a format that is messy to read in. Here is an example:

```
#Name,
#Comment,""
#ExtComment,""
#Source,
[Data]
1,2
3,4
5,6
#[END_OF_FILE]
```

When working with one or two of these files, I manually replaced the [Data] header line with x,y, and I can read the data in just fine by skipping the first few rows and not reading the last line.
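The manual approach described above (skip the metadata, drop the last line, supply the header) can also be expressed entirely through `read_csv` options, without editing the files. This is a sketch that assumes exactly five lines precede the data, as in the sample; the sample text is embedded with `io.StringIO` only so the snippet is self-contained, and a file path works the same way.

```python
import io

import pandas as pd

# Sample input copied from the format above; in practice pass a file path instead
sample = """#Name,
#Comment,""
#ExtComment,""
#Source,
[Data]
1,2
3,4
5,6
#[END_OF_FILE]
"""

df = pd.read_csv(
    io.StringIO(sample),
    skiprows=5,        # the four metadata lines plus the [Data] line
    skipfooter=1,      # drop the trailing #[END_OF_FILE] line
    engine="python",   # skipfooter is only supported by the python engine
    names=["x", "y"],  # supply the header instead of editing the file
)
print(df)
```

Hard-coding `skiprows=5` only works while every file has the same number of metadata lines.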

However, I now have 30+ files split between two folders, and I am trying to figure out the best way to read them all in and change each file's header from [Data] to x, y.

The .csv files are in a subfolder one level below the script that reads them (i.e. folder 1 contains the code below as well as folder 2, and folder 2 contains the .csv files).

Here is what I have right now:

```python
import glob

import pandas as pd

# sets - refers to the set containing the name of each file (i.e. [file1, file2])
# df - the dataframe which you are going to store the data in
# dataLabels - the headers you want to search for within the .csv file
# skip - the number of rows you want to skip
# newHeader - what you want to change the column headers to be
# pathName - provide path where files are located

def reader(sets, df, dataLabels, skip, newHeader, pathName):
    for i in range(len(sets)):
        df_temp = pd.read_csv(glob.glob(pathName + sets[i] + ".csv"), sep=r'\s*,', skiprows=skip, engine='python')[:-1]
        df_temp.column.value[0] = [newHeader]
        for j in range(len(dataLabels)):
            df_temp[dataLabels[j]] = pd.to_numeric(df_temp[dataLabels[j]], errors='coerce')
        df.append(df_temp)
    return df
```

When I run my code, I run into the error:

No columns to parse from file

I am not sure why; I have tried skipping past the [Data] header and I still get that error.
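For context, pandas raises this message (as `pandas.errors.EmptyDataError`) when there is nothing left to parse, for example when `skiprows` skips every line of the input. The snippet below reproduces that on a tiny in-memory input; it shows one common way to hit the error and is not a diagnosis of the code above.

```python
import io

import pandas as pd

text = "a,b\n1,2\n"  # two lines in total

message = None
try:
    # Skipping both lines leaves no header row and no data to parse
    pd.read_csv(io.StringIO(text), skiprows=2)
except pd.errors.EmptyDataError as exc:
    message = str(exc)

print(message)
```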

Note: for this example I want the headers to be 'x' and 'y', but I am trying to make a general-purpose function so that I can change them to something more meaningful depending on what I am measuring.

asked Jun 12 '26 13:06 by bigmac42


1 Answer

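If the [Data] row is going to be replaced regardless, just ignore it. Tell pandas to treat lines that start with # as comments and specify your own names. Note that the [Data] line itself does not start with #, so it has to be skipped as well; skiprows counts commented lines too, so skiprows=5 skips the four metadata lines plus [Data]:

```python
import pandas as pd

# comment='#' drops lines starting with '#', including the #[END_OF_FILE] marker;
# skiprows=5 skips the four metadata lines and the [Data] line
df = pd.read_csv('test.csv', comment='#', skiprows=5, names=['x', 'y'])
```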

which gives

```
   x  y
0  1  2
1  3  4
2  5  6
```
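To apply this to 30+ files spread over two folders, the same call can be wrapped in a small function that takes the column names as a parameter. The sketch below extends the answer rather than quoting it: the helper names `read_block` and `read_folders`, the demo folders `folder_a`/`folder_b`, the file name `run1.csv`, the added `source` column, and the assumption that every file marks its data with a line reading exactly `[Data]` are all hypothetical. Instead of hard-coding `skiprows`, it counts the lines up to and including `[Data]`, so files with different amounts of metadata still work.

```python
import glob
import os
import tempfile

import pandas as pd


def read_block(path, names, marker="[Data]"):
    """Read the rows that follow `marker` in one file, using `names` as the header."""
    # Count the lines up to and including the (assumed) marker line
    with open(path) as f:
        for skip, line in enumerate(f, start=1):
            if line.strip() == marker:
                break
        else:
            raise ValueError(f"{marker!r} not found in {path}")
    # skiprows counts raw lines; comment='#' drops the #[END_OF_FILE] line
    return pd.read_csv(path, skiprows=skip, comment="#", names=names)


def read_folders(folders, names):
    """Read every .csv file in each folder and stack them into one DataFrame."""
    frames = []
    for folder in folders:
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            df = read_block(path, names)
            df["source"] = os.path.basename(path)  # keep track of each row's file
            frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Demo: two hypothetical folders, each holding one file in the format above
sample = '#Name,\n#Comment,""\n#ExtComment,""\n#Source,\n[Data]\n1,2\n3,4\n5,6\n#[END_OF_FILE]\n'
with tempfile.TemporaryDirectory() as root:
    folders = [os.path.join(root, "folder_a"), os.path.join(root, "folder_b")]
    for folder in folders:
        os.makedirs(folder)
        with open(os.path.join(folder, "run1.csv"), "w") as f:
            f.write(sample)
    combined = read_folders(folders, names=["x", "y"])

print(combined)
```

In the layout from the question, `folders` would be the two data subfolders, which could be built relative to the script with `os.path.join(os.path.dirname(__file__), ...)`. `pd.concat` is used instead of `DataFrame.append`, which was removed in pandas 2.0.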
answered Jun 15 '26 02:06 by Kraigolas


