Ignore character while importing with pandas

Tags: python, pandas, csv

I could not find such an option in the documentation. A measuring device spits out everything in Excel:

    <>    A    B    C
 1
 2
 3

When I delete the "<>" characters manually everything works fine. Is there a way to circumvent that (without conversion to csv)?

I do:

import pandas as pd
df = pd.read_excel(filename, sheetname, skiprows=0, header=0, index_col=0)

Setting skiprows=1 does not do the trick, since pandas then takes the first remaining row as the column names. If I supply names=list(range(1, 4)), the first data row is lost.

Moritz asked Oct 29 '25 18:10


2 Answers

Expanding on Peruz's answer: for your case, you can use a regex as the separator.

df = pd.read_csv(filename, sep=r"(?<!<>)\s+", engine='python')

This should read in the columns properly, except that the first column would be named "<> A".

To change this, simply rename the first column:

df.columns = df.columns.str.replace(r"<>\s", "", regex=True)

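In the regular expression, \s+ matches a run of one or more whitespace characters, except where the match would start directly after the text named in the negative lookbehind, written (?<!characters_to_ignore). Note that the lookbehind only guards the whitespace character immediately following <>, so this relies on exactly one whitespace character separating <> from the first column name.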
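To see what the separator does on its own, here is a minimal sketch using only Python's re module. The two header strings are made-up samples, not the asker's actual file. It also shows that the lookbehind protects only the first whitespace character after the marker, so a header padded with several spaces is still split right after "<> ".

```python
import re

# Same separator pattern as in the answer above.
sep = re.compile(r"(?<!<>)\s+")

# Hypothetical header line with a single space after the marker.
single = sep.split("<> A B C")

# Hypothetical header line with several spaces after the marker.
padded = sep.split("<>    A    B    C")

# Strip the marker from the first name, as in the column rename.
cleaned = [re.sub(r"<>\s", "", name) for name in single]

print(single)   # ['<> A', 'B', 'C']
print(padded)   # ['<> ', 'A', 'B', 'C']
print(cleaned)  # ['A', 'B', 'C']
```

If your file pads the header with more than one space, the first field comes back as "<> " on its own, so you would need a different way to drop it.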

Aritra answered Oct 31 '25 08:10


Another option would be:

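import pandas as pd
with open(fname, 'r') as f:
    line1 = f.readline()  # read the header line separately
    # Strip the marker from the header (' #' here; '<>' in the question's file)
    data1 = pd.read_csv(f, sep=r'\s+', names=line1.replace(' #', '').split(), dtype=float)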

You might have a different separator though.
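A variation on the same idea, as a rough sketch: read the header line yourself, keep the marker as the name of the first column, make that column the index, and then clear the index label. The sample text below is invented for illustration and assumes the data has been exported as whitespace-separated text with one row label per line.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated export standing in for the device file.
sample = "<> A B C\n1 10 20 30\n2 40 50 60\n3 70 80 90\n"

buf = io.StringIO(sample)
names = buf.readline().split()   # ['<>', 'A', 'B', 'C']; the buffer now points at the data rows

# Every column gets a name; the marker column becomes the index.
df = pd.read_csv(buf, sep=r"\s+", names=names, index_col=0)
df.index.name = None             # drop the "<>" label from the index

print(df)
```

This way no data row is lost, and the columns come out as A, B, C with the row labels as the index.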

wander95 answered Oct 31 '25 07:10