
Reading files with multiple delimiters in column headers and skipping some rows at the end

I am new to Python and would like to use pandas to read my data. I have searched and tried to solve this myself, but I am still struggling. Thanks in advance for your help!

I have a file, a.txt, that looks like this:

skip1
 A1| A2 |A3 |A4# A5# A6 A7| A8 , A9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9

END***
Some other data starts from here

The first task is to assign A1, A2, A3, A4, A5, A6, A7, A8 and A9 as column names. However, the header mixes several separators such as ' ', '|' and '#', which makes it a hassle to specify the separator when reading the file. I tried this:

import pandas as pd
import glob
filelist=glob.glob('*.txt')
print(filelist)

df = pd.read_csv(filelist,skiprows=1,skipfooter=2,skipinitialspace=True, header=0, sep=r'\| |,|#',engine='python') 

But it seems nothing happened when I check df in Spyder's data explorer.

The second task is to drop, while reading, everything from the row starting with END*** onward, since I don't need it. The header always has the same length. However, skipfooter needs the number of lines to skip, and that number changes between files.

Several similar questions have already been asked, but I can't make their answers work for my case:

how-to-read-txt-file-in-pandas-with-multiple-delimiters

pandas-read-delimited-file?

import-text-to-pandas-with-multiple-delimiters

pandas-ignore-all-lines-following-a-specific-string-when-reading-a-file-into-a

EDIT: about dropping the data starting from the END row while reading

If b.txt looks like this:

skip1
 A1| A2 |A3 |A4# A5# A6 A7| A8 , A9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9

END123
Some other data starts from here

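and I use the second solution below:

import io
import re
import pandas as pd

# keep only the text before the first line starting with END
txt = open('b.txt').read().split('\nEND')[0]
# drop the first line, keep the header line, keep the data rows
_, h, txt = txt.split('\n', 2)
pat = r'[|#, ]+'
names = re.split(pat, h.strip())

pd.read_csv(
    io.StringIO(txt),
    names=names, header=None,
    engine='python')

I get this: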

   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
asked Dec 12 '25 07:12 by Alexander

1 Answer

Split the file, then read from a string:

import io
import pandas as pd

# keep only the text before the END*** line
txt = open('test.txt').read().split('\nEND***')[0]
pd.read_csv(
    io.StringIO(txt),
    sep=r'\W+',
    skiprows=1, engine='python')

   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
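
As a quick check of why `sep=r'\W+'` copes with the mixed header, the sketch below applies the same pattern to the header line from the question with plain `re.split`. It also shows a caveat: `\W` matches `.` and `-` as well, so this shortcut assumes the data contains only unsigned integers or plain words, which holds for the sample but may not hold for your real files.

```python
import re

header = " A1| A2 |A3 |A4# A5# A6 A7| A8 , A9"

# \W+ treats every run of non-word characters as one separator,
# so '|', '#', ',' and spaces all collapse into a single split point.
names = re.split(r"\W+", header.strip())
print(names)

# Caveat: '.' and '-' are non-word characters too, so a value such as
# "-2" or "1.5" would be torn apart by the same pattern.
broken = re.split(r"\W+", "1.5,-2,3")
print(broken)
```

If the data may contain decimals or negative numbers, parsing the header separately and reading the rows as ordinary CSV is the safer route.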

We can also be very explicit: parse the header ourselves and read the rest of the file as CSV.

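import io
import re
import pandas as pd

# keep only the text before the END*** line
txt = open('test.txt').read().split('\nEND***')[0]
# drop the first line, keep the header line, keep the data rows
_, h, txt = txt.split('\n', 2)
pat = r'[|#, ]+'
names = re.split(pat, h.strip())

pd.read_csv(
    io.StringIO(txt),
    names=names, header=None,
    engine='python')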

   A1  A2  A3  A4  A5  A6  A7  A8  A9
0   1   2   3   4   5   6   7   8   9
1   1   2   3   4   5   6   7   8   9
2   1   2   3   4   5   6   7   8   9
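
Since the marker differs between files (END*** in one, END123 in another), the same idea can be wrapped in a small helper that stops at the first line starting with "END". This is only a sketch: `parse_block` and `sample` are hypothetical names introduced here, and it assumes every file has exactly one line to skip before the header.

```python
import io
import re

import pandas as pd


def parse_block(text):
    """Parse one file's text: skip line 1, read the header, stop at END."""
    lines = text.splitlines()
    # Index of the first line starting with "END" (or end of file if none).
    end = next(
        (i for i, ln in enumerate(lines) if ln.startswith("END")),
        len(lines),
    )
    header, body = lines[1], lines[2:end]
    # The header mixes '|', '#', ',' and whitespace; the data rows are plain CSV.
    names = re.split(r"[|#,\s]+", header.strip())
    return pd.read_csv(io.StringIO("\n".join(body)), names=names, header=None)


sample = """skip1
 A1| A2 |A3 |A4# A5# A6 A7| A8 , A9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9

END123
Some other data starts from here
"""
df = parse_block(sample)
print(df.shape)
```

To process several files, call it once per path, for example `parse_block(open(path).read())` for each `path` in `glob.glob('*.txt')`. Note that `pd.read_csv` takes a single path or file-like object, not the list that `glob.glob` returns.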
answered Dec 13 '25 21:12 by piRSquared
