
Pandas extract comment lines

Tags: python, pandas

I have a data file that starts with a few lines of comments, followed by the actual data.

#param1 : val1
#param2 : val2
#param3 : val3
12
2
1
33
12
0
12
...

I can read the data with pandas.read_csv(filename, comment='#', header=None). However, I also want to read the comment lines separately in order to extract the parameter values. So far I have only come across ways of skipping or removing the comment lines. How can I also extract the comment lines separately?

jaydeepsb asked Sep 27 '16 12:09



2 Answers

You can't really do this within the call to read_csv itself. If you just need to process a header, you can open the file, extract the comment lines and process them, then read in the data with a separate call.

from itertools import takewhile

import pandas

with open(filename, 'r') as fobj:
    # takewhile returns an iterator over the leading lines
    # that start with the comment string
    headiter = takewhile(lambda s: s.startswith('#'), fobj)
    # you may want to process the headers differently,
    # but here we just convert them to a list
    header = list(headiter)
# read the data in a separate call, skipping the comment lines
df = pandas.read_csv(filename, comment='#', header=None)
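
If opening the file twice is undesirable, the same idea can be done in a single pass: collect the comment lines while reading, then hand the remaining text to read_csv through an in-memory buffer. This is only a sketch. The function name read_with_comments is made up for illustration, and it assumes the file fits in memory and has no header row, as in the question.

```python
import io

import pandas as pd


def read_with_comments(path, comment="#"):
    """Return (comment_lines, DataFrame) from one pass over the file.

    Hypothetical helper: assumes the file is small enough to hold in
    memory and that the data section has no header row.
    """
    comments = []
    data_lines = []
    with open(path, "r") as fobj:
        for line in fobj:
            if line.startswith(comment):
                # keep the comment text without the marker and newline
                comments.append(line[len(comment):].strip())
            else:
                data_lines.append(line)
    # read_csv accepts any file-like object, so wrap the data in StringIO
    df = pd.read_csv(io.StringIO("".join(data_lines)), header=None)
    return comments, df
```

Unlike takewhile, this loop also picks up comment lines that appear after the first data row, which may or may not be what you want.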
Elliot answered Oct 23 '22 11:10



You could read the file a second time in the normal way and go through it line by line to get your parameters.

def get_param(filename):
    para_dic = {}
    with open(filename, 'r') as cmt_file:    # open the file
        for line in cmt_file:    # read each line
            if line[0] == '#':    # check the first character
                line = line[1:]    # remove the leading '#'
                para = line.split(':')    # separate the string at ':'
                if len(para) == 2:    # keep only 'name : value' lines
                    para_dic[para[0].strip()] = para[1].strip()
    return para_dic

This function returns a dictionary containing the parameters.

{'param3': 'val3', 'param2': 'val2', 'param1': 'val1'}
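
One thing to watch with the function above: because it only accepts lines that split into exactly two parts, a value that itself contains a colon (a time or a URL, say) is silently skipped. A possible variant, sketched here with the hypothetical name parse_params, splits on the first separator only by using str.partition. It takes any iterable of lines, so the sample from the question can be fed in through io.StringIO.

```python
import io


def parse_params(lines, comment="#", sep=":"):
    """Build a dict from lines shaped like '#name : value'.

    Hypothetical helper: splits on the first separator only, so
    values may contain the separator themselves.
    """
    params = {}
    for line in lines:
        if not line.startswith(comment):
            continue  # data line, not a parameter
        key, found, value = line[len(comment):].partition(sep)
        if found:  # skip comment lines that carry no separator
            params[key.strip()] = value.strip()
    return params


sample = io.StringIO("#param1 : val1\n#param2 : val2\n#param3 : val3\n12\n2\n1\n")
params = parse_params(sample)
print(params)
# {'param1': 'val1', 'param2': 'val2', 'param3': 'val3'}
```

With a real file, pass the open file object instead of the StringIO buffer. The values come back as strings, so any numeric conversion is left to the caller.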
Warren Wong answered Oct 23 '22 11:10
