Keep CSV file's comment lines in pandas?

Question

I have just started delving into the world of Pandas, and the first strange CSV file I've found is one where there are two lines of comments (with different column widths) right at the beginning.

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

I know how to skip these lines with skiprows or header=, but, instead, how would I retain these comments while using read_csv? Sometimes comments are necessary as file meta information, and I do not want to throw them away.

jpp · Accepted Answer

Pandas is designed to read structured data.

For unstructured data, just use the built-in open together with the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # first comment line, as a list of fields
    row2 = next(reader)  # second comment line, as a list of fields
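As a minimal sketch of the same idea, the snippet below runs against an in-memory copy of the sample from the question, so it can be tried without a file on disk. The helper name read_comment_rows and the n_comments parameter are hypothetical, and skipinitialspace=True is an optional extra that drops the space after each comma.

```python
import csv
import io

# Stand-in for the file in the question (assumption: the data row is just
# "actual data"); real code would pass open('file.csv', newline='') instead.
SAMPLE = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "*header*\n"
    "actual data\n"
)


def read_comment_rows(f, n_comments=2):
    """Return the first n_comments rows of f, each as a list of fields."""
    # csv.reader does not require every row to have the same width.
    reader = csv.reader(f, skipinitialspace=True)
    return [next(reader) for _ in range(n_comments)]


comment_rows = read_comment_rows(io.StringIO(SAMPLE))
print(comment_rows)
# [['sometext', 'sometext2'], ['moretext', 'moretext1', 'moretext2']]
```

Because csv.reader yields one list per row, the two comment lines can have different numbers of fields without any special handling.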

You can attach strings to the dataframe like this:

df.comments = 'My Comments'

But note:

While you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc, to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
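To make the caveat concrete, here is a small sketch showing a plain attribute being lost after a filtering step. It also stores the comments in DataFrame.attrs, a dictionary available since pandas 1.0 that the pandas documentation marks as experimental; whether attrs survives a given operation is not something this sketch relies on. The column name value and the attribute name note are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the metadata handling matters here.
df = pd.DataFrame({"value": [1, 2, 3]})

# A plain Python attribute lives only on this one object.
df.note = "My Comments"

# DataFrame.attrs is a dict intended for metadata (experimental in pandas).
df.attrs["comments"] = [
    "sometext, sometext2",
    "moretext, moretext1, moretext2",
]

# Boolean indexing returns a new DataFrame object.
filtered = df[df["value"] > 1]

print(hasattr(df, "note"))        # the original object still has it
print(hasattr(filtered, "note"))  # the new object does not
print(df.attrs["comments"])
```

If the comments must stay with the data through a longer pipeline, keeping them in a separate variable next to the DataFrame is the least surprising option.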

jezrael · Answer

You can read the metadata first and then pass the same file object to read_csv:

import pandas as pd

with open('f.csv') as file:
    # read the first 2 rows as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # read_csv continues from the current file position
    df = pd.read_csv(file)
    print(df)
    #       *header*
    # 0  actual data
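The approach above could be wrapped in a small helper that returns the comments and the DataFrame together. This is only a sketch: the function name read_csv_with_comments and its n_comments parameter are hypothetical, and it runs on an in-memory copy of the sample so that it is self-contained.

```python
import io

import pandas as pd

# Stand-in for 'f.csv' (assumption: the data row is just "actual data").
SAMPLE = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "*header*\n"
    "actual data\n"
)


def read_csv_with_comments(f, n_comments=2, **kwargs):
    """Read n_comments leading lines as metadata, then parse the rest."""
    meta = [
        [cell.strip() for cell in f.readline().split(",")]
        for _ in range(n_comments)
    ]
    # The file position is now past the comments, so read_csv sees
    # the header line first. Extra keyword arguments go to read_csv.
    df = pd.read_csv(f, **kwargs)
    return meta, df


meta, df = read_csv_with_comments(io.StringIO(SAMPLE))
print(meta)
print(df)
```

With a real file, the call would look like read_csv_with_comments(open('f.csv')) inside a with block, so the file is closed once both parts have been read.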
