Keep CSV file's comment lines in pandas?

I have just started delving into the world of Pandas, and the first strange CSV file I've found is one where there are two lines of comments (with different column widths) right at the beginning.

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

I know how to skip these lines with skiprows or header=, but, instead, how would I retain these comments while using read_csv? Sometimes comments are necessary as file meta information, and I do not want to throw them away.

Coolio2654 asked Feb 27 '26 01:02


2 Answers

Pandas is designed to read structured data.

For the unstructured lines, use the built-in open together with the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # first comment line, as a list of fields
    row2 = next(reader)  # second comment line, as a list of fields
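
As a rough sketch of the same idea, the snippet below splits a file into its comment rows, its header, and its data rows using only the standard library. The function name `split_preamble`, the column names `col_a`/`col_b`, and the sample text are made up for illustration; `skipinitialspace=True` is used here only to drop the blank after each comma in the sample comment lines.

```python
import csv
import io


def split_preamble(handle, n_comment_rows=2):
    """Return (comments, header, rows) from an open text handle.

    Hypothetical helper: assumes the first n_comment_rows lines are
    comments and the next line is the real header.
    """
    reader = csv.reader(handle, skipinitialspace=True)
    comments = [next(reader) for _ in range(n_comment_rows)]
    header = next(reader)
    rows = list(reader)  # everything left is data
    return comments, header, rows


# Illustrative stand-in for the file on disk
sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "col_a,col_b\n"
    "1,2\n"
)
comments, header, rows = split_preamble(sample)
```

Because each comment row is kept as its own list, rows with different numbers of fields are not a problem here.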

You can attach strings to the dataframe like this:

df.comments = 'My Comments'

But note:

Note, however, that while you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
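
Newer pandas versions also expose a `DataFrame.attrs` dictionary for this kind of metadata. The pandas documentation marks it as experimental, so the following is only a sketch, and whether the metadata survives a given operation should be checked rather than assumed. The sample text and the column names `col_a`/`col_b` are invented for the example.

```python
import io

import pandas as pd

# Illustrative stand-in for the file on disk
raw = (
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "col_a,col_b\n"
    "1,2\n"
    "3,4\n"
)

buf = io.StringIO(raw)
# Consume the two comment lines before pandas sees the handle
comment_lines = [buf.readline().rstrip("\n") for _ in range(2)]
df = pd.read_csv(buf)

# Store the comments in the DataFrame's metadata dictionary
df.attrs["comments"] = comment_lines
```

Keeping a separate reference to `comment_lines` is a cheap safeguard in case a later operation returns a DataFrame without the metadata.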

jpp answered Mar 01 '26 14:03


You can read the metadata lines first and then pass the same open file to read_csv, which continues from where the reading stopped:

import pandas as pd

with open('f.csv') as file:
    # read the first 2 rows as metadata
    header = [file.readline() for _ in range(2)]
    meta = [value.strip().split(',') for value in header]
    print(meta)
    # [['sometext', ' sometext2'], ['moretext', ' moretext1', ' moretext2']]

    # the file position is now at the real header row
    df = pd.read_csv(file)
    print(df)
    #       *header*
    # 0  actual data
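
If the comments also need to survive a save, one possible approach is to write them back ahead of the table. This is a sketch under assumptions, not part of the answer above: the DataFrame contents and column names are invented, and an in-memory buffer stands in for the output file.

```python
import io

import pandas as pd

# Comment lines as they were read from the original file (illustrative)
meta_lines = ["sometext, sometext2", "moretext, moretext1, moretext2"]
df = pd.DataFrame({"col_a": [1, 3], "col_b": [2, 4]})

out = io.StringIO()  # stand-in for open('out.csv', 'w', newline='')
for line in meta_lines:
    out.write(line + "\n")
# to_csv accepts an open handle and appends after the comments
df.to_csv(out, index=False)

text = out.getvalue()
```

The resulting file has the same shape as the input, so it can be read again with the same two-step approach.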

jezrael answered Mar 01 '26 16:03
