
Writing multiple header lines in pandas.DataFrame.to_csv

I am putting my data into NASA's ICARTT format for archival. This is a comma-separated file with multiple header lines, and the header lines themselves contain commas. Something like:

46, 1001
lastname, firstname
location
instrument
field mission
1, 1
2011, 06, 21, 2012, 02, 29
0
Start_UTC, seconds, number_of_seconds_from_0000_UTC
14
1, 1
-999, -999
measurement name, units
measurement name, units
column1 label, column2 label, column3 label, column4 label, etc.

I have to make a separate file for each day that data were collected, so I will end up creating around thirty files in all. When I create a csv file via pandas.DataFrame.to_csv I cannot (as far as I know) simply write the header lines to the file before writing the data, so I have had to trick it into doing what I want via

# assuming <df> is a pandas dataframe
df.to_csv('dst.ict',na_rep='-999',header=True,index=True,index_label=header_lines)

where "header_lines" is the header string.

What this gives me is exactly what I want, except that "header_lines" is bracketed by double quotes. Is there any way to write text to the head of a csv file using to_csv, or to remove the double quotes? I have already tried setting quotechar='' and doublequote=False in to_csv(), but the double quotes still come up.

What I am doing now (and it works for now, but I would like to move to something better) is simply opening a file via open('dst.ict','w') and printing to that line by line, which is quite slow.

tnknepp, asked Nov 21 '14 21:11




1 Answer

You can, indeed, just write the header lines before the data. pandas.DataFrame.to_csv takes a path_or_buf as its first argument, not just a pathname:

pandas.DataFrame.to_csv(path_or_buf, *args, **kwargs)

  • path_or_buf : string or file handle, default None

    File path or object, if None is provided the result is returned as a string.
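As a minimal sketch of what that parameter allows (the frame and the header text here are made up for illustration): with no argument, to_csv returns the CSV as a string, and with a writable text buffer such as io.StringIO it writes after whatever the buffer already holds.

```python
import io

import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# No path_or_buf: the CSV text is returned as a string.
as_text = df.to_csv(index=False)

# A file-like object: pandas writes after the text already in the buffer.
buf = io.StringIO()
buf.write('free-form header line\n')
df.to_csv(buf, index=False)
combined = buf.getvalue()
```

An open file behaves the same way as the buffer here, which is what the full example relies on.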

Here's an example:

#!/usr/bin/env python3

import sys

import numpy as np
import pandas as pd

# Make an example data frame.
df = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

header = '\n'.join([
    '1001',
    'Daedalus, Stephen',
    'Dublin, Ireland',
    'Keys to the tower',
    'MINOS',
    '1, 1',
    '1904, 06, 16, 1922, 02, 02',
    'time_since_8am',  # Ends up being the header name for the index.
])

# I like to make sure the header lines are at least utf8-encoded.
with open(sys.argv[1], 'w', encoding='utf-8') as ict:
    # Write the header lines. The last one deliberately has no trailing
    # newline: to_csv() leaves the index label empty, so its column row
    # (",a,b,c,d,e") completes that line.
    ict.write(header)

    # Just write the data frame to the file object instead of
    # to a filename. Pandas will do the right thing and realize
    # it's already been opened.
    df.to_csv(ict)

The result is just what you wanted: the header lines are written first, and then .to_csv() writes the data after them:

$ python example.py test && cat test
1001
Daedalus, Stephen
Dublin, Ireland
Keys to the tower
MINOS
1, 1
1904, 06, 16, 1922, 02, 02
time_since_8am,a,b,c,d,e
0,67,85,66,18,32
1,47,4,41,82,84
2,24,50,39,53,13
3,49,24,17,12,61
4,91,5,69,2,18
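For the one-file-per-day case in the question, the same idea can be wrapped in a small helper. This is only a sketch under assumptions: write_ict, the file-name pattern, the "ozone" column, and the sample data are all hypothetical, and the header lines are placeholders rather than a valid ICARTT header.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_ict(df, header_lines, path, index_label):
    """Write free-form header lines, then the frame as CSV (hypothetical helper)."""
    # newline='' stops Python from translating the line endings pandas writes;
    # the header lines use os.linesep, which is also the pandas default.
    with open(path, 'w', encoding='utf-8', newline='') as ict:
        for line in header_lines:
            ict.write(line + os.linesep)
        # pandas writes the column-label row itself, then the data rows.
        df.to_csv(ict, na_rep='-999', index_label=index_label)


# Made-up data: one row every 6 hours across two days, with one missing value.
times = pd.date_range('2011-06-21', periods=8, freq=pd.Timedelta(hours=6))
data = pd.DataFrame({'ozone': [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
                    index=times)

# Placeholder header lines, not a complete ICARTT header.
header_lines = ['placeholder line 1', 'lastname, firstname']

out_dir = tempfile.mkdtemp()
paths = []
for day, day_df in data.groupby(data.index.date):
    # Replace the timestamps with seconds since 00:00 of that day.
    out = day_df.copy()
    out.index = (day_df.index - day_df.index.normalize()).total_seconds().astype(int)
    path = os.path.join(out_dir, 'data_{:%Y%m%d}.ict'.format(day))
    write_ict(out, header_lines, path, index_label='Start_UTC')
    paths.append(path)
```

Because the header is written as plain text and never passes through the CSV writer, commas in it are not quoted, which avoids the double-quote problem with index_label described in the question.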

Sorry if this is too late to be useful. I work in archiving these files (and use Python), so feel free to drop me a line if you have future questions.

ndt, answered Sep 28 '22 08:09