Writing multiple header lines in pandas.DataFrame.to_csv

Tags: pandas, csv, python-2.7

I am putting my data into NASA's ICARTT format for archival. This is a comma-separated file with multiple header lines, and the header lines themselves contain commas. Something like:

46, 1001
lastname, firstname
location
instrument
field mission
1, 1
2011, 06, 21, 2012, 02, 29
0
Start_UTC, seconds, number_of_seconds_from_0000_UTC
14
1, 1
-999, -999
measurement name, units
measurement name, units
column1 label, column2 label, column3 label, column4 label, etc.

I have to make a separate file for each day that data were collected, so I will end up creating around thirty files in all. When I create a csv file via pandas.DataFrame.to_csv I cannot (as far as I know) simply write the header lines to the file before writing the data, so I have had to trick it into doing what I want via

# assuming <df> is a pandas dataframe
df.to_csv('dst.ict',na_rep='-999',header=True,index=True,index_label=header_lines)

where "header_lines" is the header string.

What this gives me is exactly what I want, except that "header_lines" is bracketed by double quotes. Is there any way to write text to the head of a csv file using to_csv, or to remove the double quotes? I have already tried setting quotechar='' and doublequote=False in to_csv(), but the double quotes still come up.

What I am doing now (and it works for now, but I would like to move to something better) is simply opening a file via open('dst.ict','w') and printing to that line by line, which is quite slow.

asked Nov 21 '14 21:11

tnknepp

1 Answer

You can, indeed, just write the header lines before the data. pandas.DataFrame.to_csv takes a path_or_buf as its first argument, not just a pathname:

pandas.DataFrame.to_csv(path_or_buf, *args, **kwargs)

path_or_buf : string or file handle, default None

File path or object, if None is provided the result is returned as a string.
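As a quick check of that behavior, here is a minimal Python 3 sketch. The two-column frame and the header strings are made up for illustration, and an in-memory io.StringIO buffer stands in for a real file:

```python
import io

import pandas as pd

# Hypothetical two-row frame, just for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path_or_buf, to_csv returns the CSV text as a string.
text = df.to_csv()

# With an open file-like object, to_csv writes at the current position,
# so anything already written to the buffer is preserved.
buf = io.StringIO()
buf.write("first header line\nsecond header line\n")
df.to_csv(buf)
result = buf.getvalue()
print(result)
```

The same call works with a handle returned by open(); only the target of the write changes.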

Here's an example:

#!/usr/bin/python2

import pandas as pd
import numpy as np
import sys

# Make an example data frame.
df = pd.DataFrame(np.random.randint(100, size=(5,5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

header = '\n'.join(
    # I like to make sure the header lines are at least utf8-encoded.
    [unicode(line, 'utf8') for line in 
        [ '1001',
        'Daedalus, Stephen',
        'Dublin, Ireland',
        'Keys to the tower',
        'MINOS',
        '1, 1',
        '1904, 06, 16, 1922, 02, 02',
        'time_since_8am', # Ends up being the header name for the index.
        ]
    ]
)

with open(sys.argv[1], 'w') as ict:
    # Write the header lines in one call. The string has no trailing
    # newline, so the last entry ('time_since_8am') shares a line with
    # the column names pandas writes next and acts as the index label.
    ict.write(header)

    # Just write the data frame to the file object instead of
    # to a filename. Pandas will do the right thing and realize
    # it's already been opened.
    df.to_csv(ict)

The result is what you wanted: the header lines come first, followed by the output of .to_csv():

$ python example.py test && cat test
1001
Daedalus, Stephen
Dublin, Ireland
Keys to the tower
MINOS
1, 1
1904, 06, 16, 1922, 02, 02
time_since_8am,a,b,c,d,e
0,67,85,66,18,32
1,47,4,41,82,84
2,24,50,39,53,13
3,49,24,17,12,61
4,91,5,69,2,18
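For Python 3, the same idea might look like the sketch below. The helper name write_with_header, the sample frame, and the header values are hypothetical; na_rep='-999' is taken from the question, and index_label is used to name the index column instead of relying on a missing trailing newline:

```python
import io

import pandas as pd


def write_with_header(df, header_lines, dst, index_label="Start_UTC", na_rep="-999"):
    """Write free-form header lines, then the frame as CSV, to an open text handle."""
    for line in header_lines:
        dst.write(line + "\n")
    # index_label names the index column in the column-label row;
    # na_rep replaces missing values in the data rows.
    df.to_csv(dst, na_rep=na_rep, index_label=index_label)


# Hypothetical data: one missing value to show na_rep in action.
df = pd.DataFrame({"a": [1.0, None], "b": [3.0, 4.0]}, index=[0, 60])
header_lines = ["lastname, firstname", "location", "1, 1"]

buf = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_with_header(df, header_lines, buf)
output = buf.getvalue()
print(output)
```

To produce one file per day, the helper could be called in a loop with a fresh handle for each day's frame. Because the header lines are written directly rather than passed through the CSV writer, they are never quoted.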

Sorry if this is too late to be useful. I work in archiving these files (and use Python), so feel free to drop me a line if you have future questions.

answered Sep 28 '22 08:09

ndt
