I would like to write some comments in my CSV file created with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #) and then pass it to to_csv. Does anybody have a better option?
df.to_csv accepts a file object. So you can open a file in 'a' (append) mode, write your comments, and pass it to the DataFrame's to_csv function.
For example:
    In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})
    In [37]: f = open('foo', 'a')
    In [38]: f.write('# My awesome comment\n')
    In [39]: f.write('# Here is another one\n')
    In [40]: df.to_csv(f)
    In [41]: f.close()
    In [42]: more foo
    # My awesome comment
    # Here is another one
    ,a,b
    0,1,1
    1,2,2
    2,3,3
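The same idea can also be written as a plain script, using a with block so the file is closed automatically. This is only a minimal sketch: the output path in a temporary directory is a made-up example, the file is opened in 'w' mode (instead of 'a') so that rerunning the script does not keep stacking comments, and newline='' follows the pandas documentation's advice for file objects passed to to_csv.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location; any writable path would do
path = os.path.join(tempfile.mkdtemp(), 'foo.csv')

# Write the comment lines first, then let pandas write the data
# to the same open file handle
with open(path, 'w', newline='') as f:
    f.write('# My awesome comment\n')
    f.write('# Here is another one\n')
    df.to_csv(f)

# Read the file back as text to inspect the result
with open(path) as f:
    content = f.read()
print(content)
```

The comment lines end up above the header row, exactly as in the interactive session.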
An alternative to @Vor's solution is to first write the comment to a file, and then use mode='a' with to_csv() to add the content of the data frame to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment, and then passing the file handle to pandas (as per @Vor's answer). The similar timings make sense considering that this is what pandas is doing internally: DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open().
On a separate note, it is convenient to work with file IO via the with statement, which ensures that opened files are closed once you leave the with block. See the examples in the benchmarks below.
    import pandas as pd

    # Read in the iris data frame from the seaborn GitHub location
    iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

    # Create a bigger data frame by doubling it until it is large enough
    # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
    while iris.shape[0] < 100000:
        iris = pd.concat([iris, iris])
    # `iris.shape` is now (153600, 5)
    %%timeit -n 5 -r 5
    # Open a file in append mode to add the comment
    # Then pass the file handle to pandas
    with open('test1.csv', 'a') as f:
        f.write('# This is my comment\n')
        iris.to_csv(f)
972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
With to_csv(mode='a'):
    %%timeit -n 5 -r 5
    # Open a file in write mode to add the comment
    # Then close the file and reopen it with pandas in append mode
    with open('test2.csv', 'w') as f:
        f.write('# This is my comment\n')
    iris.to_csv('test2.csv', mode='a')
949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
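Since the question points out that read_csv can skip comments, here is a rough sketch of the full round trip: write the comment, append the data frame with mode='a', then read the file back with comment='#'. The small data frame and the temporary file path are made up for illustration. One caveat to keep in mind: with comment='#', read_csv stops parsing a line at the first #, so this is only safe if the data itself never contains that character.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location; any writable path would do
path = os.path.join(tempfile.mkdtemp(), 'commented.csv')

# Write the comment first, then let pandas reopen the file in append mode
with open(path, 'w') as f:
    f.write('# This is my comment\n')
df.to_csv(path, mode='a')

# Lines starting with '#' are skipped; the first column holds the index
restored = pd.read_csv(path, comment='#', index_col=0)
print(restored)
```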