Writing single CSV header with pandas

Tags:

I'm parsing data into lists and using pandas to frame and write to an CSV file. First my data is taken into a set where inv, name, and date are all lists with numerous entries. Then I use concat to concatenate each iteration through the datasets I parse through to a CSV file like so:

counter = True
data = {'Invention': inv, 'Inventor': name, 'Date': date}

if counter is True:
  df = pd.DataFrame(data)
  df = df[['Invetion', 'Inventor', 'Date']]

else:
  df = pd.concat([df, pd.DataFrame(data)])
  df = df[['Invention', 'Inventor', 'Date']]

  with open('./new.csv', 'a', encoding = utf-8) as f:
    if counter is True:
      df.to_csv(f, index = False, header = True)
    else:
      df.to_csv(f, index = False, header = False)

counter = False

The counter = True statement resides outside of my iteration loop for all the data I'm parsing so it's not overwriting every time.

So this means it only runs once through my data to grab the first df set then concats it thereafter. The problem is that even though counter is only True the first round and works for my first if-statement for df it does not work for my writing to file.

What happens is that the header is written over and over again - regardless to the fact that counter is only True once. When I swap the header = False for when counter is True then it never writes the header.

I think this is because of the concatenation of df holding onto the header somehow but other than that I cannot figure out the logic error.

Is there perhaps another way I could also write a header once and only once to the same CSV file?

729

asked Jan 01 '18 21:01

HelloToEarth

1 Answers

It's hard to tell what might be going wrong without seeing the rest of the code. I've developed some test data and logic that works; you can adapt it to fit your needs.

Please try this:

import pandas as pd

early_inventions = ['wheel', 'fire', 'bronze']
later_inventions = ['automobile', 'computer', 'rocket']

early_names = ['a', 'b', 'c']
later_names = ['z', 'y', 'x']

early_dates = ['2000-01-01', '2001-10-01', '2002-03-10']
later_dates = ['2010-01-28', '2011-10-10', '2012-12-31']

early_data = {'Invention': early_inventions,
    'Inventor': early_names,
    'Date': early_dates}

later_data = {'Invention': later_inventions,
    'Inventor': later_names,
    'Date': later_dates}

datasets = [early_data, later_data]

columns = ['Invention', 'Inventor', 'Date']
header = True
for dataset in datasets:
    df = pd.DataFrame(dataset)
    df = df[columns]
    mode = 'w' if header else 'a'
    df.to_csv('./new.csv', encoding='utf-8', mode=mode, header=header, index=False)
    header = False

Alternatively, you can concatenate all of the data in the loop and write out the dataframe at the end:

df = pd.DataFrame(columns=columns)
for dataset in datasets:
    df = pd.concat([df, pd.DataFrame(dataset)])
    df = df[columns]
df.to_csv('./new.csv', encoding='utf-8', index=False)

If your code cannot be made to conform to this API, you can forego writing the header in to_csv altogether. You can detect whether the output file exists and write the header to it first if it does not:

import os

fn = './new.csv'
if not os.path.exists(fn):
    with open(fn, mode='w', encoding='utf-8') as f:
        f.write(','.join(columns) + '\n')
# Now append the dataframe without a header
df.to_csv(fn, encoding='utf-8', mode='a', header=False, index=False)

165

answered Oct 01 '22 04:10

Tom Lynch

Related questions
                            
                                Add element at the start of array and delete at the end numpy
                            
                                Pandas.plotting doesn't show graph
                            
                                Using a normal function f(x) in python
                            
                                Get the column names of a python numpy array
                            
                                Treating NaN as zero in arithmetic operations?
                            
                                python gobject.mainloop gobbles signal events
                            
                                Configuring ALE plugin with Pylint
                            
                                Is there a way I can log when python "requests-cache" hits the cache?
                            
                                Is there better way to create lazy variable initialization?
                            
                                Creating a Numpy structure scalar instead of array
                            
                                Fast way to quantize numpy vectors
                            
                                save pandas plot with subplots to one file
                            
                                Mixing asyncio and Kivy: How to start the asyncio loop and the Kivy application at the same time?
                            
                                Is there a way to tell which kernel a jupyter notebook was built with?
                            
                                Creating a pivot table in pandas and grouping at the same time the dates per week
                            
                                How to unpack key,value pairs in python? [duplicate]
                            
                                Jupyter: How to update plot on button click (ipywidgets)
                            
                                Heatmap in python to represent (x,y) coordinates in a given rectangular area
                            
                                HTML Logger in python
                            
                                run multi command in the same jupyter cells

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Writing single CSV header with pandas

Tags:

python

pandas

dataframe

csv

header

HelloToEarth

People also ask

1 Answers

Tom Lynch

Recent Activity

Donate For Us