
How to append new dataframe rows to a csv using pandas?

I have a new dataframe. How can I append it to an existing CSV file?

I tried the following code:

f = open('test.csv', 'w')
df.to_csv(f, sep='\t')
f.close()

But it doesn't append anything to test.csv. The CSV is big, so I only want to append to it, rather than read the whole CSV into a dataframe, concatenate the new rows, and write everything to a new CSV. Is there a good way to do this? Thanks.

Haven Shi asked Nov 01 '17 16:11


2 Answers

Try this:

df.to_csv('test.csv', sep='\t', header=None, mode='a')
# NOTE:                              ----->  ^^^^^^^^   
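
For context, the snippet in the question opens the file with 'w', which truncates it before writing, so nothing is ever appended. Below is a minimal sketch of the append pattern. It assumes a small stand-in DataFrame and a temporary file path (both hypothetical, since the real data is not shown), and it passes index=False, which is a choice made here and not part of the one-liner above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the question's real file and frame are not shown.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
existing = pd.DataFrame({"Name": ["tom", "nick"], "Age": [10, 15]})
new_rows = pd.DataFrame({"Name": ["juli"], "Age": [14]})

# Initial file: written once, with a header row.
existing.to_csv(path, sep="\t", index=False)

# Append by path: mode='a' adds to the end of the file,
# and header=False avoids writing a second header row.
new_rows.to_csv(path, sep="\t", index=False, header=False, mode="a")

# Passing an open handle also works, as long as it was opened in append mode.
with open(path, "a", newline="") as f:
    new_rows.to_csv(f, sep="\t", index=False, header=False)

result = pd.read_csv(path, sep="\t")
print(result)
```

Either form leaves the existing rows untouched, so the large file never has to be loaded into memory.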
MaxU - stop WAR against UA answered Oct 05 '22 08:10


TL;DR: MaxU's answer is correct.

df.to_csv('old_file.csv', header=None, mode='a')

I had the same problem: I wanted to append to a DataFrame and save it to a CSV inside a loop. It seems to be a common pattern. My criteria were:

  1. Write back to the same file.
  2. Don't write more data than necessary.
  3. Keep appending new data to the dataframe during the loop.
  4. Save on each iteration (in case a long-running loop crashes).
  5. Don't store the index in the CSV file.

Note the different values of mode and header. For a complete write, use mode='w' and header=True; for an append, use mode='a' and header=False (the boolean, not the string 'False').

import pandas as pd

# Create a CSV test file with 3 rows
data = [['tom', 10], ['nick', 15], ['juli', 14]] 
test_df = pd.DataFrame(data, columns = ['Name', 'Age']) 
test_df.to_csv('test.csv', mode='w', header=True, index=False)

# Read CSV into a new frame
df = pd.read_csv('test.csv')
print(df)

# MAIN LOOP
# Create new data in a new DataFrame
for i in range(0, 2):
    newdata = [['jack', i], ['jill', i]] 
    new_df  = pd.DataFrame(newdata, columns = ['Name', 'Age']) 

    # Write the new data to the CSV file in append mode
    new_df.to_csv('test.csv', mode='a', header=False, index=False)
    print('check test.csv')

    # Combine the new data into the frame ready for the next loop.
    test_df = pd.concat([test_df, new_df], ignore_index=True)

# On completion, writing the complete data again shouldn't be necessary, but it is done here as a check.
test_df.to_csv('completed.csv', mode='w', header=True, index=False)
# completed.csv and test.csv should be identical.
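
If the file may not exist yet when the loop starts, the mode/header distinction can be folded into one call. The following is only a sketch: append_to_csv is a hypothetical helper (not a pandas API), and the temporary path stands in for test.csv. It writes the header only when the target file is missing or empty.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(frame, path):
    """Append frame to path, writing the header only if the file is new or empty."""
    # Hypothetical helper for illustration; not part of pandas.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    frame.to_csv(path, mode="a", header=needs_header, index=False)


# Temporary location used here in place of test.csv.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

for i in range(2):
    batch = pd.DataFrame([["jack", i], ["jill", i]], columns=["Name", "Age"])
    append_to_csv(batch, path)

combined = pd.read_csv(path)
print(combined)
```

With this approach the first iteration creates the file with a header, and later iterations only add rows.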
intotecho answered Oct 05 '22 08:10