
Read and reverse data chunk by chunk from a csv file and copy to a new csv file

Assume I'm dealing with a very large CSV file, so I can only read the data into memory chunk by chunk. The expected flow of events is as follows:

1) Read a chunk (e.g. 10 rows) of data from the CSV file using pandas.

2) Reverse the order of the data in the chunk.

3) Copy each row to a new CSV file in reverse, so that each chunk (10 rows) is written to the new CSV file from the beginning in reversed order.

In the end, the new CSV file should be in reversed order, and this should be done on Windows without loading the entire file into memory.

I am trying to do time series forecasting, so I need the data ordered from oldest to latest (first row is the oldest entry). I can't load the entire file into memory, so I'm looking for a way to do it one chunk at a time, if that's possible.

The dataset I tried this on is train.csv from the Rossmann dataset on Kaggle. You can get it from this GitHub repo.

My attempt does not copy the rows into the new CSV file properly.

Shown below is my code:

import pandas as pd
import csv

def reverse():

    fields = ["Store","DayOfWeek","Date","Sales","Customers","Open","Promo","StateHoliday",
              "SchoolHoliday"]
    with open('processed_train.csv', mode='a') as stock_file:
        writer = csv.writer(stock_file,delimiter=',', quotechar='"', 
                                                quoting=csv.QUOTE_MINIMAL)
        writer.writerow(fields)

    for chunk in pd.read_csv("train.csv", chunksize=10):
        store_data = chunk.reindex(index=chunk.index[::-1])
        append_data_csv(store_data)

def append_data_csv(store_data):
    with open('processed_train.csv', mode='a') as store_file:
        writer = csv.writer(store_file,delimiter=',', quotechar='"',
                                           quoting=csv.QUOTE_MINIMAL)
        for index, row in store_data.iterrows():
            print(row)
            writer.writerow([row['Store'],row['DayOfWeek'],row['Date'],row['Sales'],
            row['Customers'],row['Open'],row['Promo'],
            row['StateHoliday'],row['SchoolHoliday']])

reverse()

Thank you in advance.

Suleka_28 asked Oct 29 '18 06:10


1 Answer

Using bash, you can take everything except the first line (the header) with tail, reverse the line order with tac, and store the result:

tail -n +2 train.csv  | tac > train_rev.csv

If you want to keep the header in the reversed file, write it first and then append the reversed content:

head -1 train.csv > train_rev.csv; tail -n +2 train.csv  | tac >> train_rev.csv
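
If tail and tac are not available (the question mentions Windows), the same idea can be sketched in Python using only the standard library. This is a rough sketch and not part of the commands above: the function name reverse_csv and the chunk_rows parameter are made up for illustration. It assumes the first line is a header and that there is enough free disk space for temporary copies of the data. Each chunk is reversed in memory and written to its own temporary file, and the temporary files are then concatenated starting from the last chunk, which is what makes the whole file end up reversed rather than only each chunk.

```python
import csv
import itertools
import os
import shutil
import tempfile


def reverse_csv(src_path, dst_path, chunk_rows=100_000):
    """Write the rows of src_path to dst_path in reverse order, header first.

    Hypothetical helper: at most chunk_rows rows are held in memory at once.
    """
    tmp_dir = tempfile.mkdtemp(prefix="csv_reverse_")
    chunk_paths = []
    try:
        # newline="" lets the csv module handle line endings itself,
        # which avoids blank lines between rows on Windows.
        with open(src_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None)  # assumption: first line is a header
            while True:
                rows = list(itertools.islice(reader, chunk_rows))
                if not rows:
                    break
                # Reverse this chunk and spill it to its own temporary file.
                path = os.path.join(tmp_dir, f"chunk_{len(chunk_paths)}.csv")
                with open(path, "w", newline="") as tmp:
                    csv.writer(tmp).writerows(reversed(rows))
                chunk_paths.append(path)

        with open(dst_path, "w", newline="") as dst:
            if header is not None:
                csv.writer(dst).writerow(header)
            # Concatenate the reversed chunks starting from the last one.
            for path in reversed(chunk_paths):
                with open(path, newline="") as tmp:
                    shutil.copyfileobj(tmp, dst)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

With the file names from the question, a call might look like reverse_csv("train.csv", "processed_train.csv", chunk_rows=100000). A larger chunk_rows means fewer temporary files at the cost of more memory per chunk.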
gustavovelascoh answered Nov 17 '22 19:11