Assume I'm dealing with a very large CSV file, so I can only read the data into memory chunk by chunk. The expected flow of events is as follows:
1) Read a chunk (e.g. 10 rows) of data from the CSV using pandas.
2) Reverse the order of the rows in the chunk.
3) Copy each row to a new CSV file in reverse, so that each chunk (10 rows) is written to the CSV from the beginning in reversed order.
In the end the CSV file should be in reversed order, and this should be done without loading the entire file into memory, on Windows.
I am trying to do time series forecasting, so I need the data ordered from oldest to latest (first row is the oldest entry). I can't load the entire file into memory, so I'm looking for a way to do it one chunk at a time, if that's possible.
The dataset I tried this on is train.csv of the Rossmann dataset from Kaggle. You can get it from this GitHub repo.
My attempt does not copy the rows into the new CSV file properly.
Shown below is my code:
import pandas as pd
import csv

def reverse():
    fields = ["Store","DayOfWeek","Date","Sales","Customers","Open","Promo","StateHoliday",
              "SchoolHoliday"]
    with open('processed_train.csv', mode='a') as stock_file:
        writer = csv.writer(stock_file,delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        writer.writerow(fields)
    for chunk in pd.read_csv("train.csv", chunksize=10):
        store_data = chunk.reindex(index=chunk.index[::-1])
        append_data_csv(store_data)

def append_data_csv(store_data):
    with open('processed_train.csv', mode='a') as store_file:
        writer = csv.writer(store_file,delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for index, row in store_data.iterrows():
            print(row)
            writer.writerow([row['Store'],row['DayOfWeek'],row['Date'],row['Sales'],
                             row['Customers'],row['Open'],row['Promo'],
                             row['StateHoliday'],row['SchoolHoliday']])

reverse()
Thank you in advance.
Using bash, you can tail the whole file except the first line and then reverse it and store it with this:
tail -n +2 train.csv | tac > train_rev.csv
If you want to keep the header in the reversed file, write it first and then append the reversed content:
head -1 train.csv > train_rev.csv; tail -n +2 train.csv | tac >> train_rev.csv
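Since `tac` is not part of a stock Windows install, the same idea (header first, then the remaining lines in reverse) can be sketched in plain Python by reading the file backwards in fixed-size blocks. This is only a minimal sketch under some assumptions: the function name `reverse_csv_lines` and the `block_size` parameter are made up for illustration, no quoted field may contain an embedded newline, and completely empty lines are dropped. Memory use is bounded by roughly one block plus one line.

```python
import os

def reverse_csv_lines(src, dst, block_size=1 << 20):
    """Copy src to dst with the header kept first and all other lines reversed.

    Hypothetical helper: assumes one record per physical line
    (no newlines inside quoted fields).
    """
    # Binary mode keeps byte offsets exact and avoids newline translation on Windows.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        header = fin.readline()
        fout.write(header)
        data_start = fin.tell()          # first byte after the header line

        fin.seek(0, os.SEEK_END)
        pos = fin.tell()
        tail = b""                       # start of a line whose beginning lies further left

        while pos > data_start:
            size = min(block_size, pos - data_start)
            pos -= size
            fin.seek(pos)
            block = fin.read(size) + tail
            lines = block.split(b"\n")
            # The first piece may be incomplete, so carry it into the next (earlier) block.
            tail = lines[0]
            for line in reversed(lines[1:]):
                if line:                 # skip the empty piece after a trailing newline
                    fout.write(line + b"\n")

        if tail:                         # first data line of the file
            fout.write(tail + b"\n")
```

Under those assumptions, `reverse_csv_lines("train.csv", "train_rev.csv")` should produce the same kind of output as the `head`/`tail`/`tac` pipeline above. If the data can contain multi-line quoted fields, this line-based approach (like `tac`) would split those records and is not suitable.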