My code is the following:
import pandas as pd
import numpy as np
df = pd.read_csv("path/to/my/infile.csv")
df = df.sort_values(['distance', 'time'])
df.to_csv("path/to/my/outfile.csv")
This code successfully reads infile.csv, which is a 3 GB CSV file, and sorts it, but it fails when trying to write to outfile.csv with the following error:
OSError Traceback (most recent call last)
<ipython-input-10-3a5c8279658d> in <module>
----> 1 df.to_csv('/Users/joaomatos/Desktop/cluster22_sorted_training.csv',index=False)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1743 doublequote=doublequote,
1744 escapechar=escapechar, decimal=decimal)
-> 1745 formatter.save()
1746
1747 if path_or_buf is None:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
164 encoding=encoding,
165 compression=self.compression)
--> 166 f.write(buf)
167 f.close()
168 for _fh in handles:
OSError: [Errno 22] Invalid argument
My question is why?
Thank you for your help
I just had a similar issue. I was using a backslash ("\") in the path, which has usually worked in the past, but this time it turned out I had to use a forward slash ("/") instead. That is extremely weird, but it worked. Maybe you can try that?
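One possible explanation, offered as an assumption because the answer does not show the path that failed: in an ordinary Python string literal, sequences such as "\t" and "\n" are escape sequences, so a Windows-style path can end up containing control characters. The path below is hypothetical and the sketch only illustrates the effect and two ways around it.

```python
from pathlib import PureWindowsPath

# Hypothetical path. In a normal string literal "\t" is a tab and "\n" is a
# newline, so this string does not contain the path it appears to contain.
broken = "C:\temp\new.csv"

# A raw string keeps every backslash literally.
raw = r"C:\temp\new.csv"

# as_posix() rewrites the separators as forward slashes, which Python's file
# functions also accept on Windows.
forward = PureWindowsPath(raw).as_posix()

print(repr(broken))
print(forward)
```

Using forward slashes or a raw string avoids the escape-sequence problem altogether.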
After exploring a lot of options, including updating pandas to the latest version (1.2.4 as of today), changing the engine to "python" or "c", and debugging, I finally discovered what the issue was:
I had my CSV files stored in a folder that was constantly being synchronized in real time with OneDrive.
YES! I noticed that the tray icon was going crazy and OneDrive was consuming resources at the same time I was running algorithmic trading backtests for my pet project. I paused the sync and it never failed again!
I guess you can also exclude the folder from OneDrive, or simply change the location where the CSVs are stored, written and accessed.
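The "change the location" suggestion can be sketched as follows. This is a minimal sketch, not a definitive fix: the helper name write_csv_outside_sync, the "csv_output" folder name and the choice of the system temporary directory are all assumptions for illustration, and it assumes the temporary directory is not itself inside a synced folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_csv_outside_sync(df, filename, base_dir=None):
    """Write df to a folder that is (assumed to be) outside any synced folder.

    base_dir is a hypothetical parameter; when omitted, a "csv_output"
    folder under the system temporary directory is used.
    """
    if base_dir is None:
        target_dir = Path(tempfile.gettempdir()) / "csv_output"
    else:
        target_dir = Path(base_dir)
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    df.to_csv(target, index=False)
    return target
```

The returned path can be copied into the synced folder afterwards, once the write has finished.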
Apparently this problem is caused by a known bug, reported here, that affects an earlier version of pandas. All I had to do was run pip3 install --upgrade pandas and then restart the computer.
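If upgrading is not an option, a workaround that may be worth trying is to write the frame in smaller pieces, so that no single write is very large. This is a sketch under the assumption that the failure is related to the size of the data written at once, which the thread does not confirm. The names write_csv_in_chunks and rows_per_chunk are hypothetical.

```python
import pandas as pd


def write_csv_in_chunks(df, path, rows_per_chunk=100_000):
    """Write df to path in slices of rows_per_chunk rows (hypothetical helper)."""
    if len(df) == 0:
        # Nothing to slice: write only the header row.
        df.to_csv(path, index=False)
        return
    for start in range(0, len(df), rows_per_chunk):
        chunk = df.iloc[start:start + rows_per_chunk]
        first = start == 0
        chunk.to_csv(
            path,
            mode="w" if first else "a",  # overwrite first, then append
            header=first,                # header only once, at the top
            index=False,
        )
```

to_csv also accepts a chunksize argument that controls how many rows are formatted at a time. The explicit loop above is just one way to keep each write small.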
In my case it worked once I specified the absolute path rather than the relative one. I don't know why, though; it hasn't happened before. Maybe it is because I'm working on an external hard drive?
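A relative path is interpreted against the current working directory, which may not be the folder you expect (for example, when a notebook is started from somewhere else). A minimal sketch for turning a relative path into an absolute one before writing; the helper name to_absolute and the file name are hypothetical.

```python
from pathlib import Path


def to_absolute(path_str):
    """Expand "~" and resolve path_str against the current working directory."""
    return Path(path_str).expanduser().resolve()


# Hypothetical file name; printing it shows where the file would be written.
out_path = to_absolute("outfile.csv")
print(out_path)
```

Printing the resolved path before calling df.to_csv(out_path) makes it easy to check that the target folder exists and is writable.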