
Python pandas to_csv causes OSError: [Errno 22] Invalid argument

Tags: python, pandas, csv

My code is the following:

import pandas as pd
import numpy as np

df = pd.read_csv("path/to/my/infile.csv")
df = df.sort_values(['distance', 'time'])
df.to_csv("path/to/my/outfile.csv")

This code successfully reads infile.csv, a 3 GB CSV file, and sorts it, but then fails when writing to outfile.csv with the following error:

OSError                                   Traceback (most recent call last)
<ipython-input-10-3a5c8279658d> in <module>
----> 1 df.to_csv('/Users/joaomatos/Desktop/cluster22_sorted_training.csv',index=False)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
   1743                                  doublequote=doublequote,
   1744                                  escapechar=escapechar, decimal=decimal)
-> 1745         formatter.save()
   1746 
   1747         if path_or_buf is None:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
    164                                          encoding=encoding,
    165                                          compression=self.compression)
--> 166                 f.write(buf)
    167                 f.close()
    168                 for _fh in handles:

OSError: [Errno 22] Invalid argument

My question is why?

Thank you for your help

João Matos asked Feb 21 '19 16:02


4 Answers

I just had a similar issue. I was using a backslash ("\") in the path, which had usually worked in the past, but this time it turned out I had to use a forward slash ("/") instead. That is extremely weird, but it worked. Maybe you can try that?
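
One possible reason a backslash path can misbehave, offered as an assumption since the answer doesn't show its path: in a regular Python string literal, sequences such as \t and \n are escape characters, so the string no longer spells the path you typed. The path below is hypothetical and only illustrates the difference between a plain literal, a raw string, and forward slashes.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows-style path, chosen because it contains \t and \n.
plain = "C:\temp\new.csv"     # \t becomes a tab, \n becomes a newline
raw = r"C:\temp\new.csv"      # raw string: backslashes are kept literally
forward = "C:/temp/new.csv"   # forward slashes need no escaping

print(repr(plain))   # the tab and newline characters are visible in the repr
print(repr(raw))

# pathlib can convert a backslash path to the forward-slash form.
print(PureWindowsPath(raw).as_posix())
```

Passing either raw or forward to to_csv avoids the escape problem; whether that was the actual cause here is not confirmed by the thread.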

Vivian Ge answered Sep 20 '22 13:09


After exploring a lot of options, including updating pandas to the latest version (1.2.4 as of today), changing the engine to "python" or "c", debugging, etc., I finally discovered what the issue was:

I had my CSV files stored in a folder that was constantly being synchronized in real-time with OneDrive.

YES! I noticed that the tray icon was going crazy and that OneDrive was consuming resources at the same time I was running algorithmic trading backtests for my pet project. I paused sync and it never failed again!!

I guess you can also exclude the folder from OneDrive or simply change the location where the CSVs are stored/written/accessed.
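
As a rough sketch of the "change the location" option: write the CSV to a local directory that no sync client watches. This assumes the system temporary directory is not synchronized on your machine; write_outside_synced_folder is a hypothetical helper name, not part of pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_outside_synced_folder(df, filename):
    """Write df as CSV into a fresh local temp directory and return the path.

    Assumption: the system temp directory is not watched by OneDrive or a
    similar sync client.
    """
    local_dir = Path(tempfile.mkdtemp(prefix="csv_out_"))
    out_path = local_dir / filename
    df.to_csv(out_path, index=False)
    return out_path


# Small stand-in frame; the column names follow the question.
df = pd.DataFrame({"distance": [2, 1], "time": [10, 20]})
out_path = write_outside_synced_folder(df, "outfile.csv")
print(out_path)
```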

Nicolás M. answered Sep 20 '22 13:09


Apparently this problem is caused by a known bug, reported here, in an earlier version of pandas. All I had to do was run pip3 install --upgrade pandas and then restart the computer.
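
If upgrading is not an option, writing the file in smaller pieces may be worth trying. The traceback in the question fails inside a single f.write(buf) call, so this sketch assumes the failure is tied to the size of that one write, which the thread does not confirm. write_csv_in_chunks and rows_per_chunk are hypothetical names.

```python
import pandas as pd

print(pd.__version__)  # check which pandas version is actually in use


def write_csv_in_chunks(df, path, rows_per_chunk=100_000):
    """Write df to path in several appends instead of one large write."""
    if df.empty:
        # Nothing to chunk; still write the header row.
        df.to_csv(path, index=False)
        return
    for start in range(0, len(df), rows_per_chunk):
        chunk = df.iloc[start:start + rows_per_chunk]
        first = start == 0
        # The first chunk creates the file with a header; later chunks append.
        chunk.to_csv(path, mode="w" if first else "a", header=first, index=False)
```

Usage would look like write_csv_in_chunks(df, "path/to/my/outfile.csv"), with the chunk size tuned to the available memory.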

João Matos answered Sep 20 '22 13:09


In my case it worked once I specified the absolute path rather than the relative one. I don't know why, though; it hasn't happened before. Maybe it's because I'm working on an external hard drive?
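
A relative path is resolved against the current working directory, which may not be where you expect, for example in a notebook. A minimal sketch for checking where a relative path really points before writing; the filename is a placeholder.

```python
from pathlib import Path

relative = Path("outfile.csv")   # placeholder relative path
absolute = relative.resolve()    # resolved against the current working directory

print(Path.cwd())                # the directory relative paths start from
print(absolute)                  # the full path that would actually be written
print(absolute.parent.exists())  # whether the target directory exists
```

Passing the resolved path to to_csv removes any dependence on the working directory.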

kjohnsen answered Sep 20 '22 13:09