
pandas read_csv from BytesIO

I have a BytesIO file-like object, containing a CSV. I want to read it into a Pandas dataframe, without writing to disk in between.

MWE

In my use case, I downloaded the file straight into BytesIO. For this MWE, I'll read a file on disk into BytesIO, then read that into Pandas. The disk step exists only to make the MWE self-contained.

file.csv

a,b
1,2
3,4

Script:

import pandas as pd
from io import BytesIO
bio = BytesIO()
with open('file.csv', 'rb') as f:
    bio.write(f.read())

# now we have a BytesIO with a CSV
df = pd.read_csv(bio)

Result:

Traceback (most recent call last):
  File "pandas-io.py", line 8, in <module>
    df = pd.read_csv(bio)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Note that this sounds similar to another question with a similar title, but the error messages are different, and that post has an XY problem.

asked Dec 10 '22 at 00:12 by falsePockets


1 Answer

The error says the file is empty.

That's because after writing to a BytesIO object, the file pointer is at the end of the file, ready to write more. So when Pandas tries to read it, it starts reading after the last byte that was written.
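To see this directly, here is a small sketch. It assumes the same three-line CSV, written straight into the buffer rather than copied from file.csv. It checks the position with tell() right after writing, then shows that both a plain read() and pd.read_csv find nothing left from that position:

```python
from io import BytesIO

import pandas as pd

bio = BytesIO()
bio.write(b"a,b\n1,2\n3,4\n")  # 12 bytes of CSV

# After write(), the position sits just past the last byte written.
pos_after_write = bio.tell()

# Reading from the current position returns no data at all.
leftover = bio.read()

# pandas also reads from the current position, so it sees an empty stream.
raised = False
try:
    pd.read_csv(bio)
except pd.errors.EmptyDataError:
    raised = True

print(pos_after_write, leftover, raised)
```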

So you need to move the pointer back to the start before Pandas reads it:

bio.seek(0)
df = pd.read_csv(bio)
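For a version with no disk step at all, the sketch below uses a hard-coded byte string named `payload` as a hypothetical stand-in for the downloaded data. It shows two ways to make the buffer readable. Option 1 is the seek(0) fix above. Option 2 passes the bytes to the BytesIO constructor, and a buffer created that way already starts at position 0:

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for the downloaded bytes.
payload = b"a,b\n1,2\n3,4\n"

# Option 1: write into the buffer, then rewind before handing it to pandas.
bio = BytesIO()
bio.write(payload)
bio.seek(0)  # move the position back to the first byte
df1 = pd.read_csv(bio)

# Option 2: initialise the buffer with the bytes; its position starts at 0,
# so no seek() is needed.
df2 = pd.read_csv(BytesIO(payload))

print(df1)
```

Option 2 is convenient when all the bytes are already in hand. Option 1 fits the case where the download writes into the buffer chunk by chunk.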
answered Dec 31 '22 at 21:12 by falsePockets