Encoding with pandas.read_csv when file name has accents

I'm trying to load a CSV file with pandas, but it fails when the file name contains accented characters. It looks like an encoding problem, but although read_csv lets you set the encoding for the text inside the file, I can't figure out how to encode the file name itself.

input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file, sep=',', skipinitialspace=True)

The CSV file is Anzoátegui.csv. When the error occurs, input_file is:

input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv'

Error message:

OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:

Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []
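
A note on the io.StringIO attempt: StringIO builds an in-memory text file whose contents are the string you give it, so the path is never opened. The path becomes the one and only line of "CSV data", which is consistent with pandas reporting it as a header over an empty DataFrame. A minimal sketch using only the standard library (the path below is a shortened, hypothetical stand-in, not the real one):

```python
import csv
import io

# Hypothetical, shortened path used only for illustration.
path = r"C:\Datasets\Anzoátegui.csv"

# StringIO wraps the string itself as file content; nothing is read from disk.
buffer = io.StringIO(path)

# A CSV reader therefore sees a single row holding a single field: the path.
rows = list(csv.reader(buffer))
print(rows)
```

So StringIO is only useful when the string already holds the CSV text, not the location of the file.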

Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!

Edit: Full error

Traceback (most recent call last):
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
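
The \xc3\xa1 in that message is the UTF-8 encoding of "á", which suggests the path really was turned into UTF-8 bytes before the file was opened. One plausible reading, and it is only an assumption, is that a byte-based Windows file API then interpreted those bytes in the system ANSI code page and looked for a differently named file. The sketch below reproduces the bytes and shows what they look like if misread as cp1252 (cp1252 is assumed here purely for illustration):

```python
# The accented name, written with an escape so the code point is unambiguous.
name = "Anzo\u00e1tegui.csv"

# Encoding to UTF-8 yields the two bytes seen in the error message.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'Anzo\xc3\xa1tegui.csv'

# Assumption: the bytes are reinterpreted in a Western European ANSI code page.
misread = utf8_bytes.decode("cp1252")
print(misread)  # AnzoÃ¡tegui.csv
```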

asked Jun 04 '14 17:06

khe

1 Answer

OK folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version and files with accented names import correctly.
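
If you are not sure which pandas you are running, a quick check against 0.14.0 can save some guessing. This is a rough sketch: version_tuple and has_filename_fix are hypothetical helper names, and the parsing only looks at the leading numeric part of the version string.

```python
import re

# Release named in this answer as containing the fix.
FIXED_IN = (0, 14, 0)


def version_tuple(version):
    """Return the leading major.minor[.patch] numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_filename_fix(version):
    """True if the given pandas version is 0.14.0 or newer."""
    return version_tuple(version) >= FIXED_IN
```

With pandas imported, has_filename_fix(pandas.__version__) tells you whether an upgrade is needed.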

Comments at github.
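
If upgrading is not an option, a possible workaround is to open the file yourself and hand the open handle to read_csv, so that pandas never has to deal with the file name. I have not tested this on the old release, so treat it as a sketch: read_csv_via_handle is a hypothetical helper name, and the utf-8 default is an assumption about the file contents, not about the name.

```python
import pandas as pd


def read_csv_via_handle(path, encoding="utf-8", **kwargs):
    """Open the file with Python's own open() and let pandas read the handle.

    `encoding` describes the text inside the file (assumed utf-8 here);
    the file name is handled by Python itself, not by the pandas parser.
    """
    with open(path, "r", encoding=encoding) as handle:
        return pd.read_csv(handle, **kwargs)
```

Usage would mirror the original call, for example read_csv_via_handle(input_file, sep=',', skipinitialspace=True).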

Thanks for the input!

answered Sep 19 '22 21:09

khe

Tags: python, python-3.x, pandas, csv, encoding
