Reading UTF-8 file with codecs in IronPython

Tags: python, csv, encoding, utf-8, ironpython

I have a .csv file encoded in UTF-8 that contains both Latin and Cyrillic characters.

;F1;F2;abcdefg3;F200
;ABSOLUTE;NOMINAL;NOMINAL;NOMINAL
o1;1;USA;Новосибирск;1223

I'm trying to execute the following script in IronPython 2.7.1:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.next()

During the execution of f.next(), an exception occurs:

Traceback (most recent call last):
  File "c:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\Extensions\Microsoft\Python Tools for Visual Studio\1.1\visualstudio_py_repl.py", line 492, in run_file_as_main
    code.Execute(self.exec_mod)
  File "<string>", line 4, in <module>
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 684, in next
    return self.reader.next()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 615, in next
    line = self.readline()
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Program Files\IronPython 2.7.1\Lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

In CPython 2.7, the same script works correctly. Also, the following script works fine in IronPython 2.7.1:

import codecs

f = codecs.open(r"file.csv", "rb", "utf-8")
f.readlines()

Does anybody know what may cause such strange behaviour?

asked Apr 12 '12 12:04

Rustam Miftakhutdinov

1 Answer

This looks like it could be a bug in how next() handles codecs. Could you please open an issue and attach the files needed to reproduce it?

answered Oct 21 '22 16:10

Jeff Hardy
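
Until the cause is confirmed, one possible workaround is to avoid calling next() on the codecs reader at all. The sketch below is only an illustration: the helper names read_lines_utf8 and read_lines_decode_whole are hypothetical, the first relies on readlines(), which the question reports as working in IronPython 2.7.1, and the second (decoding the raw bytes in one step) is an untested assumption on IronPython.

```python
# -*- coding: utf-8 -*-
import codecs
import io


def read_lines_utf8(path):
    """Return all decoded lines of a UTF-8 file without calling next()."""
    # readlines() is the call the question reports as working on IronPython 2.7.1.
    f = codecs.open(path, "rb", "utf-8")
    try:
        return f.readlines()
    finally:
        f.close()


def read_lines_decode_whole(path):
    """Alternative (assumption, untested on IronPython): decode the bytes in one step."""
    with io.open(path, "rb") as raw:
        data = raw.read()
    # splitlines(True) keeps the line endings, like readlines() does.
    return data.decode("utf-8").splitlines(True)
```

Either helper returns a list of unicode lines, so the header row that f.next() was meant to fetch is simply the first element, for example read_lines_utf8(r"file.csv")[0]. Both read the whole file into memory, which is fine for a small CSV but may not suit very large files.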
