Is it possible to "sniff" the Character encoding?

Tags: python, character-encoding, csv, unicode

I have a webpage that accepts CSV files. These files may be created in a variety of places. As far as I know, there is no way to specify the encoding inside a CSV file, so I cannot reliably treat all of them as UTF-8 or any other single encoding.

Is there a way to intelligently guess the encoding of the CSV I am getting? I am working with Python, but I am open to language-agnostic methods too.

asked May 27 '13 10:05

shabda

1 Answer

There is no way to determine the encoding of a file with certainty by looking only at the file itself, but you can use a heuristics-based solution, e.g. chardet.
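A minimal sketch of how this might look in Python, assuming the third-party `chardet` package (`pip install chardet`) is available; the code falls back to stdlib-only checks if it is not. The function names `guess_encoding` and `read_csv_rows` are hypothetical, and the order of checks (byte-order mark, then strict UTF-8, then `chardet.detect`, then a `latin-1` fallback) is an assumption of this sketch, not part of the answer above. Trying UTF-8 first is a common shortcut because non-ASCII text in other encodings rarely happens to be valid UTF-8.

```python
import codecs
import csv
import io

try:
    import chardet  # third-party heuristic detector (assumed installed)
except ImportError:  # keep the sketch usable without it
    chardet = None


def guess_encoding(data: bytes) -> str:
    """Return a best-guess encoding name for raw bytes (heuristic only)."""
    # 1. A byte-order mark is the strongest signal available.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick endianness
    # 2. Bytes that decode cleanly as UTF-8 are very likely UTF-8 (or ASCII).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Fall back to chardet's statistical guess, if it is installed.
    if chardet is not None:
        guess = chardet.detect(data)["encoding"]  # may be None
        if guess:
            return guess
    # 4. Last resort: latin-1 maps every byte to a character, so decoding
    #    never fails, although the text may be wrong (e.g. cp1252 quotes).
    return "latin-1"


def read_csv_rows(data: bytes) -> list[list[str]]:
    """Decode uploaded bytes with the guessed encoding and parse them as CSV."""
    encoding = guess_encoding(data)
    try:
        text = data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a bad guess: use the never-failing fallback.
        text = data.decode("latin-1")
    return list(csv.reader(io.StringIO(text, newline="")))
```

`chardet.detect` also returns a `confidence` value you could use to reject weak guesses, and for large uploads `chardet.universaldetector.UniversalDetector` can be fed the data in chunks instead of all at once. Note that the standard library's `csv.Sniffer` guesses the dialect (delimiter, quoting), not the encoding, so it only helps after the bytes have been decoded.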

answered Oct 03 '22 01:10

asciimoo
