I've got about 1000 filenames read by os.listdir(); some of them are encoded in UTF-8 and some in CP1252. I want to decode all of them to Unicode for further processing in my script. Is there a way to detect the source encoding so that each name can be decoded correctly?
Example:
import os

for item in os.listdir(rootPath):
    # Convert to Unicode
    if isinstance(item, str):
        item = item.decode('cp1252')  # or item = item.decode('utf-8')
    print item
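One common pattern for exactly this mixed-encoding situation (a sketch under the question's Python 2 assumptions, not necessarily what the asker settled on) is to try the stricter UTF-8 decode first and fall back to CP1252 when it raises, since CP1252 text containing non-ASCII bytes is rarely valid UTF-8:

import os

for item in os.listdir(rootPath):  # rootPath as in the question
    if isinstance(item, str):
        try:
            item = item.decode('utf-8')    # strict: raises on most CP1252 bytes
        except UnicodeDecodeError:
            item = item.decode('cp1252')   # fallback
    print item

CP1252 assigns a character to almost every byte value, so the fallback nearly always succeeds; trying UTF-8 first is what makes the guess reliable.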
To detect the encoding of strings in R, you can use the detect_str_enc() function. It is vectorized and accepts a character vector; missing values are skipped. Strings in R can only be in one of three encodings: UTF-8, Latin-1, and native.
There are a few options you can use: check the Content-Type header to see if it includes a charset parameter, which would indicate the encoding (e.g. Content-Type: text/plain; charset=utf-16); or check whether the uploaded data has a BOM (the first few bytes of the file, which map to the Unicode character U+FEFF: 2 bytes for UTF-16, 3 bytes for UTF-8).
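A minimal sketch of the BOM check in Python (sniff_bom is a hypothetical helper; the BOM constants are part of the standard codecs module):

import codecs

def sniff_bom(data):
    # Compare the leading bytes against the well-known byte-order marks.
    for bom, enc in [(codecs.BOM_UTF8, 'utf-8-sig'),
                     (codecs.BOM_UTF16_BE, 'utf-16-be'),
                     (codecs.BOM_UTF16_LE, 'utf-16-le')]:
        if data.startswith(bom):
            return enc
    return None  # no BOM; fall back to headers or statistical detection

print(sniff_bom(b'\xef\xbb\xbfhello'))  # utf-8-sig

Decoding with 'utf-8-sig' strips the BOM automatically, which is usually what you want.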
You can use type() or isinstance(). In Python 2, str is just a sequence of bytes; Python doesn't know what its encoding is. The unicode type is the safer way to store text.
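For illustration, a small Python 2 snippet showing the distinction (the byte string is assumed to be UTF-8 encoded):

name = 'caf\xc3\xa9'             # str: raw bytes
print type(name)                 # <type 'str'>
if isinstance(name, str):
    name = name.decode('utf-8')  # bytes -> unicode
print type(name)                 # <type 'unicode'>

Note that isinstance() only distinguishes bytes from text; it cannot tell you which encoding the bytes are in.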
UTF-8 is based on 8-bit code units. Each character is encoded as 1 to 4 bytes.
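A quick way to see those widths (a Python 3 snippet added for illustration):

for ch in ['A', 'é', '€', '😀']:
    # ASCII, Latin-1 range, BMP, and supplementary-plane characters
    print(ch, len(ch.encode('utf-8')))  # 1, 2, 3 and 4 bytes respectively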
Use the chardet library. It is super easy:

import chardet

the_encoding = chardet.detect('your string')['encoding']

and that's it!
In Python 3 you need to pass bytes or a bytearray, so:

import chardet

the_encoding = chardet.detect(b'your string')['encoding']
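Tying this back to the original question, a sketch that guesses a codec per filename (assumptions: Python 3 with chardet installed; the cp1252 fallback when detection returns None is a choice made here, not part of chardet):

import os
import chardet

for raw in os.listdir(b'.'):  # passing bytes makes listdir return undecoded bytes
    guess = chardet.detect(raw)['encoding'] or 'cp1252'
    print(raw.decode(guess, errors='replace'))

Keep in mind that chardet is more reliable on longer inputs, so very short filenames may be misdetected.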