What's the best way to identify unicode encoded text files in Windows?

Tags: search, windows, unicode

I am working on a codebase that has some Unicode-encoded files scattered throughout, the result of multiple team members developing with different editors (and default settings). I would like to clean up our codebase by finding all the Unicode-encoded files and converting them back to ANSI encoding.

Any thoughts on how to accomplish the "finding" part of this task would be truly appreciated.


asked Jan 12 '11 18:01

HOCA

1 Answer

See “How to detect the character encoding of a text-file?” or “How to reliably guess the encoding [...]?”

UTF-8 can be detected by validation. You can also look for the BOM EF BB BF, but don't rely on it.
UTF-16 can be detected by looking for the BOM (FF FE for little-endian, FE FF for big-endian).
UTF-32 can be detected by validation or by the BOM (FF FE 00 00 for little-endian, 00 00 FE FF for big-endian).
Otherwise, assume the ANSI code page.
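The checks above can be sketched as a single Python function. This is a minimal illustration, not a complete detector: the function name `guess_encoding` and its return labels are hypothetical, it skips BOM-less UTF-32 validation, and "ansi" simply means "none of the Unicode checks matched". Note the ordering: the UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM, so UTF-32 has to be tested first.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess a file's encoding from its raw bytes (hypothetical helper).

    Returns "utf-32-le", "utf-32-be", "utf-8-sig", "utf-16-le",
    "utf-16-be", "ascii", "utf-8", or "ansi".
    """
    # UTF-32 first: its little-endian BOM (FF FE 00 00) begins with
    # the UTF-16-LE BOM (FF FE).
    if data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM and only 7-bit bytes: plain ASCII (also valid UTF-8 and ANSI).
    if data.isascii():
        return "ascii"
    # Validation: non-ASCII bytes that decode cleanly are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is in the ANSI code page.
        return "ansi"
```

Passing the contents of a file opened in binary mode (`open(path, "rb").read()`) gives a label for that file. Validation only gives a strong signal when there are non-ASCII bytes to validate, which is why pure-ASCII input gets its own label.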

Our codebase doesn't include any non-ASCII chars. I will try to grep for the BOM in files in our codebase. Thanks for the clarification.
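The "grep for the BOM" approach can be sketched in Python, which runs the same way on Windows as elsewhere. This is an illustrative sketch; `find_bom_files` and `BOMS` are hypothetical names, and it only looks at the first four bytes of each file, since the longest BOM is four bytes.

```python
import codecs
import os

# BOM signatures to look for. The UTF-32 entries come first because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32-LE"),
    (codecs.BOM_UTF32_BE, "UTF-32-BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16-LE"),
    (codecs.BOM_UTF16_BE, "UTF-16-BE"),
]


def find_bom_files(root):
    """Yield (path, label) for every file under root that starts with a BOM."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    head = f.read(4)  # the longest BOM is 4 bytes
            except OSError:
                continue  # unreadable or locked file: skip it
            for bom, label in BOMS:
                if head.startswith(bom):
                    yield path, label
                    break  # report each file once
```

For example, `for path, label in find_bom_files("."): print(label, path)` lists every BOM-prefixed file below the current directory.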

Well, that makes things a lot simpler. UTF-8 text without non-ASCII characters is byte-for-byte identical to ASCII.
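Because the files contain only ASCII characters, converting them back is a matter of dropping the BOM (for UTF-8) or decoding and re-encoding (for UTF-16/UTF-32). The sketch below assumes that ASCII output is acceptable as "ANSI", which holds for common Windows ANSI code pages such as Windows-1252 since they are ASCII supersets. The helper names `to_ascii_bytes` and `convert_file` are hypothetical, and the code deliberately raises instead of guessing when it meets a non-ASCII character.

```python
import codecs

# BOM -> codec for the text that follows it. UTF-32 is checked before
# UTF-16 because their little-endian BOMs overlap.
_BOM_CODECS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_ascii_bytes(data: bytes) -> bytes:
    """Strip a BOM, decode the rest, and re-encode it as ASCII.

    Raises UnicodeError if the text is not pure ASCII, so such files
    are left for manual review rather than silently mangled.
    """
    for bom, codec in _BOM_CODECS:
        if data.startswith(bom):
            text = data[len(bom):].decode(codec)
            break
    else:
        text = data.decode("ascii")  # no BOM: must already be ASCII
    return text.encode("ascii")


def convert_file(path: str) -> None:
    """Rewrite a file in place as ASCII, only if its bytes change."""
    with open(path, "rb") as f:
        data = f.read()
    converted = to_ascii_bytes(data)
    if converted != data:
        with open(path, "wb") as f:
            f.write(converted)
```

Reading and writing in binary mode keeps the existing line endings (CRLF or LF) exactly as they were.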


answered Sep 18 '22 05:09

dan04

