Read a file with unicode characters

Tags: c#, asp.net, unicode

I have an ASP.NET C# page and am trying to read a file that contains the character ’ and convert it to ' (that is, from a slanted apostrophe to a straight apostrophe).

FileInfo fileinfo = new FileInfo(FileLocation);
string content = File.ReadAllText(fileinfo.FullName);

//strip out bad characters
content = content.Replace("’", "'");

This doesn't work: instead of straight apostrophes, the slanted apostrophes come out as ? marks.


asked Apr 27 '11 00:04

chris

2 Answers

I suspect that the problem is not with the replacement, but rather with the reading of the file itself. When I tried this the naive way (using Word and copy-paste) I ended up with the same results as you. However, examining content showed that the .NET Framework already believed the character was Unicode character 65533 (U+FFFD, the replacement character "�") before the string replacement ran. You can check this yourself by examining the relevant character in the Visual Studio debugger, which should show the character code:

content[0]; // 65533 '�'

The reason the replace isn't working is simple: content doesn't contain the string you gave it, as IndexOf shows:

content.IndexOf("’"); // -1
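A minimal sketch that reproduces this diagnosis, assuming the file was saved in Windows code page 1252, where ’ is stored as the single byte 0x92. The temporary file is only for illustration, and the sketch uses C# top-level statements (C# 9 or later):

```csharp
using System;
using System.IO;

// Hypothetical test file: one byte, 0x92, which is ’ in Windows-1252.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0x92 });

// No encoding argument: with no byte order mark present, ReadAllText decodes as UTF-8.
string content = File.ReadAllText(path);
File.Delete(path);

// 0x92 is not a valid UTF-8 sequence, so it is decoded as U+FFFD.
Console.WriteLine((int)content[0]);                                    // 65533
// An ordinal search for ’ (U+2019) therefore finds nothing.
Console.WriteLine(content.IndexOf("\u2019", StringComparison.Ordinal)); // -1
```

The data was lost at read time, so no Replace call afterwards can recover which character the file originally held.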

As for why the file reading isn't working properly: you are probably using the wrong encoding when reading the file. If no encoding is specified, File.ReadAllText looks for a byte order mark and, if there isn't one, assumes UTF-8. There is no 100% reliable way to detect an encoding, so a file saved in some other encoding is easily misread. The exact encoding you need depends on the file itself; in my case the encoding being used was Extended ASCII, so to read the file I just needed to specify the correct encoding:

string content = File.ReadAllText(fileinfo.FullName, Encoding.GetEncoding("iso-8859-1"));
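To see why the choice of encoding matters, here is a small sketch that decodes the same byte three ways. It assumes the byte is 0x92 (’ in Windows-1252) and targets .NET 5 or later, where code page 1252 is only available after registering CodePagesEncodingProvider:

```csharp
using System;
using System.Text;

// Hypothetical file contents: the single byte 0x92.
byte[] bytes = { 0x92 };

// Required on .NET Core / .NET 5+ before Encoding.GetEncoding(1252) will succeed.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string asUtf8   = Encoding.UTF8.GetString(bytes);                      // invalid UTF-8
string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes); // byte value = code point
string as1252   = Encoding.GetEncoding(1252).GetString(bytes);         // Windows "ANSI" code page

Console.WriteLine($"UTF-8:      U+{(int)asUtf8[0]:X4}");   // U+FFFD
Console.WriteLine($"ISO-8859-1: U+{(int)asLatin1[0]:X4}"); // U+0092
Console.WriteLine($"1252:       U+{(int)as1252[0]:X4}");   // U+2019
```

Both single-byte encodings read the byte without loss, but only Windows-1252 turns it into the actual ’ character.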

(See this question).

You also need to make sure that you specify the correct character in your replacement string. When the file is read as ISO-8859-1, the apostrophe arrives as the control character U+0092 rather than as ’ (U+2019), so that is the character to replace. When using "odd" characters in code, it can also be more reliable to specify the character by its character code rather than as a string literal, since a literal may break if the encoding of the source file changes. For example, the following worked for me:

content = content.Replace("\u0092", "'");
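A self-contained sketch of that fix, assuming hypothetical file contents "It’s" saved in Windows-1252 (so ’ is byte 0x92); the bytes are held in memory instead of a file to keep it short:

```csharp
using System;
using System.Text;

// Hypothetical file contents: "It’s" in Windows-1252 ('I', 't', 0x92, 's').
byte[] bytes = { 0x49, 0x74, 0x92, 0x73 };

// ISO-8859-1 maps 0x92 to U+0092, not to ’ (U+2019).
string content = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

// Escape sequence rather than a pasted character, so the .cs file's own encoding doesn't matter.
content = content.Replace("\u0092", "'");

Console.WriteLine(content); // It's
```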

answered Nov 15 '22 18:11

Justin

// This should replace smart single quotes with a straight single quote
// (Regex.Replace returns a new string, so assign the result)
content = Regex.Replace(content, @"(\u2018|\u2019)", "'");

// However, the better approach seems to be to read the page with the proper encoding and leave the quotes alone.
// Open the existing file read-only; FileInfo.Create() would overwrite it.
var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252));
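Putting both suggestions together, a sketch under these assumptions: a hypothetical temporary file containing ‘Hi’ in Windows-1252 (0x91 = ‘, 0x92 = ’), .NET 5 or later (hence the provider registration), and a `using` block so the reader is disposed:

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Needed on .NET Core / .NET 5+ before code page 1252 can be used.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Hypothetical sample file: ‘Hi’ saved in Windows-1252.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[] { 0x91, 0x48, 0x69, 0x92 });
var fileinfo = new FileInfo(path);

string content;
// OpenRead() opens the existing file read-only.
using (var sreader = new StreamReader(fileinfo.OpenRead(), Encoding.GetEncoding(1252)))
{
    content = sreader.ReadToEnd();
}
File.Delete(path);

// Read with the right encoding, the quotes are genuine U+2018 / U+2019 characters...
Console.WriteLine(content == "\u2018Hi\u2019"); // True

// ...so they can be left alone, or normalized to straight quotes if required.
string plain = Regex.Replace(content, "[\u2018\u2019]", "'");
Console.WriteLine(plain); // 'Hi'
```

A character class does the same job as the alternation in the answer; either form matches both curly quotes.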

answered Nov 15 '22 18:11

Trey Carroll

Related questions

what is the best collection type to return in an API
Application.Quit() method failing to clear process
ASP.NET Routing - Ignore routes for files with specific extension, regardless of directory
How do I format a number in C# with commas and decimals?
ASP.NET MVC 2 - Html.EditorFor a nullable type?
Wrapping an element with Html.ActionLink..?
How do you determine the physical path of a file without an HttpContext?
LINQ Comparing Two Lists - Add new, remove old, leave the ones in common
C# automatic properties - is it possible to have custom getter with default setter?
C# "Method not found" exception on runtime without usage of reflection
Invoking a method of a Generic Class
Problems with an OData filter and a Guid field
Split C# collection into equal parts, maintaining sort
How do I use AutoMapper with Ninject.Web.Mvc?
Recommended way to check file size on upload
Detect dead code in C#
C# Threading.Suspend in Obsolete, thread has been deprecated?
How to determine if two generic type values are equal?
How to merge two lists using LINQ?
Parse string into a LINQ query
