C# HTMLAgilityPack HTML to Text - Parse Errors

Tags: c#, html-parsing, html-agility-pack

I need to extract text from an HTML file using C#. I am trying to use HTMLAgilityPack but I am seeing some parse errors (tags not closed). I am using these two options:

        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.OptionAutoCloseOnEnd = true;

Is there any "fix all" type of option? I don't care about the errors; I just want the content, or something close to it.

asked Sep 27 '10 09:09

tvr

1 Answer

This may be more of a workaround, but when I once had to extract text from HTML, I used regular expressions:

// Requires: using System; using System.Text.RegularExpressions;
// 'result' initially holds the raw HTML string.
result = Regex.Replace(result, @"<(.|\n)*?>", String.Empty); // strip anything that looks like a tag
result = Regex.Replace(result, @"^\n*", String.Empty);       // drop leading newlines
result = Regex.Replace(result, @"\n*$", String.Empty);       // drop trailing newlines
result = result.Replace("\n", " ");                          // turn remaining line breaks into spaces
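
For anyone who wants to reuse this, here is a rough sketch of the same regex idea wrapped in a helper. The names `HtmlText` and `Strip` are made up for illustration and are not part of HtmlAgilityPack. It also goes a bit beyond the snippet above, as my own assumption about what "just the content" means: it drops script and style blocks along with their contents, removes comments, decodes entities, and replaces tags with a space instead of an empty string so that adjacent words do not run together. Regexes are not a real HTML parser, so treat this as a best-effort fallback for messy input.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class HtmlText
{
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Remove script and style elements together with their contents,
        // so that code and CSS do not leak into the extracted text.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // Remove comments, then anything else that looks like a tag.
        // A space is used as the replacement so neighbouring words stay separated.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]*>", " ");

        // Decode entities such as &amp; into their characters.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace (including line breaks) into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Since it never builds a tree, unclosed tags do not matter here: `HtmlText.Strip("<p>Hello <b>world</p>")` returns `Hello world`. The downside of replacing tags with a space is that text like `<b>bold</b>!` comes out as `bold !`, so swap the replacement for `String.Empty` if that matters more to you than word separation.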

answered Sep 26 '22 08:09

Ichibann
