I'm trying to learn Python by working on a fun project: a Facebook message analyzer. I've downloaded my data from Facebook, which includes a set of HTML files. One of these, messages.htm, contains all of my messages. My goal is to parse this HTML file and output fun statistics such as the most common word, the number of messages, and so on.
The problem is that my messages.htm file is 270 MB. I can inspect it fine in vim, but there are interesting patterns in the file, and I'd like to compare the HTML source with how it renders in a browser to get a better sense of what's going on. When I try to open the file in Firefox, it crashes. Chrome can open it, but it just keeps loading messages: about 10 minutes in, it still hasn't finished loading a single message thread, and the scroll bar keeps shrinking. So this isn't feasible.
Is it even possible to fully render such a large and long HTML file?
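For the analysis goal itself (most common word, number of messages), the file does not need to be rendered at all. The following is a minimal sketch, not a definitive implementation, that streams the file through Python's standard `html.parser` in chunks so that the whole 270 MB never has to be held in memory at once. The class name `MessageStats`, the function `analyze`, and the assumption that each message sits in an element with class `message` are illustrative guesses; check the actual markup in vim and adjust.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class MessageStats(HTMLParser):
    """Counts words in all text nodes and elements carrying a given class."""

    def __init__(self, message_class="message"):
        super().__init__(convert_charrefs=True)
        # Assumption: one element with this class per message.
        self.message_class = message_class
        self.message_count = 0
        self.words = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.message_class in classes:
            self.message_count += 1

    def handle_data(self, data):
        # Counts every text node, including names and timestamps.
        # A word that straddles a chunk boundary may be seen as two fragments.
        self.words.update(re.findall(r"[a-z']+", data.lower()))


def analyze(path, chunk_size=1 << 20):
    """Feed the file to the parser about 1 MiB at a time."""
    parser = MessageStats()
    with open(path, encoding="utf-8", errors="replace") as f:
        while chunk := f.read(chunk_size):
            parser.feed(chunk)
    parser.close()
    return parser
```

Usage would look like `stats = analyze("messages.htm")`, followed by `stats.message_count` and `stats.words.most_common(10)`. Because every text node is counted, the word tally will include sender names and dates unless the parser is narrowed to the elements that hold message bodies.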
You can use lynx, a text-based browser, to view a large HTML file. I have a 139 MB HTML file and was able to view it very easily using lynx. lynx divides the entire document into pages and can load any given page very quickly. It also supports hyperlinks, so navigating within the HTML document (which was my use case) worked like a charm.
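If a graphical rendering is still wanted for comparing the markup with the visuals, one different approach (not part of the lynx suggestion) is to copy only the beginning of the file into a smaller sample that Firefox or Chrome can handle. The sketch below assumes that the first couple of megabytes are representative and that the browser will tolerate the unclosed tags at the end of the sample; the function name `write_sample` and the output filename are hypothetical.

```python
def write_sample(src, dst, max_bytes=2_000_000):
    """Copy roughly the first max_bytes of src to dst, ending on a '>' byte."""
    with open(src, "rb") as f:
        data = f.read(max_bytes)
    # Trim back to the last '>' so the sample does not stop inside a tag.
    cut = data.rfind(b">")
    if cut != -1:
        data = data[: cut + 1]
    with open(dst, "wb") as f:
        f.write(data)
    return len(data)  # number of bytes written
```

For example, `write_sample("messages.htm", "messages_sample.htm")` writes a file of at most about 2 MB that can be opened next to the same region of the source in vim.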