Suppose I have local copies of news articles. How can I run newspaper on those articles? According to the documentation, the normal use of the newspaper library looks something like this:
from newspaper import Article
url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.parse()
# ...
In my case, I do not need to download the article from a web page because I already have a local copy of the page. How can I use newspaper on a local copy of the web page?
There is an official way to do this. Once you've loaded the HTML into your program, use the set_html() method to set it as article.html:
import newspaper
with open("file.html", 'rb') as fh:
    ht = fh.read()
# The url is only a placeholder; nothing is downloaded.
article = newspaper.Article(url=' ')
article.set_html(ht)
article.parse()
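If you have a whole folder of saved pages, the same idea extends naturally. Below is a minimal sketch, not an official recipe: the helper names read_local_pages and parse_local_page are made up for illustration, and it assumes the files end in .html and are roughly UTF-8 encoded. Only Article, set_html() and parse() come from newspaper itself.

```python
from pathlib import Path


def read_local_pages(directory):
    """Return {filename: html_text} for every .html file in `directory`."""
    pages = {}
    for path in sorted(Path(directory).glob("*.html")):
        # Assumption: pages are UTF-8; errors="replace" tolerates stray bytes.
        pages[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return pages


def parse_local_page(html, url=" "):
    """Parse already-loaded HTML with newspaper, skipping download()."""
    # Imported here so the file-reading helper works without newspaper installed.
    from newspaper import Article

    article = Article(url=url)  # placeholder url, as in the answer above
    article.set_html(html)      # hand over the local HTML instead of downloading
    article.parse()
    return article
```

You would then loop over read_local_pages("saved_articles").items() (the folder name is just an example) and call parse_local_page(html) on each entry to get at article.title, article.text and so on.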