
How to use Newspaper3k library without downloading articles?

Suppose I have local copies of news articles. How can I run newspaper on those articles? According to the documentation, the normal use of the newspaper library looks something like this:

from newspaper import Article

url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.parse()
# ...

In my case, I do not need to download the article from a web page because I already have a local copy of the page. How can I use newspaper on a local copy of the web page?

Flux asked Jun 20 '19 00:06



1 Answer

There is indeed an official way to solve this, as mentioned here.

Once you have loaded your HTML into the program, you can use the set_html() method to assign it to article.html, and then call parse() as usual without calling download():

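import newspaper

# Read the saved copy of the page from disk
with open("file.html", 'rb') as fh:
    ht = fh.read()

# Article() needs a URL argument; a placeholder is enough because nothing is downloaded
article = newspaper.Article(url=' ')
article.set_html(ht)  # use the local HTML instead of calling download()
article.parse()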
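If you have a whole folder of saved pages, the same set_html() approach can simply be looped. The following is a minimal sketch, assuming the pages are stored as .html files in a single directory; iter_saved_pages, parse_saved_page and parse_directory are hypothetical helper names, not part of newspaper3k. Passing the page's original address as url (when you know it) is optional; my assumption is that it may help newspaper resolve relative links such as the top image.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple


def iter_saved_pages(directory: str, pattern: str = "*.html") -> Iterator[Tuple[Path, bytes]]:
    """Yield (path, raw_bytes) for every saved page in `directory`, sorted by name."""
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # skip directories whose names happen to match the pattern
            yield path, path.read_bytes()


def parse_saved_page(html: bytes, url: Optional[str] = None):
    """Hypothetical helper: parse one saved page with newspaper3k, no download.

    `url` is optional; the original address, if known, is assumed to help
    resolve relative links. Otherwise a placeholder URL is used.
    """
    import newspaper  # imported lazily so the file helpers work without it

    article = newspaper.Article(url=url or " ")
    article.set_html(html)  # use the local HTML instead of calling download()
    article.parse()
    return article


def parse_directory(directory: str) -> dict:
    """Hypothetical helper: map each saved file name to its extracted title and text."""
    results = {}
    for path, html in iter_saved_pages(directory):
        article = parse_saved_page(html)
        results[path.name] = {"title": article.title, "text": article.text}
    return results
```

Usage would be something like parse_directory("saved_pages"), where "saved_pages" stands for whatever folder holds your local copies.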
LucyDrops answered Sep 22 '22 07:09
