I have changed my Python 2.7 routine to accept a file path as a parameter for the routine so I don't have to duplicate code by inserting multiple file paths inside the method.
When my method is called I get the following error:
looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
My method implementation is:
def extract_data_from_report3(filename):
html_report_part1 = open(filename,'r').read()
soup = BeautifulSoup(filename, "html.parser")
th = soup.find_all('th')
td = soup.find_all('td')
headers = [header.get_text(strip=True) for header in soup.find_all("th")]
rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
for row in soup.find_all("tr")[1:-1]]
print(rows)
return rows
To call the method is as follows:
rows_part1 = report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1
How can I pass the file name as a parameter?
To use beautiful soup, you need to install it: $ pip install beautifulsoup4 . Beautiful Soup also relies on a parser, the default is lxml . You may already have it, but you should check (open IDLE and attempt to import lxml). If not, do: $ pip install lxml or $ apt-get install python-lxml .
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. The latest Version of Beautifulsoup is v4. 9.3 as of now.
The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string: Python3.
Beautiful Soup is a Python library that is used for web scraping purposes to pull the data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and more readable manner.
If you want to pass a file handle then don't call read, just pass open(filename)
or the file handle without calling read :
def extract_data_from_report3(filename):
html_report_part1 = open(filename,'r')
soup = BeautifulSoup( html_report_part1, "html.parser")
Or:
def extract_data_from_report3(filename):
soup = BeautifulSoup(open(filename), "html.parser")
You can pass html_report_part1
after calling read as suggested but you don't need to, BeautifulSoup can take a file object.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With