Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python filename, not markup. open this file and pass the filehandle into Beautiful Soup

I have changed my Python 2.7 routine to accept a file path as a parameter for the routine so I don't have to duplicate code by inserting multiple file paths inside the method.

When my method is called I get the following error:

looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
  '"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)

My method implementation is:

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r').read()
    soup = BeautifulSoup(filename, "html.parser")
    th = soup.find_all('th')
    td = soup.find_all('td')

    headers = [header.get_text(strip=True) for header in soup.find_all("th")]
    rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
        for row in soup.find_all("tr")[1:-1]]
    print(rows)
    return rows

To call the method is as follows:

rows_part1 =  report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1

How can I pass the file name as a parameter?

like image 882
Riaz Ladhani Avatar asked May 17 '16 12:05

Riaz Ladhani


People also ask

How do you use BeautifulSoup 4 in Python?

To use beautiful soup, you need to install it: $ pip install beautifulsoup4 . Beautiful Soup also relies on a parser, the default is lxml . You may already have it, but you should check (open IDLE and attempt to import lxml). If not, do: $ pip install lxml or $ apt-get install python-lxml .

What is BeautifulSoup package in Python?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work. The latest Version of Beautifulsoup is v4. 9.3 as of now.

How does the prettify () method change a BeautifulSoup object?

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with a separate line for each tag and each string: Python3.

What does Soup mean in Python?

Beautiful Soup is a Python library that is used for web scraping purposes to pull the data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and more readable manner.


1 Answers

If you want to pass a file handle then don't call read, just pass open(filename) or the file handle without calling read :

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r')
    soup = BeautifulSoup( html_report_part1, "html.parser")

Or:

def extract_data_from_report3(filename):
    soup = BeautifulSoup(open(filename), "html.parser")

You can pass html_report_part1 after calling read as suggested but you don't need to, BeautifulSoup can take a file object.

like image 136
Padraic Cunningham Avatar answered Oct 01 '22 15:10

Padraic Cunningham