
BeautifulSoup: Can't convert NavigableString to string

I'm starting to learn Python and I've decided to code a simple scraper. One problem I'm encountering is that I cannot convert a NavigableString to a regular string.

I'm using BeautifulSoup4 with Python 3.5.1. Should I just bite the bullet and go back to an earlier version of Python and BeautifulSoup? Or is there a way to write my own function that casts a NavigableString to a regular unicode string?

for tag in soup.find_all("span"):
    for child in tag.children:
        if "name" in tag.string: #triggers error, can't compare string to NavigableString/bytes
            return child

    #things i've tried:
    #if "name" in str(tag.string)
    #if "name" in unicode(tag.string) #not in 3.5?
    #if "name" in strring(tag.string, "utf-8")
    #tried regex, didn't work. Again, doesn't like NavigableString type.
    #... bunch of other stuff too!
Saustin asked Feb 11 '16 01:02



1 Answer

For Python 3...

... the answer is merely str(tag.string)

Other answers will fail.

unicode() is not a built-in in Python 3.

tag.string.encode('utf-8') will convert the string to a byte string, which you don't want.
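As a rough illustration of the points above, here is a minimal sketch of the loop from the question rewritten around str(tag.string). The sample HTML and the helper name find_spans_containing are made up for this example, and the None check is an added assumption: .string is None when a tag has more than one child, which may be what actually trips up the "in" test in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = (
    "<div>"
    "<span>first name</span>"
    "<span><b>bold</b> text</span>"
    "<span>other</span>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")


def find_spans_containing(soup, needle):
    """Return the text of every <span> whose single string contains needle."""
    matches = []
    for tag in soup.find_all("span"):
        # .string is None when the tag has several children,
        # so skip those instead of testing `needle in None`.
        if tag.string is None:
            continue
        # str() turns the NavigableString into an ordinary Python 3 string.
        text = str(tag.string)
        if needle in text:
            matches.append(text)
    return matches


result = find_spans_containing(soup, "name")

# For contrast: encode() produces bytes, not text.
as_bytes = soup.find("span").string.encode("utf-8")
```

With this sample markup, result holds only the text of the first span, and as_bytes is a bytes object rather than a str, so a test like "name" in as_bytes would raise a TypeError.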

Konchog answered Nov 14 '22 22:11