
Simple python / Beautiful Soup type question

I'm trying to do some simple string manipulation with the href attribute of a hyperlink extracted using Beautiful Soup:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href="http://www.some-site.com/">Some Hyperlink</a>')
href = soup.find("a")["href"]
print href
print href[href.indexOf('/'):]

All I get is:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print href[href.indexOf('/'):]
AttributeError: 'unicode' object has no attribute 'indexOf'

How should I convert whatever href is into a normal string?

Justin asked Jul 20 '09 12:07




1 Answer

No conversion is needed: href is already a string (a unicode object). Python strings simply do not have an indexOf method.

Use href.index('/') instead.

href.find('/') is similar, but find returns -1 if the substring is not found, while index raises a ValueError.

So index is the safer choice here: with find, a miss gives -1, and slicing with that value (href[-1:]) silently returns the last character of the string instead of signaling an error.
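The difference between index and find can be seen in a minimal sketch. It uses Python 3 syntax and a plain string in place of the value extracted by Beautiful Soup, and the variable names first_slash and missing are made up for illustration:

```python
# Stand-in for the href value pulled out of the <a> tag in the question.
href = "http://www.some-site.com/"

# index returns the position of the first match.
first_slash = href.index("/")
print(href[first_slash:])  # everything from the first slash onward

# find returns -1 on a miss, so the slice quietly yields the last character.
missing = href.find("?")
print(repr(href[missing:]))

# index raises ValueError on a miss, which makes the failure visible.
try:
    href.index("?")
except ValueError as exc:
    print("not found:", exc)
```

If a missing separator is a normal case rather than an error, checking the result of find against -1 before slicing works just as well.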

codeape answered Sep 27 '22 00:09