How to find the comment tag <!--...--> with BeautifulSoup?

I tried soup.find('!--') but it doesn't seem to work. Thanks in advance.

Edit: Thanks for the tip on how to find all comments. I have a follow-up question: how do I search for one specific comment?

For example, I have the following comment tag:

<!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->

All I really want is the part <i>Wednesday 110518</i>. The "110518" is a date in YYMMDD format, which I'd like to use as my search target. However, I don't know how to search inside a specific comment.

asked May 19 '11 at 17:05 by 1stsage




1 Answer

You can find all the comments in a document via the find_all method (spelled findAll in older versions of BeautifulSoup). See the "Removing elements" example, which shows how to do exactly what you're trying to do.

In brief, you want this:

from bs4 import Comment
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
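A minimal, self-contained sketch of the one-liner above, assuming BeautifulSoup 4 (the bs4 package) with the built-in "html.parser"; the sample markup is made up for illustration. It also shows why soup.find('!--') comes back empty: the first argument to find matches tag names, and the parser turns <!-- ... --> into a Comment string, not a tag.

```python
from bs4 import BeautifulSoup, Comment

# Made-up sample markup containing two comments
html = """
<div>
  <!-- first comment -->
  <p>Visible text</p>
  <!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() matches tag names, and a comment is not a tag, so this finds nothing
print(soup.find("!--"))  # None

# Comment subclasses NavigableString, so a string filter can pick it out
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    # The Comment holds only the text between <!-- and -->
    print(comment.strip())
```

Older bs4 releases spell the keyword text= instead of string=; the callable is applied the same way.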

Edit: If you're trying to search within the comments, you can try:

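import re
from bs4 import Comment
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    # re.search scans the whole comment; re.match only tries the start,
    # and this comment begins with "<span ...>", not "<i>"
    match = re.search(r'<i>([^<]*)</i>', comment)
    if match:  # skip comments that contain no <i> element
        print(match.group(1))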
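To target one comment by its YYMMDD stamp, as the question asks, here is a sketch assuming bs4 is installed. find_dated_comment is a hypothetical helper name and the sample markup is made up. Instead of the regex used in the loop above, it filters comments by content and then parses the matching comment's body with BeautifulSoup a second time, since that body is itself HTML.

```python
from datetime import date

from bs4 import BeautifulSoup, Comment

# Made-up sample markup: one comment per day
html = """
<div>
  <!-- <span class="titlefont"> <i>Tuesday 110517</i>(05:00PM)<br /></span> -->
  <!-- <span class="titlefont"> <i>Wednesday 110518</i>(05:00PM)<br /></span> -->
</div>
"""


def find_dated_comment(soup, day):
    """Hypothetical helper: return the <i> text of the first comment
    containing `day` formatted as YYMMDD, or None if there is none."""
    target = day.strftime("%y%m%d")  # date(2011, 5, 18) -> "110518"
    # Keep only Comment strings whose text contains the date stamp
    comment = soup.find(
        string=lambda text: isinstance(text, Comment) and target in text
    )
    if comment is None:
        return None
    # The comment body is HTML, so parse it again to reach the <i> tag
    inner = BeautifulSoup(str(comment), "html.parser")
    italic = inner.find("i")
    return italic.get_text() if italic else None


soup = BeautifulSoup(html, "html.parser")
print(find_dated_comment(soup, date(2011, 5, 18)))  # Wednesday 110518
```

Re-parsing costs a little more than a regex, but it still finds the <i> element if it carries attributes, which the pattern <i>([^<]*)</i> would miss.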
answered Sep 19 '22 at 06:09 by yan
