BeautifulSoup: find th with text 'price', then get price from next th

Tags:

beautifulsoup

My HTML looks like:

<td>
   <table ..>
      <tr>
         <th ..>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>

I am in the current table cell. How would I get the 99.99 value?

I have so far:

td[3].findChild('th')

But I need to do:

Find th with text 'price', then get next th tag's string value.

asked Jul 31 '10 04:07

1 Answer

Think about it in "steps"... given that some x is the root of the subtree you're considering,

x.findAll(text='price')

is the list of all text nodes in that subtree whose text is exactly 'price'. The parents of those items then of course will be:

[t.parent for t in x.findAll(text='price')]

and if you only want to keep those whose "name" (tag) is 'th', then of course

[t.parent for t in x.findAll(text='price') if t.parent.name=='th']

and you want the "next siblings" of those (but only if they're also 'th's), so

[t.parent.nextSibling for t in x.findAll(text='price')
 if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
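The steps so far can be tried on markup shaped like the question's. This is only a sketch under assumptions: it uses the bs4 package with its current method names (find_all, next_sibling, string=) rather than the older findAll/nextSibling spellings above, the elided table attributes are left out, and PRETTY, COMPACT and steps are names made up for the illustration.

```python
from bs4 import BeautifulSoup

# Same cells twice: once indented like the question's HTML, once with no
# whitespace between the tags. Both strings are illustrative assumptions.
PRETTY = """<table>
  <tr>
    <th>price</th>
    <th>$99.99</th>
  </tr>
</table>"""
COMPACT = "<table><tr><th>price</th><th>$99.99</th></tr></table>"


def steps(markup):
    x = BeautifulSoup(markup, "html.parser")
    texts = x.find_all(string="price")            # step 1: matching text nodes
    parents = [t.parent for t in texts]           # step 2: their parent tags
    ths = [p for p in parents if p.name == "th"]  # step 3: keep only <th>
    # step 4: direct next siblings that are themselves <th>;
    # getattr() guards against siblings that have no tag name (text nodes).
    nexts = [p.next_sibling for p in ths
             if getattr(p.next_sibling, "name", None) == "th"]
    return texts, ths, nexts


pretty_texts, pretty_ths, pretty_nexts = steps(PRETTY)
compact_texts, compact_ths, compact_nexts = steps(COMPACT)
print(pretty_nexts)
print(compact_nexts)
```

With the indented markup the direct next sibling of the first th is the whitespace between the tags, not the second th, so the last step comes back empty there. Only the compact markup yields the price cell.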

Here you see the problem with using a list comprehension: too much repetition, since we can't assign intermediate results to simple names. Let's therefore switch to a good old loop...:

Edit: added tolerance for a string of text between the parent th and the "next sibling" as well as tolerance for the latter being a td instead, per OP's comment.

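for t in x.findAll(text='price'):
  p = t.parent
  # only consider text that sits directly inside a th
  if p.name != 'th': continue
  ns = p.nextSibling
  # skip over a text node (e.g. whitespace) between the two cells
  if ns and not ns.name: ns = ns.nextSibling
  # the following cell may be a td or a th
  if not ns or ns.name not in ('td', 'th'): continue
  print(ns.string)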
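For reference, here is a rough equivalent of that loop for the bs4 package, collecting the values in a list instead of printing them. It is a sketch, not the answer's own code: next_cell_texts and HTML are hypothetical names, the table attributes elided in the question are left out, and an isinstance check against Tag stands in for the "not ns.name" test.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# The question's markup, with the elided attributes ("..") left out.
HTML = """
<td>
   <table>
      <tr>
         <th>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>
"""


def next_cell_texts(root, label):
    """Return .string of the th/td following each <th> whose text is `label`."""
    results = []
    for t in root.find_all(string=label):
        p = t.parent
        if p.name != "th":
            continue
        ns = p.next_sibling
        # Skip one intervening text node (typically whitespace between tags).
        if ns is not None and not isinstance(ns, Tag):
            ns = ns.next_sibling
        # Give up unless what follows is a table cell.
        if not isinstance(ns, Tag) or ns.name not in ("td", "th"):
            continue
        results.append(ns.string)
    return results


prices = next_cell_texts(BeautifulSoup(HTML, "html.parser"), "price")
print(prices)
```

Returning a list keeps the "flat" loop structure while leaving it to the caller to decide what to do with the values.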

I've used ns.string, which will give the next sibling's contents if and only if they're just text (no further nested tags). Of course you can instead analyze further at this point, depending on your application's needs!-). Similarly, I imagine you won't be doing just print but something smarter, but I'm giving you the structure.
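To make the caveat about ns.string concrete, here is a small sketch assuming the bs4 package. The two th snippets are invented for the illustration, and get_text() is named only as one possible fallback, not as what the answer prescribes.

```python
from bs4 import BeautifulSoup

# A cell holding only text, and a cell mixing a nested tag with text.
plain = BeautifulSoup("<th>$99.99</th>", "html.parser").th
nested = BeautifulSoup("<th><b>$99</b>.99</th>", "html.parser").th

plain_string = plain.string      # the text itself
nested_string = nested.string    # None: the cell has more than one child
nested_text = nested.get_text()  # concatenates all text inside the cell

print(repr(plain_string), repr(nested_string), repr(nested_text))
```

If the price cell might contain markup such as bold or a link, checking for None before using .string, or switching to get_text(), avoids silently losing the value.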

Speaking of structure, notice that I twice use if...: continue. This reduces nesting compared to the alternative of inverting the if's condition and indenting all the following statements in the loop, and "flat is better than nested" is one of the koans in the Zen of Python (import this at an interactive prompt to see them all and meditate;-).

answered Sep 21 '22 08:09

Alex Martelli
