
How to extract a string between 2 other strings in python?

Tags: python, string

For example, say I have the string str1 = "IWantToMasterPython".

If I want to extract "Py" from that string, I'd like to write something like:

extractedString = foo(str1, "Master", "thon")

I'm asking because I'm trying to extract lyrics from an HTML page, where the lyrics appear as <div class = "lyricbox"> ....lyrics go here....</div>.

Any suggestions on how I can implement this?

Abhijeet Rastogi asked Sep 04 '09 00:09


4 Answers

One solution is to use a regular expression:

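import re

str1 = "IWantToMasterPython"
# (.*?) is non-greedy: it captures as little as possible between "Master" and "thon"
r = re.compile('Master(.*?)thon')
m = r.search(str1)
if m:
    lyrics = m.group(1)  # "Py"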
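A minimal sketch wrapping this regex in a reusable function. The name `extract_between` is hypothetical. It passes the delimiters through `re.escape` so characters such as `.` or `(` are matched literally, and it uses `re.DOTALL` so the match can span line breaks, which lyrics inside a `<div>` usually do (without that flag, `.` stops at a newline). It returns `None` when no match is found.

```python
import re
from typing import Optional


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`, or None.

    Hypothetical helper: re.escape treats the delimiters as literal text,
    and re.DOTALL lets '.' match newlines so multi-line content is captured.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


str1 = "IWantToMasterPython"
print(extract_between(str1, "Master", "thon"))  # Py

html = '<div class="lyricbox">line one\nline two</div>'
print(extract_between(html, '<div class="lyricbox">', "</div>"))
```

This is fine for quick, well-formed snippets; for real HTML pages a parser (as the other answers suggest) is more robust.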
tonfa answered Sep 27 '22 20:09


BeautifulSoup is the easiest way to do what you want. It can be installed with:

sudo easy_install beautifulsoup

Here is sample code that does what you want:

from BeautifulSoup import BeautifulSoup

doc = ['<div class="lyricbox">Hey You</div>']
soup = BeautifulSoup(''.join(doc))
print soup.find('div', {'class': 'lyricbox'}).string

You can use Python's urllib to fetch the page content from the URL directly. The Beautiful Soup documentation is also helpful if you need to do more parsing.
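The example above uses the old BeautifulSoup 3 API with Python 2 `print` syntax; on Python 3 the maintained package is `beautifulsoup4`, imported with `from bs4 import BeautifulSoup`. If installing a third-party package is not an option, the standard library's `html.parser.HTMLParser` can do a rough version of the same lookup. The class `LyricboxParser` and function `extract_lyrics` below are hypothetical names for a sketch that assumes the lyrics are plain text inside the `lyricbox` div; inline tags such as `<br>` are simply dropped rather than turned into newlines.

```python
from html.parser import HTMLParser
from typing import List


class LyricboxParser(HTMLParser):
    """Collects the text inside <div class="lyricbox"> ... </div>.

    Hypothetical helper: tracks <div> nesting depth so that a nested
    <div> inside the lyric box does not end the capture early.
    """

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside the lyric box
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside the lyric box
        elif "lyricbox" in (dict(attrs).get("class") or "").split():
            self.depth = 1  # entering the lyric box

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_lyrics(html: str) -> str:
    parser = LyricboxParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(extract_lyrics('<div class="lyricbox">Hey You</div>'))  # Hey You
```

Tracking depth rather than stopping at the first `</div>` is the main reason to prefer a parser over the string and regex approaches for HTML.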

Thierry Lam answered Sep 27 '22 19:09


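def foo(s, leader, trailer):
    # Index just past the end of the first occurrence of leader
    end_of_leader = s.index(leader) + len(leader)
    # Look for trailer only after the leader
    start_of_trailer = s.index(trailer, end_of_leader)
    return s[end_of_leader:start_of_trailer]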

This raises ValueError if the leader is not present in the string s, or if the trailer is not present after it. You have not specified what behavior you want in such anomalous conditions; raising an exception is a natural and Pythonic thing to do, letting the caller handle it with a try/except if it knows what to do in such cases.

A RE-based approach is also possible, but I think this pure-string approach is simpler.
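A usage sketch of the caller-side try/except handling described above. `foo` is repeated so the snippet runs on its own; the wrapper `extract_or_none` is a hypothetical name, and mapping "not found" to `None` is an illustrative choice, not part of the original answer.

```python
from typing import Optional


def foo(s: str, leader: str, trailer: str) -> str:
    end_of_leader = s.index(leader) + len(leader)
    start_of_trailer = s.index(trailer, end_of_leader)
    return s[end_of_leader:start_of_trailer]


def extract_or_none(s: str, leader: str, trailer: str) -> Optional[str]:
    # The caller decides what "not found" means; here it becomes None.
    try:
        return foo(s, leader, trailer)
    except ValueError:
        return None


print(foo("IWantToMasterPython", "Master", "thon"))              # Py
print(extract_or_none("IWantToMasterPython", "Master", "Java"))  # None
# The trailer must come after the leader: "Want" appears only before "Master".
print(extract_or_none("IWantToMasterPython", "Master", "Want"))  # None
```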

Alex Martelli answered Sep 27 '22 19:09


If you're extracting data from an HTML page, I'd strongly suggest using the BeautifulSoup library. I've used it for extracting data from HTML as well, and it works great.

uolot answered Sep 27 '22 21:09