
Split tags in Python

I have a file that contains this:

<html>
  <head>
    <title> Hello! - {{ today }}</title>
  </head>
  <body>
    {{ runner_up }} 
         avasd
         {{ blabla }}
        sdvas
        {{ oooo }}
   </body>
</html>

What is the best or most Pythonic way to extract the {{today}}, {{runner_up}}, etc.?

I know it can be done with splits/regular expressions, but I wondered if there were another way.

PS: consider the data loaded in a variable called thedata.

Edit: I think that the HTML example was bad, because it directed some commenters to BeautifulSoup. So, here is a new input data:

Fix grammatical or {{spelling}} errors.

Clarify meaning without changing it.

Correct minor {{mistakes}}.

Add related resources or links.

Always respect the original {{author}}.

Output:

spelling
mistakes
author
Jon Romero asked Dec 08 '22 08:12


1 Answer

Mmkay, well here's a generator solution that seems to work well for me. You can also provide different open and close tags if you like.

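def get_tags(s, open_delim='{{', close_delim='}}'):
    """Yield the stripped contents of each open_delim ... close_delim tag in s."""

    while True:

        # Find the next opening delimiter, then the first closing
        # delimiter that comes after it (a stray closing delimiter
        # earlier in the text is ignored)
        start = s.find(open_delim)
        end = s.find(close_delim, start + len(open_delim))

        # We found a complete tag
        if -1 < start < end:

            # Skip the length of the open delimiter
            start += len(open_delim)

            # Spit out the tag
            yield s[start:end].strip()

            # Truncate string to start from last match
            s = s[end + len(close_delim):]

        else:
            # No opening delimiter left, or the last tag is never closed
            return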

Run against your target input like so:

# prints today, runner_up, blabla, oooo (one per line)
for tag in get_tags(thedata):
    print(tag)

Edit: it also works against your new example :). In my admittedly quick testing, it also seemed to handle malformed tags in a reasonable way, though I make no guarantees of its robustness!
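
To make the claims above easy to check, here is a minimal, self-contained sketch. It uses a compact restatement of the generator (the closing delimiter is searched for only after the opening one) and runs it on the second sample from the question. The `re.findall` line is an alternative added for comparison and is not part of the original answer. The `malformed` string is a made-up test input, so treat its result as an illustration of this sketch only, not a guarantee of robustness.

```python
import re


def get_tags(s, open_delim='{{', close_delim='}}'):
    """Yield the stripped text between each delimiter pair in s."""
    pos = 0
    while True:
        # Next opening delimiter at or after the current position
        start = s.find(open_delim, pos)
        if start == -1:
            return
        start += len(open_delim)

        # First closing delimiter after that opening delimiter
        end = s.find(close_delim, start)
        if end == -1:
            return  # unclosed tag: stop without yielding it

        yield s[start:end].strip()
        pos = end + len(close_delim)


# Second sample input from the question
thedata = """Fix grammatical or {{spelling}} errors.

Clarify meaning without changing it.

Correct minor {{mistakes}}.

Add related resources or links.

Always respect the original {{author}}.
"""

tags = list(get_tags(thedata))
print(tags)  # ['spelling', 'mistakes', 'author']

# Regex alternative (assumption: not the answerer's method). The non-greedy
# group stops at the first closing delimiter; DOTALL lets a tag span lines.
regex_tags = [m.strip() for m in re.findall(r'\{\{(.*?)\}\}', thedata, flags=re.DOTALL)]
print(regex_tags)  # ['spelling', 'mistakes', 'author']

# Hypothetical malformed input: a stray close, one good tag, one unclosed tag
malformed = list(get_tags('stray }} then {{ ok }} and {{ unclosed'))
print(malformed)  # ['ok']
```

The generator keeps memory use low on large inputs because it yields tags one at a time, while the regex version is shorter but builds the whole list at once. Which one counts as "more Pythonic" is a judgment call.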

Kenan Banks answered Dec 21 '22 23:12