 

String separation in required format, Pythonic way? (with or w/o Regex)

I have a string in the format:

t='@abc @def Hello this part is text'

I want to get this:

l=["abc", "def"] 
s='Hello this part is text'

I did this:

a=t[:t.find(' ',t.rfind('@'))].strip()
s=t[t.find(' ',t.rfind('@')):].strip()
b=a.split('@')
l=[i.strip() for i in b][1:]

It works for the most part, but it fails when the text part contains an '@'. E.g., when:

t='@abc @def My email is [email protected]'

it fails. The @names are there in the beginning and there can be text after @names, which may possibly contain @.

Clearly, I could add a space at the start and then find the first word without '@', but that doesn't seem like an elegant solution.

What is a pythonic way of solving this?
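For comparison, a minimal non-regex sketch of the "find the first word that doesn't start with '@'" idea, assuming a tag is simply any whitespace-delimited word beginning with '@'. The helper name `split_tags` is hypothetical. It peels tags off the front one at a time with `str.split(None, 1)`, so the remaining text, including any '@' inside it, is left untouched.

```python
def split_tags(t):
    """Hypothetical helper: peel leading '@name' words off t.

    Returns (names, rest). A 'name' is any whitespace-delimited word that
    starts with '@', so this is looser than a word-characters-only regex.
    """
    names = []
    rest = t.lstrip()
    while rest.startswith('@'):
        # Split off the first word; maxsplit=1 keeps the remainder intact
        # (including any '@' it contains), minus the whitespace before it.
        parts = rest.split(None, 1)
        names.append(parts[0][1:])  # drop the leading '@'
        rest = parts[1] if len(parts) > 1 else ''
    return names, rest


print(split_tags('@abc @def My email is me@example.com'))
# (['abc', 'def'], 'My email is me@example.com')
```

Because each loop iteration strictly shortens `rest`, the loop always terminates, even for degenerate input such as a lone '@'.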

lprsd asked Feb 17 '09 18:02


1 Answer

Building unashamedly on MrTopf's effort:

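import re

# Group 1: one or more "@tag" tokens, each followed by at least one space.
# Group 2: everything after the last tag.
rx = re.compile(r"((?:@\w+ +)+)(.*)")
t='@abc   @def  @xyz Hello this part is text and my email is [email protected]'
a,s = rx.match(t).groups()
# a is '@abc   @def  @xyz '; splitting on runs of '@' or ' ' leaves an
# empty string at each end, which [1:-1] drops.
l = re.split(r'[@ ]+',a)[1:-1]
print(l)
print(s)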

prints:

['abc', 'def', 'xyz']
Hello this part is text and my email is [email protected]
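A variant sketch (not part of the original answer) that wraps the same pattern in a function, pulls the names out of the captured prefix with `re.findall` instead of `re.split`, and returns no names rather than raising `AttributeError` when the string has no leading tags (`rx.match` returns `None` in that case). The names `TAGS_THEN_TEXT` and `parse_tags` are hypothetical, and the email address is made up.

```python
import re

# Same pattern as the answer: leading run of "@tag " tokens, then the rest.
TAGS_THEN_TEXT = re.compile(r"((?:@\w+ +)+)(.*)")


def parse_tags(t):
    """Hypothetical wrapper: return (names, text), or ([], t) if no leading tags."""
    m = TAGS_THEN_TEXT.match(t)
    if m is None:
        return [], t
    prefix, text = m.groups()
    # With a single capture group, findall returns just the captured names.
    return re.findall(r"@(\w+)", prefix), text


print(parse_tags('@abc   @def  @xyz Hello, mail me@example.com'))
# (['abc', 'def', 'xyz'], 'Hello, mail me@example.com')
print(parse_tags('no tags here'))
# ([], 'no tags here')
```

Like the original pattern, it needs at least one space after each tag, so for an input made only of tags, such as '@abc @def', the last tag ends up in the text part.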


Justly called to account by hasen j, let me clarify how this works:

/@\w+ +/

matches a single tag: @, followed by one or more word characters (letters, digits, or _), followed by one or more spaces. The + is greedy, so if there is more than one space, it grabs them all.
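A small check of the single-tag pattern on its own, showing that the greedy ` +` consumes every space after the tag name (the input string is illustrative):

```python
import re

# One tag: '@', one or more word characters, then one or more spaces.
single_tag = re.compile(r"@\w+ +")

m = single_tag.match("@abc   @def Hello")
# ' +' is greedy, so all three spaces after 'abc' are consumed.
print(repr(m.group(0)))  # '@abc   '
```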

To match one or more of these tags in a row, we need to apply + (one or more) to the whole tag pattern, so we have to group it with parentheses:

/(@\w+ +)+/

which matches one or more tags and, being greedy, matches all of them. However, those parentheses also create an extra capture group that we don't want, so we turn them into a non-capturing group:

/(?:@\w+ +)+/
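To see what the capturing parentheses actually do, compare the two versions on an illustrative string: a repeated capturing group remembers only its last repetition, while the non-capturing version adds no group at all, yet the overall match still covers every tag.

```python
import re

text = "@abc @def Hello"

# A repeated capturing group keeps only its LAST repetition.
capturing = re.match(r"(@\w+ +)+", text)
print(capturing.groups())            # ('@def ',)

# The non-capturing version creates no group, but group(0)
# still holds the whole run of tags.
non_capturing = re.match(r"(?:@\w+ +)+", text)
print(non_capturing.groups())        # ()
print(repr(non_capturing.group(0)))  # '@abc @def '
```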

Finally, we make that into a capture group and add another to sweep up the rest:

/((?:@\w+ +)+)(.*)/

A last breakdown to sum up:

((?:@\w+ +)+)(.*)
 (?:@\w+ +)+
 (  @\w+ +)
    @\w+ +

Note that in reviewing this, I've improved it: \w didn't need to be inside a character class ([\w]), and the pattern now allows multiple spaces between tags. Thanks, hasen-j!

Brent.Longborough answered Oct 16 '22 03:10