
Slice a string after a certain phrase?

Tags: python

I've got a batch of strings that I need to cut down. They're basically a descriptor followed by codes. I only want to keep the descriptor.

'a descriptor dps 23 fd'
'another 23 fd'
'and another fd'
'and one without a code'

The codes above are dps, 23 and fd. They can come in any order, are unrelated to each other and might not exist at all (as in the last case).

The list of codes is fixed (or can be predicted, at least). Assuming a code is never used within a legitimate descriptor, how can I strip off everything from the first instance of a code onward?

I'm using Python.

Oli asked Oct 27 '09 22:10



1 Answer

The short answer, as @THC4K points out in a comment:

string.split(pattern, 1)[0]

where string is your original string, pattern is your "break" pattern (a literal substring, not a regular expression), 1 limits split to at most one cut, and [0] takes the first element of the list that split returns.

In action:

>>> s = "a descriptor 23 fd"
>>> s.split("23", 1)[0]
'a descriptor '
>>> s.split("fdasfdsafdsa", 1)[0]
'a descriptor 23 fd'
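To see why [0] is the part before the pattern, it can help to look at the whole list that split returns. This is only a small sketch; the helper name before is made up for illustration, and the rstrip call is an optional extra for dropping the space left behind by the cut.

```python
def before(text, pattern):
    """Return the part of text that precedes the first occurrence of pattern."""
    # maxsplit=1 stops after the first match, so the list has at most two items.
    return text.split(pattern, 1)[0]


s = "a descriptor 23 fd"

parts = s.split("23", 1)     # two items: the text before and after "23"
missing = s.split("zzz", 1)  # no match: a one-item list holding the whole string

# Optional: drop the trailing space that the cut leaves on the descriptor.
trimmed = before(s, "23").rstrip()

print(parts)
print(missing)
print(repr(trimmed))
```

Because an unmatched pattern still yields a one-item list, [0] is always safe and no "not found" check is needed.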

This is a much shorter way of expressing what I had written earlier, which I will keep here anyway.

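And if you need to cut at any of several patterns, this is a good candidate for reduce (a builtin in Python 2; in Python 3 it lives in functools):

>>> from functools import reduce  # needed on Python 3
>>> string = "a descriptor dps foo 23 bar fd quux"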
>>> patterns = ["dps", "23", "fd"]
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, string)
'a descriptor '
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, "uiopuiopuiopuipouiop")
'uiopuiopuiopuipouiop'

In other words: starting from string, apply s.split(pat, 1)[0] (as explained above) once for each pat in patterns, feeding each result into the next call. As you can see, if none of the patterns occur in the string, the original string is returned unchanged.
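The same idea can be written as an explicit loop, which some readers find easier to follow than reduce. The sketch below runs it over the four strings from the question. The function names strip_codes and strip_codes_whole_word are hypothetical, and the second function is a variation that goes beyond the answer above: it swaps in a regular expression with word boundaries, in case a code might also appear inside a longer word.

```python
import re


def strip_codes(text, codes):
    """Cut text at the first occurrence of any code (plain substring match)."""
    for code in codes:
        # Each pass can only shorten the text, so for codes that do not
        # overlap each other the result ends at whichever code came first.
        text = text.split(code, 1)[0]
    return text.rstrip()  # drop the space left before the first code


def strip_codes_whole_word(text, codes):
    """Variation (an assumption, not part of the answer above): match codes
    only as whole words, so "fd" inside "fdx" is left alone."""
    alternatives = "|".join(re.escape(code) for code in codes)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return pattern.split(text, maxsplit=1)[0].rstrip()


codes = ["dps", "23", "fd"]
samples = [
    "a descriptor dps 23 fd",
    "another 23 fd",
    "and another fd",
    "and one without a code",
]

results = [strip_codes(sample, codes) for sample in samples]
for result in results:
    print(repr(result))
```

If the assumption in the question holds (a code never appears inside a legitimate descriptor), the plain substring version is enough and the regular expression is unnecessary.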


The simplest answer is a string slice combined with str.find:

>>> s = "a descriptor 23 fd"
>>> s[:s.find("fd")]
'a descriptor 23 '
>>> s[:s.find("23")]  
'a descriptor '
>>> s[:s.find("gggfdf")] # <-- look out! last character got cut off
'a descriptor 23 f'

A safer approach is to wrap the slice in a small function that handles the -1 that s.find returns when the pattern is missing, so the last character is not cut off:

>>> def cutoff(string, pattern):
...     idx = string.find(pattern)
...     return string[:idx if idx != -1 else len(string)]
... 
>>> cutoff(s, "23")
'a descriptor '
>>> cutoff(s, "asdfdsafdsa")
'a descriptor 23 fd'

The [:s.find(x)] syntax means take the part of the string from index 0 up to (but not including) the index on the right-hand side of the colon. Here the RHS is the result of s.find, which returns the index where the first occurrence of the substring starts, or -1 if the substring is not found.
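A short sketch of what find returns in each case may make the -1 pitfall clearer. It also shows str.partition, a standard string method not used in the answer above, as one possible way to avoid the check altogether.

```python
s = "a descriptor 23 fd"

found = s.find("23")        # index where "23" starts
missing = s.find("gggfdf")  # -1 signals "not found"

# s[:-1] means "everything except the last character", which is why
# slicing with the result of a failed find silently drops the final "d".
truncated = s[:missing]

# str.partition needs no -1 check: when the separator is absent it
# returns the whole string followed by two empty strings.
head, sep, tail = s.partition("gggfdf")

print(found, missing)
print(repr(truncated))
print(repr(head), repr(sep), repr(tail))
```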

Mark Rushakoff answered Sep 21 '22 15:09
