I've got a batch of strings that I need to cut down. They're basically a descriptor followed by codes. I only want to keep the descriptor.
'a descriptor dps 23 fd'
'another 23 fd'
'and another fd'
'and one without a code'
The codes above are dps, 23 and fd. They can come in any order, are unrelated to each other and might not exist at all (as in the last case).
The list of codes is fixed (or can at least be predicted), so, assuming a code is never used within a legitimate descriptor, how can I strip off everything from the first instance of a code onward?
I'm using Python.
The short answer, as @THC4K points out in a comment:
string.split(pattern, 1)[0]
where string is your original string, pattern is your "break" pattern, 1 means split at most once, and [0] means take the first element returned by split.
In action:
>>> s = "a descriptor 23 fd"
>>> s.split("23", 1)[0]
'a descriptor '
>>> s.split("fdasfdsafdsa", 1)[0]
'a descriptor 23 fd'
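If you only ever need the text before the first occurrence, str.partition is an equivalent standard-library alternative (not what the comment suggested, just another option). It always returns a 3-tuple (head, separator, tail), and when the separator is missing, head is the whole string. A minimal sketch; the helper name before_code is hypothetical:

```python
def before_code(s, code):
    """Return the part of s before the first occurrence of code.

    str.partition returns (head, sep, tail); if code is not found,
    head is the whole string and sep/tail are empty strings.
    """
    head, _sep, _tail = s.partition(code)
    return head

s = "a descriptor 23 fd"
print(repr(before_code(s, "23")))            # 'a descriptor '
print(repr(before_code(s, "fdasfdsafdsa")))  # 'a descriptor 23 fd'
```

Like split, partition raises ValueError for an empty separator, so the code list must not contain "".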
This is a much shorter way of expressing what I had written earlier, which I will keep here anyway.
And if you need to remove multiple patterns, this is a great candidate for reduce (a builtin in Python 2; in Python 3 it has to be imported from functools):
>>> from functools import reduce  # required on Python 3
>>> string = "a descriptor dps foo 23 bar fd quux"
>>> patterns = ["dps", "23", "fd"]
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, string)
'a descriptor '
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, "uiopuiopuiopuipouiop")
'uiopuiopuiopuipouiop'
This basically says: for each pat in patterns, take string and apply string.split(pat, 1)[0] (as explained above), each time operating on the value returned by the previous step. As you can see, if none of the patterns are in the string, the original string is returned unchanged.
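Applied to the strings from the question, the same idea can be written as a plain loop (equivalent to the reduce call) plus rstrip() to drop the trailing space left in front of the code. A sketch, assuming the codes are exactly dps, 23 and fd and never appear inside a descriptor; strip_codes and CODES are hypothetical names:

```python
CODES = ("dps", "23", "fd")  # assumed fixed set of codes

def strip_codes(s, codes=CODES):
    """Cut s at the first occurrence of any code, then trim trailing whitespace."""
    for code in codes:
        # Keep only the text before this code (s is unchanged if the code is absent).
        s = s.split(code, 1)[0]
    return s.rstrip()

for line in ["a descriptor dps 23 fd", "another 23 fd",
             "and another fd", "and one without a code"]:
    print(repr(strip_codes(line)))
# 'a descriptor'
# 'another'
# 'and another'
# 'and one without a code'
```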
The simplest answer is a string slice combined with str.find:
>>> s = "a descriptor 23 fd"
>>> s[:s.find("fd")]
'a descriptor 23 '
>>> s[:s.find("23")]
'a descriptor '
>>> s[:s.find("gggfdf")] # <-- look out! last character got cut off
'a descriptor 23 f'
A better approach (to avoid cutting off the last character when the pattern is missing and s.find returns -1) is to wrap this in a simple function:
>>> def cutoff(string, pattern):
...     idx = string.find(pattern)
...     return string[:idx if idx != -1 else len(string)]
...
>>> cutoff(s, "23")
'a descriptor '
>>> cutoff(s, "asdfdsafdsa")
'a descriptor 23 fd'
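The find-based version extends to several codes by cutting at the smallest index at which any code is found, which is literally "the first instance of a code" regardless of the order the codes are listed in. A sketch; cutoff_first is a hypothetical name, and min(..., default=...) needs Python 3.4 or later:

```python
def cutoff_first(string, patterns):
    """Cut string at the earliest occurrence of any pattern."""
    # str.find returns -1 for a missing pattern, so filter those out.
    hits = [idx for idx in (string.find(p) for p in patterns) if idx != -1]
    # With no hits, slice up to len(string), i.e. keep everything.
    return string[:min(hits, default=len(string))]

codes = ["fd", "23", "dps"]
print(repr(cutoff_first("a descriptor dps foo 23 bar fd quux", codes)))  # 'a descriptor '
print(repr(cutoff_first("uiopuiopuiopuipouiop", codes)))  # 'uiopuiopuiopuipouiop'
```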
The [:s.find(x)] syntax means take the part of the string from index 0 up to, but not including, the index on the right-hand side of the colon; here that index is the result of s.find, which returns the lowest index at which the substring you passed occurs, or -1 if it does not occur at all.
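To see the slice bounds and the -1 pitfall concretely, here is the example string again with the intermediate indices printed:

```python
s = "a descriptor 23 fd"

idx = s.find("23")    # lowest index where "23" starts
print(idx)            # 13
print(repr(s[:idx]))  # 'a descriptor '  (indices 0..12; index 13 is excluded)

missing = s.find("gggfdf")  # not found, so find returns -1
print(missing)              # -1
print(repr(s[:missing]))    # 'a descriptor 23 f'  (s[:-1] drops the last character)
```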