I'm trying to strip the end of the strings in this column. I've seen how to rstrip a specific character, or a set number of characters at the end of a string, but how do you do it based on a pattern?
I'd like to remove the end of each string in the 'team' column, starting where a lowercase letter is followed by an uppercase letter: everything from the uppercase letter onward should go. I would like the 'team' column below:
team pts/g
St. Louis RamsSt. Louis 32.875
Washington RedskinsWashington 27.6875
Minnesota VikingsMinnesota 24.9375
Indianapolis ColtsIndianapolis 26.4375
Oakland RaidersOakland 24.375
Carolina PanthersCarolina 26.3125
Jacksonville JaguarsJacksonville 24.75
Chicago BearsChicago 17.0
Green Bay PackersGreen Bay 22.3125
San Francisco 49ersSan Francisco 18.4375
Buffalo BillsBuffalo 20.0
to look like this:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
In Python you can use the replace() and translate() methods to remove specific characters from a string; both return a new string. The original string is not altered because strings are immutable.
Use the strip() method to remove whitespace or other characters from both the beginning and the end of a string.
The rstrip() method returns a copy of the string with the trailing characters specified in its argument removed. If no argument is given, all trailing whitespace is removed.
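To see why these methods are not enough for a pattern-based cut, note that rstrip() treats its argument as a set of characters, not as a suffix. A minimal sketch, using one team string from the question:

```python
team = "St. Louis RamsSt. Louis"

# rstrip() removes every trailing character that appears in the argument,
# so it keeps stripping past the repeated city name and eats the "s" of "Rams".
stripped = team.rstrip("St. Louis")
print(stripped)  # St. Louis Ram
```

This is why a regular expression is the better fit when the cut point depends on a pattern rather than on fixed characters.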
You can use re.sub(pattern, repl, string) for that.
Let's use this regular expression for matching:
([a-z])[A-Z].*?(  )
It matches a lowercase character ([a-z]), followed by an uppercase character [A-Z] and any characters .*? until it hits two spaces (  ). The lowercase character and the two spaces are each in a group, so they can be re-inserted using \1 for the first and \2 for the second group when using re.sub:
import re
new_text = re.sub(r"([a-z])[A-Z].*?(  )", r"\1\2", text)
Output for your example:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
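The substitution above can be tried end to end with a minimal sketch. The sample text here is an assumption: three rows from the question, each with exactly two spaces before the number, since the pattern depends on that gap to find where the team name ends.

```python
import re

# Sample text; the two-space gap between the columns is an assumption,
# because the pattern uses it to locate the end of the team name.
text = (
    "St. Louis RamsSt. Louis  32.875\n"
    "Green Bay PackersGreen Bay  22.3125\n"
    "San Francisco 49ersSan Francisco  18.4375\n"
)

# Keep the lowercase letter (\1) and the two spaces (\2);
# drop everything that was matched between them.
new_text = re.sub(r"([a-z])[A-Z].*?(  )", r"\1\2", text)
print(new_text)
```

If the columns in your text are separated differently (a tab, or a single space), the second group would have to be adapted accordingly.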
That substitution breaks the column alignment. It might not matter for you, but if you want to replace the removed characters with spaces, you can pass a function instead of a replacement string to re.sub. The function takes a Match object and returns a str:
def replace_with_spaces(match):
    return match.group(1) + " " * len(match.group(2)) + match.group(3)
And then use it like this (notice how I put the to-be-replaced part into a regex-group too):
new_text = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)
This produces:
team pts/g
St. Louis Rams 32.875
Washington Redskins 27.6875
Minnesota Vikings 24.9375
Indianapolis Colts 26.4375
Oakland Raiders 24.375
Carolina Panthers 26.3125
Jacksonville Jaguars 24.75
Chicago Bears 17.0
Green Bay Packers 22.3125
San Francisco 49ers 18.4375
Buffalo Bills 20.0
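As a quick check that the function-based replacement preserves the layout, here is a minimal, self-contained sketch. The two sample rows and their padding are assumptions, chosen so that the numbers start in the same column; only re.sub and replace_with_spaces come from the answer above.

```python
import re

def replace_with_spaces(match):
    # group(1): last lowercase letter of the team name
    # group(2): the repeated city text that should be blanked out
    # group(3): the two-space gap that ends the match
    return match.group(1) + " " * len(match.group(2)) + match.group(3)

# Sample rows; padding is assumed so that both numbers start at the same column.
text = (
    "St. Louis RamsSt. Louis  32.875\n"
    "Chicago BearsChicago     17.0\n"
)

new_text = re.sub(r"([a-z])([A-Z].*?)(  )", replace_with_spaces, text)

# Every line keeps its original length, so the numbers stay in their column.
for old, new in zip(text.splitlines(), new_text.splitlines()):
    assert len(old) == len(new)
print(new_text)
```

Because each removed character is swapped for exactly one space, anything to the right of the match stays at the same offset in the line.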