I'm trying to split a column in a pandas dataframe based on a separator character, and obtain the last section.
pandas has the str.rsplit and the str.rpartition functions.
If I try:
df_client["Subject"].str.rsplit("-", n=1)
I get
0 [Activity -Location , UserCode]
1 [Activity -Location , UserCode]
and if I try
df_client["Subject"].str.rpartition("-")
I get
                    0  1          2
0  Activity -Location  -   UserCode
1  Activity -Location  -   UserCode
If I do
df_client["Subject"].str.rpartition("-")[2]
I get
0 UserCode
which is what I want.
To me, str.rsplit seems unintuitive.
After getting the list of the split string, how would I then select the single item that I need?
The rfind() method finds the last occurrence of the specified value. The rfind() method returns -1 if the value is not found. The rfind() method is almost the same as the rindex() method.
Python's string rsplit() method splits a string into a list, starting from the right. If no maxsplit is specified, it returns the same result as split(). Note: when maxsplit is specified, the list will contain at most maxsplit plus one elements.
The pandas str.split() function splits strings around a given separator/delimiter. It splits each string in the Series/Index from the beginning, at the specified delimiter string.
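To make the difference between the two directions concrete, here is a minimal sketch using plain Python strings. The sample value `text` is made up for illustration and is not taken from the question's data.

```python
# Hypothetical sample string, chosen only for illustration.
text = "Activity-Location-UserCode"

# With maxsplit=1, split() cuts at the first separator from the left,
# while rsplit() cuts at the first separator from the right.
print(text.split("-", 1))   # ['Activity', 'Location-UserCode']
print(text.rsplit("-", 1))  # ['Activity-Location', 'UserCode']

# Without maxsplit, both methods return the same list.
print(text.split("-") == text.rsplit("-"))  # True

# The last section is simply the last element of the rsplit() result.
last = text.rsplit("-", 1)[-1]
print(last)  # UserCode

# If the separator is absent, rsplit() returns the whole string in a
# one-element list, and rpartition() puts it in the last slot.
print("NoSeparator".rsplit("-", 1))   # ['NoSeparator']
print("NoSeparator".rpartition("-"))  # ('', '', 'NoSeparator')
```

Indexing with [-1] rather than [1] keeps the lookup safe when a string contains no separator at all.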
I think you need indexing with .str, which also works on iterables such as the lists returned by rsplit:
# select the last element of each list
df_client["Subject"].str.rsplit("-", n=1).str[-1]
# select the second element of each list
df_client["Subject"].str.rsplit("-", n=1).str[1]
If performance is important, use a list comprehension:
df_client['last_col'] = [x.rsplit("-", 1)[-1] for x in df_client["Subject"]]
print(df_client)
                      Subject  last_col
0  Activity-Location-UserCode  UserCode
1  Activity-Location-UserCode  UserCode
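For comparison, the approaches discussed in this thread can be put side by side in one self-contained sketch. The DataFrame contents below are hypothetical sample data (the second row is invented so the rows differ), and the variable names `via_rsplit`, `via_rpartition` and `via_listcomp` are just illustrative.

```python
import pandas as pd

# Hypothetical sample data; the real df_client is not shown in the question.
df_client = pd.DataFrame(
    {"Subject": ["Activity-Location-UserCode", "Task-Site-Admin01"]}
)

# 1) rsplit into lists, then pick the last element of each list via .str[-1]
via_rsplit = df_client["Subject"].str.rsplit("-", n=1).str[-1]

# 2) rpartition returns a DataFrame with columns 0, 1, 2; column 2 is the tail
via_rpartition = df_client["Subject"].str.rpartition("-")[2]

# 3) plain Python list comprehension over the column values
via_listcomp = [x.rsplit("-", 1)[-1] for x in df_client["Subject"]]

df_client["last_col"] = via_rsplit
print(df_client)
```

All three should give the same values here. Note that the list comprehension assumes every value is a string, so a missing value (NaN) would raise an error there, whereas the .str accessor passes missing values through.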