I'm shocked that no one has asked this on SO before.. since it seems like a simple enough of a problem.
I have a single column in a pandas Dataframe that looks like this:
df = pd.DataFrame(data=[['APPLEGATE WINERY 455.292049'],['AMAND FARM 849.827192'],['COBB FARM ST 1039.49357'],['DIRIGIA 2048.947284']], columns = ['Col1'])
Col1
0 APPLEGATE WINERY 455.292049
1 AMAND FARM 849.827192
2 COBB FARM ST 1039.49357
3 DIRIGIA 2048.947284
And I just want to separate the string characters from the numeric, so the result should look like this
Name Area
APPLEGATE WINERY 455.292049
AMAND FARM 849.827192
COBB FARM ST 1039.49357
DIRIGIA 2048.947284
I know I can use Regular Expressions in python, but this seems like overkill since a) it's just a separation of data types and b) the strings have different lengths and the numerics have different numbers of digits.
So one result would start to look like this:
df['Name'] = df.Col1.str.extract('([A-Z]\w{0,})', expand=True)
df['Area'] = df.Col1.str.extract('(\d)', expand=True)
But is there a nice, clean solution out there to solve this problem without going through the hassle of using RegEx and instead separating strings from numerics into two columns?
split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series. str. split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.
To split a pandas column of lists into multiple columns, create a new dataframe by applying the tolist() function to the column. The following is the syntax. You can also pass the names of new columns resulting from the split as a list. Let's see it action with the help of an example.
Use a single extract
call. You'll also want to strip trailing whitespaces from the result if you use this regex.
df2 = (df['Col1'].str.extract(r'(?P<Name>.*?)(?P<Area>\d+(?:\.\d+)?)')
.applymap(str.strip))
df2
Name Area
0 APPLEGATE WINERY 455.292049
1 AMAND FARM 849.827192
2 COBB FARM ST 1039.49357
3 DIRIGIA 2048.947284
Regex Breakdown
(?P<Name> # first named capture group - "Name"
.*? # match anything (non-greedy)
)
(?P<Area> # second named group - "Area"
\d+ # match one or more digits,
(?:
\. # decimal
\d+ # trailing digits
)? # the `?` indicates floating point is optional
)
PS, to convert the "Area" column to numeric, use pd.to_numeric
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With