Separate string from numeric in single Pandas Dataframe column and create two new columns

Tags: python, pandas, dataframe, extract

I'm shocked that no one has asked this on SO before, since it seems like a simple enough problem.

I have a single column in a pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(data=[['APPLEGATE WINERY    455.292049'],['AMAND FARM  849.827192'],['COBB FARM ST    1039.49357'],['DIRIGIA 2048.947284']], columns=['Col1'])

    Col1
0   APPLEGATE WINERY 455.292049
1   AMAND FARM 849.827192
2   COBB FARM ST 1039.49357
3   DIRIGIA 2048.947284

And I just want to separate the string characters from the numeric ones, so the result should look like this:

Name                Area
APPLEGATE WINERY    455.292049
AMAND FARM          849.827192
COBB FARM ST        1039.49357
DIRIGIA             2048.947284

I know I can use regular expressions in Python, but this seems like overkill since a) it's just a separation of data types and b) the strings have different lengths and the numerics have different numbers of digits.

So one result would start to look like this:

df['Name'] = df.Col1.str.extract(r'([A-Z]\w{0,})', expand=True)
df['Area'] = df.Col1.str.extract(r'(\d)', expand=True)

But is there a nice, clean solution out there to solve this problem without going through the hassle of using RegEx and instead separating strings from numerics into two columns?

472

asked Jun 19 '19 16:06

JAG2024

1 Answer

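Use a single extract call with named capture groups, which become the column names of the result. Because the non-greedy `.*?` group also captures the spaces between the name and the number, you'll want to strip the trailing whitespace from the result if you use this regex.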

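# Strip each column with the .str accessor (applymap is deprecated in recent pandas)
df2 = (df['Col1'].str.extract(r'(?P<Name>.*?)(?P<Area>\d+(?:\.\d+)?)')
                 .apply(lambda col: col.str.strip()))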
df2
               Name         Area
0  APPLEGATE WINERY   455.292049
1        AMAND FARM   849.827192
2      COBB FARM ST   1039.49357
3           DIRIGIA  2048.947284

Regex Breakdown

(?P<Name>   # first named capture group - "Name"
    .*?     # match anything (non-greedy)
)
(?P<Area>   # second named group - "Area"
    \d+     # match one or more digits,
    (?:     # optional non-capturing group
       \.   # decimal point
       \d+  # fractional digits
    )?      # the `?` makes the fractional part optional
)
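
To check the breakdown against a single row, the same pattern can be compiled with Python's built-in `re` module in verbose mode, which lets the comments stay inside the pattern. This is only a minimal sketch: the variable names `pattern`, `match`, `name`, and `area` are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that
# whitespace and comments inside the pattern are ignored.
pattern = re.compile(r"""
    (?P<Name>.*?)      # lazily match everything before the number
    (?P<Area>
        \d+            # integer part
        (?:\.\d+)?     # optional fractional part
    )
""", re.VERBOSE)

match = pattern.search("COBB FARM ST    1039.49357")
name = match.group("Name").strip()   # drop the padding before the number
area = match.group("Area")           # still a string at this point
print(name, area)
```

Note that a name containing a digit would cut the "Name" group short at that digit, so this pattern assumes the first digit in each row starts the area value.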

PS: `str.extract` returns strings, so to convert the "Area" column to numeric, use pd.to_numeric.
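
The conversion might look like the sketch below, which rebuilds the sample frame from the question so it can run on its own. Passing `errors="coerce"` is an assumption on my part rather than something the answer prescribes; it turns any value that cannot be parsed into NaN instead of raising.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Col1": ["APPLEGATE WINERY    455.292049",
                            "AMAND FARM  849.827192",
                            "COBB FARM ST    1039.49357",
                            "DIRIGIA 2048.947284"]})

# Split into the two named columns, then trim the padding from the names
df2 = df["Col1"].str.extract(r"(?P<Name>.*?)(?P<Area>\d+(?:\.\d+)?)")
df2["Name"] = df2["Name"].str.strip()

# Convert the extracted strings to floats; errors="coerce" is an optional
# choice that maps unparseable values to NaN instead of raising
df2["Area"] = pd.to_numeric(df2["Area"], errors="coerce")
print(df2.dtypes)
```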

111

answered Oct 04 '22 23:10

cs95

