How to separate Pandas column that contains values stored as text and numbers into two seperate columns

Tags:

I have a Pandas column that contains results from a survey, which are either free text or numbers from 1-5. I am retrieving these from an API in JSON format and convert them into a DataFrame. Each row represents one question with the answer of a participant like this:

Memberid | Question | Answer
       1   Q1             3
       1   Q2             2
       1   Q3         Test Text
       2   Q1             3
       2   Q2             2
       2   Q3         Test Text

The column that has the results stores all of them as string for now, so when exporting them to excel the numbers are stored as text.

My goal is to have a separate column for the text answers and leave the field they were originally in empty, so that we have separate columns for the text results and the numeric results for calculation purposes.

Memberid | Question | Numeric Answers | Freetext answers
       1   Q1             3
       1   Q2             2
       1   Q3                             Test Text
       2   Q1             3
       2   Q2             2
       2   Q3                             Test Text

I am generating this df from lists like this:

d = {'Memberid':memberid, 'Question':title, 'Answer':results}
df = pd.DataFrame(d)

So the first thing I tried was to convert the numeric values in the column from string to numbers via this:

df["Answer"] = pd.to_numeric(df['Answer'], errors='ignore')

Idea was that if it works I can simply do a for loop to check if a value in the answer column is string and then move that value into a new column.

The issue is, that the errors command does not work as intended for me. When I leave it on ignore, nothing gets converted. When I change it to coerce, the numbers get converted from str to numeric, but the fields where there freetext answers are now empty in Excel.

925

asked Jul 02 '19 11:07

Tim

1 Answers

You can use Series.str.extract with a regex pattern:

(\d+)? will extract consecutive digits
(\D+) will extract consecutive non-digit characters
The ?P<text> syntax will name your match group - making this the column heading.

df.join(df.pop('Answer').str.extract('(?P<numbers>\d+)?(?P<text>\D+)?').fillna(''))

[out]

   Memberid Question numbers       text
0         1       Q1       3           
1         1       Q2       2           
2         1       Q3          Test Text
3         2       Q1       3           
4         2       Q2       2           
5         2       Q3          Test Text

answered Sep 28 '22 11:09

Chris Adams

Related questions
                            
                                How do you capitalize the first two letters
                            
                                How can I interpolate variables in Python heredocs?
                            
                                ImportError: cannot import name 'db'
                            
                                'End of statement expected' in pycharm [duplicate]
                            
                                Django queryset order_by absolute value
                            
                                Django Rest Framework to return dicts instead of OrderedDicts
                            
                                Finding count of tuples with same first and third item in list of tuples
                            
                                loading EMNIST-letters dataset
                            
                                How to install pyzmq on an Alpine Linux container?
                            
                                Python request gives 415 error while post data
                            
                                Frame from video is upside down after extracting
                            
                                change column name pandas
                            
                                A pythonic and uFunc-y way to turn pandas column into "increasing" index? [duplicate]
                            
                                Double header in Matplotlib Table
                            
                                Can't start spyder because of PyQt5.QtWebKitWidgets
                            
                                How to rewrite this simple loop using assignment expressions introduced in Python 3.8 alpha?
                            
                                Django messages middleware issue while testing post request
                            
                                How to convert single list's elements in form of dictionary
                            
                                PyInstaller exe returning error on a Tkinter script
                            
                                find index of a value before the maximum for each column in python dataframe

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to separate Pandas column that contains values stored as text and numbers into two seperate columns

Tags:

python

python-3.x

pandas

Tim

People also ask

1 Answers

Chris Adams

Recent Activity

Donate For Us