Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to separate Pandas column that contains values stored as text and numbers into two seperate columns

I have a Pandas column that contains results from a survey, which are either free text or numbers from 1-5. I am retrieving these from an API in JSON format and convert them into a DataFrame. Each row represents one question with the answer of a participant like this:

Memberid | Question | Answer
       1   Q1             3
       1   Q2             2
       1   Q3         Test Text
       2   Q1             3
       2   Q2             2
       2   Q3         Test Text

The column that has the results stores all of them as string for now, so when exporting them to excel the numbers are stored as text.

My goal is to have a separate column for the text answers and leave the field they were originally in empty, so that we have separate columns for the text results and the numeric results for calculation purposes.

Memberid | Question | Numeric Answers | Freetext answers
       1   Q1             3
       1   Q2             2
       1   Q3                             Test Text
       2   Q1             3
       2   Q2             2
       2   Q3                             Test Text

I am generating this df from lists like this:

d = {'Memberid':memberid, 'Question':title, 'Answer':results}
df = pd.DataFrame(d)

So the first thing I tried was to convert the numeric values in the column from string to numbers via this:

df["Answer"] = pd.to_numeric(df['Answer'], errors='ignore')

Idea was that if it works I can simply do a for loop to check if a value in the answer column is string and then move that value into a new column.

The issue is, that the errors command does not work as intended for me. When I leave it on ignore, nothing gets converted. When I change it to coerce, the numbers get converted from str to numeric, but the fields where there freetext answers are now empty in Excel.

like image 925
Tim Avatar asked Jul 02 '19 11:07

Tim


People also ask

How do I split a text column into two separate columns?

Select the cell or column that contains the text you want to split. Select Data > Text to Columns. In the Convert Text to Columns Wizard, select Delimited > Next. Select the Delimiters for your data.

How do I separate data in one column into two Pandas?

split() Pandas provide a method to split string around a passed separator/delimiter. After that, the string can be stored as a list in a series or it can also be used to create multiple column data frames from a single separated string.


1 Answers

You can use Series.str.extract with a regex pattern:

  • (\d+)? will extract consecutive digits
  • (\D+) will extract consecutive non-digit characters
  • The ?P<text> syntax will name your match group - making this the column heading.

df.join(df.pop('Answer').str.extract('(?P<numbers>\d+)?(?P<text>\D+)?').fillna(''))

[out]

   Memberid Question numbers       text
0         1       Q1       3           
1         1       Q2       2           
2         1       Q3          Test Text
3         2       Q1       3           
4         2       Q2       2           
5         2       Q3          Test Text
like image 75
Chris Adams Avatar answered Sep 28 '22 11:09

Chris Adams