I have a Pandas column that contains results from a survey, which are either free text or numbers from 1-5. I am retrieving these from an API in JSON format and convert them into a DataFrame. Each row represents one question with the answer of a participant like this:
Memberid | Question | Answer
1 Q1 3
1 Q2 2
1 Q3 Test Text
2 Q1 3
2 Q2 2
2 Q3 Test Text
The column that has the results stores all of them as string for now, so when exporting them to excel the numbers are stored as text.
My goal is to have a separate column for the text answers and leave the field they were originally in empty, so that we have separate columns for the text results and the numeric results for calculation purposes.
Memberid | Question | Numeric Answers | Freetext answers
1 Q1 3
1 Q2 2
1 Q3 Test Text
2 Q1 3
2 Q2 2
2 Q3 Test Text
I am generating this df from lists like this:
d = {'Memberid':memberid, 'Question':title, 'Answer':results}
df = pd.DataFrame(d)
So the first thing I tried was to convert the numeric values in the column from string to numbers via this:
df["Answer"] = pd.to_numeric(df['Answer'], errors='ignore')
Idea was that if it works I can simply do a for loop to check if a value in the answer column is string and then move that value into a new column.
The issue is, that the errors command does not work as intended for me. When I leave it on ignore, nothing gets converted. When I change it to coerce, the numbers get converted from str to numeric, but the fields where there freetext answers are now empty in Excel.
Select the cell or column that contains the text you want to split. Select Data > Text to Columns. In the Convert Text to Columns Wizard, select Delimited > Next. Select the Delimiters for your data.
split() Pandas provide a method to split string around a passed separator/delimiter. After that, the string can be stored as a list in a series or it can also be used to create multiple column data frames from a single separated string.
You can use Series.str.extract
with a regex pattern:
(\d+)?
will extract consecutive digits(\D+)
will extract consecutive non-digit characters?P<text>
syntax will name your match group - making this the column heading.df.join(df.pop('Answer').str.extract('(?P<numbers>\d+)?(?P<text>\D+)?').fillna(''))
[out]
Memberid Question numbers text
0 1 Q1 3
1 1 Q2 2
2 1 Q3 Test Text
3 2 Q1 3
4 2 Q2 2
5 2 Q3 Test Text
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With