I have a large pandas DataFrame of survey string responses, and we would like to trial some of spaCy's NLP features. We are just exploring the capabilities at the moment, but are struggling with how to get the data into a format that works with spaCy's nlp function.
Eventually we would like to be able to look at popular topics in the string responses against their user data.
How do I run the nlp pipeline on a column of a dataframe? Or am I going about this the wrong way?
You begin by calling spacy.load() with a language model. Depending on which model you choose, this loads the tokenizer, tagger, parser, NER and word vectors for the language of your choice. In the spaCy documentation the result is conventionally stored in a variable called nlp:
nlp = spacy.load(language_model)
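As a minimal sketch, assuming the small English model en_core_web_sm has been downloaded (python -m spacy download en_core_web_sm):

import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER, ...)
nlp = spacy.load("en_core_web_sm")

doc = nlp("The survey responses were overwhelmingly positive.")
print([(token.text, token.pos_) for token in doc])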
We can now call nlp() on any text string. So why doesn't nlp(df['column_with_strings']) work? Because df['column_with_strings'] is not a string, it is a pandas.Series:
TypeError: Argument 'string' has incorrect type (expected str, got Series)
So what you need to do is call nlp() on each value in the pandas.Series. You can do this by passing nlp to df['column_with_strings'].apply(), or by iterating over each row in the Series.
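A minimal sketch of the apply() approach, assuming a DataFrame df with a string column named column_with_strings (both names are placeholders):

import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")

df = pd.DataFrame({"column_with_strings": ["I love the new layout.", "Shipping was slow."]})

# Run the pipeline on every value; each cell becomes a spaCy Doc object
df["doc"] = df["column_with_strings"].apply(nlp)

print(df["doc"].iloc[0].ents)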
There is a faster, more efficient way to run a Series of texts through the nlp pipeline: spaCy recommends using nlp.pipe() when processing large volumes of text.
Following the instructions given in the documentation, you can do the following:
texts = dataframe['series_with_text']
(Make sure you have converted the values to strings and removed any NaN values that might exist in your data frame.)
Then:
docs = list(nlp.pipe(texts))
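Putting it together, a sketch assuming a hypothetical dataframe with a column named series_with_text:

import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")

dataframe = pd.DataFrame({"series_with_text": ["Great customer service!", None, 42]})

# Drop missing values and coerce everything to str before piping
texts = dataframe["series_with_text"].dropna().astype(str)

# nlp.pipe() streams the texts through the pipeline in batches
docs = list(nlp.pipe(texts))

# e.g. pull out noun chunks as a rough proxy for topics
for doc in docs:
    print([chunk.text for chunk in doc.noun_chunks])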