
How can I get the first word from each string in my Dataframe using Python?

Tags:

pandas

I have a pandas DataFrame called "data" with 2 columns and 50 rows, each filled with one or two lines of text, imported from a .tsv file. Some of the questions may contain integers and floats besides strings. I am trying to extract the first word of every sentence (in both columns), but I consistently get this error: AttributeError: 'DataFrame' object has no attribute 'str'.

At first, I thought the error was due to my incorrect use of "data.str.split", but every change I found by Googling failed. Then I thought the file might not consist entirely of strings, so I tried "data.astype(str)" on the data, but the same error remained. Any suggestions? Thanks a lot!

Here is my code:

import pandas as pd
questions = "questions.tsv"
data = pd.read_csv(questions, usecols = [3], nrows = 50, header=1, sep="\t")
data = data.astype(str)
first_words = data.str.split(None, 1)[0]
twhale asked Sep 15 '17 04:09



1 Answer

The .str accessor is defined on a Series, not on a DataFrame, so apply it to each column:

first_words = data.apply(lambda x: x.str.split().str[0])

Or element by element (every cell must already be a string, e.g. after data.astype(str)):

first_words = data.applymap(lambda x: x.split()[0])
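
For context, here is a minimal sketch of why the original call fails and how a single column can be handled directly. The frame and the column name "question" are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for a one-column frame read from the .tsv file.
data = pd.DataFrame({"question": ["What is pandas?", "How do I split text?", 42]})

# A DataFrame does not expose .str (unless a column happens to be named "str").
print(hasattr(data, "str"))

# Selecting one column gives a Series, where .str is available
# once the values are strings.
column = data["question"].astype(str)
first_words = column.str.split().str[0]
print(first_words.tolist())
```

With only one column, selecting it first (by name, or by position with data.iloc[:, 0]) avoids the need for apply altogether.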

Sample:

data = pd.DataFrame({'a':['aa ss ss','ee rre', 1, 'r'],
                   'b':[4,'rrt ee', 'ee www ee', 6]})
print (data)
          a          b
0  aa ss ss          4
1    ee rre     rrt ee
2         1  ee www ee
3         r          6

data = data.astype(str)
first_words = data.apply(lambda x: x.str.split().str[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6

first_words = data.applymap(lambda x: x.split()[0])
print (first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6
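
A note for newer pandas versions: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map, so the second variant may emit a warning or be unavailable depending on the installed release. The sketch below picks whichever method exists; the name "elementwise" is just an illustrative helper, not part of pandas.

```python
import pandas as pd

data = pd.DataFrame({"a": ["aa ss ss", "ee rre", 1, "r"],
                     "b": [4, "rrt ee", "ee www ee", 6]}).astype(str)

# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
elementwise = data.map if hasattr(data, "map") else data.applymap

# Split each cell on whitespace and keep the first token.
first_words = elementwise(lambda x: x.split()[0])
print(first_words)
```

One difference between the two variants is worth keeping in mind: x.split()[0] raises IndexError for an empty or whitespace-only cell, whereas x.str.split().str[0] yields NaN for that cell.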
jezrael answered Oct 08 '22 09:10