I have a Pandas DataFrame called "data" with 2 columns and 50 rows, each cell holding one or two lines of text, imported from a .tsv file. Some of the questions may contain integers and floats in addition to strings. I am trying to extract the first word of every sentence (in both columns), but I consistently get this error: AttributeError: 'DataFrame' object has no attribute 'str'.
At first, I thought the error was due to my incorrect use of "data.str.split", but none of the changes I found on Google worked. Then I thought the file might not consist entirely of strings, so I tried "data.astype(str)" on it, but the same error remained. Any suggestions? Thanks a lot!
Here is my code:
import pandas as pd
questions = "questions.tsv"
data = pd.read_csv(questions, usecols = [3], nrows = 50, header=1, sep="\t")
data = data.astype(str)
first_words = data.str.split(None, 1)[0]
Use str.split() and list indexing to get the first word in a string. Calling str.split() with no arguments returns a list of the words in the string, split on any run of whitespace (spaces, tabs, or newlines).
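As a minimal sketch of the plain-Python idea described above (the sample sentence and the helper name first_word_or_empty are made up for illustration and are not from the question):

```python
# Hypothetical sample sentence, not taken from the question's file.
sentence = "What is\nthe first word?"

# split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
words = sentence.split()
first_word = words[0]
print(first_word)  # What


def first_word_or_empty(text):
    """Hypothetical helper: return the first word, or "" if there is none."""
    parts = text.split()
    return parts[0] if parts else ""


print(repr(first_word_or_empty("   ")))  # ''
```

Indexing with [0] raises IndexError on an empty or whitespace-only string, which is why the helper checks the list before indexing.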
Use df.apply() to apply a function along an axis of the DataFrame (column by column by default). Use df.applymap() to apply a function to every element of a DataFrame.
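The error in the question comes from the fact that the .str accessor belongs to Series (a single column), not to DataFrame. A small sketch of this, assuming a made-up two-column frame named df in place of the question's file:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's file.
df = pd.DataFrame({"a": ["aa ss", "ee rre"], "b": ["rrt ee", "x y z"]})

# .str is defined on Series (one column), not on DataFrame,
# so df.str raises AttributeError while df["a"].str works.
has_str_on_frame = hasattr(df, "str")
has_str_on_column = hasattr(df["a"], "str")
print(has_str_on_frame, has_str_on_column)  # False True

# apply() with the default axis=0 hands each column to the function
# as a Series, so the .str accessor is available inside the lambda.
first_by_column = df.apply(lambda col: col.str.split().str[0])
print(first_by_column)
```

Going through apply() keeps the vectorized string methods while still covering every column of the frame.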
Pandas DataFrame first() method: first() returns the initial rows of a DataFrame, based on the specified date offset. The index has to consist of dates for this method to work as expected. It selects rows, not words within a cell.
Use:
first_words = data.apply(lambda x: x.str.split().str[0])
Or:
first_words = data.applymap(lambda x: x.split()[0])
Sample:
data = pd.DataFrame({'a': ['aa ss ss', 'ee rre', 1, 'r'],
                     'b': [4, 'rrt ee', 'ee www ee', 6]})
print(data)
          a          b
0  aa ss ss          4
1    ee rre     rrt ee
2         1  ee www ee
3         r          6
data = data.astype(str)
first_words = data.apply(lambda x: x.str.split().str[0])
print(first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6
first_words = data.applymap(lambda x: x.split()[0])
print(first_words)
    a    b
0  aa    4
1  ee  rrt
2   1   ee
3   r    6
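One caveat with the elementwise version: x.split()[0] raises IndexError on an empty cell, and newer pandas releases (2.1 and later) deprecate applymap in favor of DataFrame.map. The following is only a sketch under those assumptions; the sample frame df and the helper first_word are hypothetical and not part of the original answer:

```python
import pandas as pd

# Hypothetical sample with mixed types and an empty string.
df = pd.DataFrame({"a": ["aa ss", "", "r"], "b": [4, "rrt ee", 2.5]})


def first_word(value):
    """Hypothetical helper: first whitespace-separated token, or ""."""
    parts = str(value).split()  # str() covers ints and floats
    return parts[0] if parts else ""


# DataFrame.map exists from pandas 2.1 on; fall back to applymap
# on older versions where DataFrame has no map method.
elementwise = getattr(df, "map", None) or df.applymap
first_words = elementwise(first_word)
print(first_words)
```

Converting inside the helper with str() makes a separate astype(str) call unnecessary for this variant.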