I want to count how many times a word appears in each review string.
I read the CSV file into a pandas DataFrame with the line below:
reviews = pd.read_csv("amazon_baby.csv")
The code below works when I apply it to a single review:
print reviews["review"][1]
a = reviews["review"][1].split("disappointed")
print a
b = len(a)
print b
The output of the above lines was:
it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
When I apply the same logic to the entire DataFrame using the line below, I receive an error message:
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
Error message:
Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
    reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
  File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'
Series and DataFrame both define an .explode() method that expands list-like values into separate rows. See the pandas docs section on exploding a list-like column. If a column holds comma-separated strings, split each string on the comma to get a list of elements, then call explode on that column.
To split a cell into multiple rows in a pandas DataFrame, we can call apply with a lambda function that calls str.split on each string value x, and then call explode to fill new rows with the split values. Finally, we call reset_index to reset the index numbers after the rows have been filled.
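The split, explode, and reset_index steps described above can be sketched as follows. This is only an illustration: the DataFrame and its column names (id, tags) are made up, and it assumes a pandas version that provides explode (0.25 or later) and Python 3.

```python
import pandas as pd

# Hypothetical data: one comma-separated string per row.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b", "c"]})

# Split each string into a list of elements.
df["tags"] = df["tags"].apply(lambda x: x.split(","))

# Give every list element its own row, then renumber the index.
exploded = df.explode("tags").reset_index(drop=True)
print(exploded)
```

Without reset_index(drop=True), the rows produced from the same original cell would all share that cell's index label.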
split breaks up a string and returns the pieces as a list, using the given separator. If no separator is passed when you call it, the string is split on whitespace by default.
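A small sketch of both behaviours on a plain Python string (Python 3 syntax; the sample sentence is shortened from the review in the question):

```python
text = "it came early and was not disappointed. highly recommend it."

# No separator: split on runs of whitespace.
words = text.split()

# Explicit separator: split around every occurrence of the substring.
parts = text.split("disappointed")

# N occurrences of the separator yield N + 1 pieces.
count = len(parts) - 1

print(words[:3])
print(parts)
print(count)
```

For a single string, text.count("disappointed") gives the same number without building the intermediate list.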
pandas 0.20.3 has pandas.Series.str.split(), which acts on every string in the Series and performs the split.
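As a rough sketch of that vectorized route, the per-review count could be computed without apply. The three sample reviews are invented for illustration, and the code is written for Python 3:

```python
import pandas as pd

# Hypothetical stand-in for the reviews loaded from the CSV file.
reviews = pd.DataFrame({"review": [
    "was not disappointed. highly recommend it.",
    "disappointed, really disappointed",
    "great bag",
]})

# .str.split runs on every string in the Series and returns a list per row.
pieces = reviews["review"].str.split("disappointed")

# .str.len gives the length of each list; pieces minus one is the count.
reviews["disappointed"] = pieces.str.len() - 1

# Series.str.count counts pattern matches directly and should agree here.
direct = reviews["review"].str.count("disappointed")
print(reviews)
```

Note that Series.str.count interprets its argument as a regular expression, so a word containing special characters would need escaping (for example with re.escape).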
The default value of maxsplit is -1. If maxsplit is not specified, split() splits the string at every occurrence of the separator. String manipulation comes up in almost every program that handles text, and split() is the function to reach for in such cases.
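A short illustration of the limit, using a made-up string:

```python
text = "a,b,c,d"

# maxsplit defaults to -1, meaning "no limit": split at every comma.
all_parts = text.split(",")

# maxsplit=1: split only at the first comma, keep the rest intact.
limited = text.split(",", 1)

print(all_parts)
print(limited)
```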
You're trying to split the entire review column of the data frame (which is the Series mentioned in the error message). What you want to do is apply a function to each row of the data frame, which you can do by calling apply on the data frame:
f = lambda x: len(x["review"].split("disappointed")) - 1
reviews["disappointed"] = reviews.apply(f, axis=1)
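A self-contained variant of the same idea, written for Python 3, might look like the sketch below. It applies the function to the review column only rather than to whole rows, and it assumes some reviews may be missing (NaN or None), in which case calling split on them would fail. The helper name count_word and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data, including one missing review.
reviews = pd.DataFrame({"review": [
    "not disappointed at all",
    None,
    "disappointed and disappointed",
]})

def count_word(text, word="disappointed"):
    # Missing reviews are not strings; treat them as zero occurrences.
    if not isinstance(text, str):
        return 0
    return len(text.split(word)) - 1

# Series.apply calls count_word once per review.
reviews["disappointed"] = reviews["review"].apply(count_word)
print(reviews)
```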