
Column with list of strings in python

I have a pandas dataframe like the following:

                                          categories  review_count
0                  [Burgers, Fast Food, Restaurants]           137
1                         [Steakhouses, Restaurants]           176
2  [Food, Coffee & Tea, American (New), Restaurants]           390
...                                          ....              ...
...                                          ....              ...
...                                          ....              ...

From this DataFrame, I would like to extract only those rows where the list in the 'categories' column contains the category 'Restaurants'. I have so far tried: df[[df.categories.isin('Restaurants'),review_count]]

Since I also have other columns in the DataFrame, I specified the two columns that I want to extract. But I get the error:

TypeError: unhashable type: 'list'

I don't really understand what this error means, as I am very new to pandas. Please let me know how I can extract only those rows from the DataFrame where the list in the 'categories' column contains the string 'Restaurants'. Any help would be much appreciated.

Thanks in advance!

asked Oct 13 '13 23:10 by anonuser0428



1 Answer

I think you may have to use a lambda function for this. With isin you can test whether a value in your column is in some sequence, but pandas doesn't seem to provide a function for testing whether the sequence in your column contains some value:

import pandas as pd
categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({'categories': categories, 'review_count': counts})
# Show which rows contain 'restaurant'
df.categories.map(lambda x: 'restaurant' in x)
# Subset the dataframe using this:
df[df.categories.map(lambda x: 'restaurant' in x)]

Output:

Out[11]: 
                categories  review_count
0  [fast_food, restaurant]           137
2     [burger, restaurant]           390
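
Since the question also asks for only two of the columns, here is a minimal sketch of combining the same mask with .loc to pick both rows and columns at once. The sample data is modelled on the question, and the 'city' column is a hypothetical extra column added only to show that it gets dropped:

```python
import pandas as pd

# Sample data modelled on the question; 'city' is a hypothetical extra
# column, included only to demonstrate column selection.
df = pd.DataFrame({
    'categories': [
        ['Burgers', 'Fast Food', 'Restaurants'],
        ['Steakhouses', 'Restaurants'],
        ['Food', 'Coffee & Tea'],
    ],
    'review_count': [137, 176, 390],
    'city': ['Phoenix', 'Tempe', 'Mesa'],
})

# Boolean mask: True where the row's list contains 'Restaurants'
mask = df['categories'].map(lambda cats: 'Restaurants' in cats)

# Keep the matching rows and only the two columns of interest
result = df.loc[mask, ['categories', 'review_count']]
print(result)
```

The mask is an ordinary boolean Series, so it can be reused or combined with other conditions (for example with & or |) before being passed to .loc.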
answered Sep 29 '22 16:09 by Marius