
How can I count comma-separated values in one column of my pandas DataFrame?

Tags: python, pandas

I have the following code:

import pandas as pd

# Column names to assign to the CSV data
businessdata = ['Name of Location','Address','City','Zip Code','Website','Yelp',
'# Reviews', 'Yelp Rating Stars','BarRestStore','Category',
'Price Range','Alcohol','Ambience','Latitude','Longitude']

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata)
print('\n\nBusiness\n')
print(business[:6])

It reads my file and creates a pandas DataFrame I can work with. What I need is to count how many categories are in each row of the 'Category' column and store this number in a new column named '# Categories'. Here is a sample of the target column:

Category                                         
French                                               
Adult Entertainment , Lounges , Music Venues         
American (New) , Steakhouses                        
American (New) , Beer, Wine & Spirits , Gastropubs 
Chicken Wings , Sports Bars , American (New)         
Japanese

Desired output:

Category                                        # Categories  
French                                               1           
Adult Entertainment , Lounges , Music Venues         3         
American (New) , Steakhouses                         2        
American (New) , Beer, Wine & Spirits , Gastropubs   4         
Chicken Wings , Sports Bars , American (New)         3         
Japanese                                             1        

EDIT 1:

The raw input is a CSV file and the target column is "Category". I can't post screenshots yet. I don't think the values to be counted are lists.

This is my code:

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata, skip_blank_lines=True)
#business = pd.read_csv('FL_Yelp_Data_v2.csv')

business['Category'].str.split(',').apply(len)
#not sure where to declare the df part in the suggestions that use it.

print(business[:6])

but I keep getting the following error:

TypeError: object of type 'float' has no len() 

EDIT 2:

I GIVE UP. Thanks for all your help, but I'll have to figure something else.

Danilo asked May 12 '15 21:05



2 Answers

Assuming that Category is actually a list, you can use apply (per @EdChum's suggestion):

business['# Categories'] = business.Category.apply(len)

If not, you first need to parse it and convert it into a list (df here stands for your DataFrame, which is named business in the question):

df['Category'] = df.Category.map(lambda x: [i.strip() for i in x.split(",")])

Can you show some sample output of EXACTLY what this column looks like (including correct quotations)?
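
To make the parse-then-count idea concrete, here is a minimal sketch. It assumes the column holds plain strings like the sample in the question; the small DataFrame below is a hypothetical stand-in for the data read from the CSV.

```python
import pandas as pd

# Hypothetical stand-in for the question's data; the real frame comes from the CSV.
business = pd.DataFrame({
    "Category": [
        "French",
        "Adult Entertainment , Lounges , Music Venues",
        "American (New) , Steakhouses",
        "American (New) , Beer, Wine & Spirits , Gastropubs",
        "Chicken Wings , Sports Bars , American (New)",
        "Japanese",
    ]
})

# Split each string on commas and strip whitespace from every piece.
parsed = business["Category"].map(lambda x: [i.strip() for i in x.split(",")])

# Count the pieces in each list and store the result in the new column.
business["# Categories"] = parsed.apply(len)
print(business)
```

Note that splitting on every comma counts "Beer, Wine & Spirits" as two pieces, which matches the desired output in the question (4 for that row).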

P.S. @EdChum Thank you for your suggestions. I appreciate them. I believe the list comprehension method may be faster, based on a test I ran on 30k+ rows of text data:

%%timeit
df.Category.str.strip().str.split(',').apply(len)
10 loops, best of 3: 44.8 ms per loop

%%timeit
df.Category.map(lambda x: [i.strip() for i in x.split(",")])
10 loops, best of 3: 28.4 ms per loop

Even accounting for the len function call:

%%timeit
df.Category.map(lambda x: len([i.strip() for i in x.split(",")]))
10 loops, best of 3: 30.3 ms per loop
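
The %%timeit lines above are IPython magics. Outside IPython, a comparison along the same lines could be sketched with the standard timeit module. The data here is synthetic (an assumption, not the data used for the timings above), so the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

# Synthetic data (an assumption): 30,000 rows cycling through three sample strings.
samples = [
    "French",
    "American (New) , Steakhouses",
    "Chicken Wings , Sports Bars , American (New)",
]
df = pd.DataFrame({"Category": samples * 10000})

def with_str_accessor():
    # Vectorized string methods, then len() on each resulting list.
    return df.Category.str.strip().str.split(",").apply(len)

def with_map():
    # Plain Python list comprehension applied per row.
    return df.Category.map(lambda x: len([i.strip() for i in x.split(",")]))

# Both approaches should produce identical counts.
assert with_str_accessor().equals(with_map())

for fn in (with_str_accessor, with_map):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.1f} ms per loop")
```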

Alexander answered Oct 02 '22 11:10


This works:

business['# Categories'] = business['Category'].apply(lambda x: len(x.split(',')))

If you need to handle NA values and similar cases, you can pass a more elaborate function instead of the lambda.
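
As a sketch of such a function: the TypeError in the question ("object of type 'float' has no len()") is consistent with missing values in the column, since pandas represents them as float NaN. The helper name count_categories and the choice to return 0 for missing values are assumptions for illustration, and the sample frame is hypothetical.

```python
import pandas as pd

def count_categories(value):
    """Return the number of comma-separated entries, or 0 for missing values."""
    if pd.isna(value):
        # Missing cells arrive as NaN (a float), which has no len() or split().
        return 0
    return len(str(value).split(","))

# Hypothetical sample with one missing value.
business = pd.DataFrame(
    {"Category": ["French", float("nan"), "American (New) , Steakhouses"]}
)
business["# Categories"] = business["Category"].apply(count_categories)
print(business)
```

Returning 0 is only one option; returning NaN instead would keep missing rows distinguishable from rows with a real count.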

Joe Germuska answered Oct 02 '22 10:10