How can I count comma-separated values in one column of my pandas table?

Tags: python, pandas

I have the following code:

import pandas as pd

# Column names to assign to the CSV fields
businessdata = ['Name of Location','Address','City','Zip Code','Website','Yelp',
'# Reviews', 'Yelp Rating Stars','BarRestStore','Category',
'Price Range','Alcohol','Ambience','Latitude','Longitude']

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata)
print('\n\nBusiness\n')
print(business[:6])

It reads my file and creates a pandas DataFrame I can work with. What I need is to count how many categories are in each row of the 'Category' column and store this number in a new column named '# Categories'. Here is a sample of the target column:

Category                                         
French                                               
Adult Entertainment , Lounges , Music Venues         
American (New) , Steakhouses                        
American (New) , Beer, Wine & Spirits , Gastropubs 
Chicken Wings , Sports Bars , American (New)         
Japanese

Desired output:

Category                                        # Categories  
French                                               1           
Adult Entertainment , Lounges , Music Venues         3         
American (New) , Steakhouses                         2        
American (New) , Beer, Wine & Spirits , Gastropubs   4         
Chicken Wings , Sports Bars , American (New)         3         
Japanese                                             1

EDIT 1:

Raw input = CSV file. Target column: "Category". I can't post screenshots yet. I don't think the values to be counted are lists.

This is my code:

business = pd.read_table('FL_Yelp_Data_v2.csv', sep=',', header=1, names=businessdata, skip_blank_lines=True)
#business = pd.read_csv('FL_Yelp_Data_v2.csv')

business['Category'].str.split(',').apply(len)
#not sure where to declare the df part in the suggestions that use it.

print(business[:6])

but I keep getting the following error:

TypeError: object of type 'float' has no len()

EDIT 2:

I GIVE UP. Thanks for all your help, but I'll have to figure out something else.


asked May 12 '15 21:05

Danilo

2 Answers

Assuming that Category is actually a list, you can use apply (per @EdChum's suggestion):

business['# Categories'] = business.Category.apply(len)

If not, you first need to parse it and convert it into a list.

df['Category'] = df.Category.map(lambda x: [i.strip() for i in x.split(",")])
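To make the two steps concrete, here is a minimal sketch on a few rows copied from the question. It assumes `df` stands for the DataFrame the question calls `business`, and that every value in the column is a string (no missing values).

```python
import pandas as pd

# Sample rows copied from the question; `df` plays the role of `business`.
df = pd.DataFrame({'Category': [
    'French',
    'Adult Entertainment , Lounges , Music Venues',
    'American (New) , Steakhouses',
]})

# Step 1: parse each string into a list of stripped category names.
df['Category'] = df.Category.map(lambda x: [i.strip() for i in x.split(',')])

# Step 2: the column now holds lists, so len counts the categories.
df['# Categories'] = df.Category.apply(len)

print(df)
```

Note that step 1 overwrites the original strings with lists; assign to a different column name if the raw text is still needed.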

Can you show some sample output of EXACTLY what this column looks like (including correct quotations)?

P.S. @EdChum Thank you for your suggestions. I appreciate them. I believe the list comprehension method may be faster, based on a test I ran on 30k+ rows of text data:

%%timeit
df.Category.str.strip().str.split(',').apply(len)
10 loops, best of 3: 44.8 ms per loop

%%timeit
df.Category.map(lambda x: [i.strip() for i in x.split(",")])
10 loops, best of 3: 28.4 ms per loop

Even accounting for the len function call:

%%timeit
df.Category.map(lambda x: len([i.strip() for i in x.split(",")]))
10 loops, best of 3: 30.3 ms per loop
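Another option, not included in the timings above, is to count the separators directly instead of building lists. The sketch below assumes commas are the only separator and that a missing value should count as zero; the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value.
df = pd.DataFrame({'Category': [
    'French',
    'Chicken Wings , Sports Bars , American (New)',
    np.nan,
]})

# N commas separate N + 1 entries. NaN stays NaN through str.count,
# so fill it with 0 before casting to int.
df['# Categories'] = (df.Category.str.count(',') + 1).fillna(0).astype(int)

print(df)
```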


answered Oct 02 '22 11:10

Alexander

This works:

business['# Categories'] = business['Category'].apply(lambda x: len(x.split(',')))

If you need to handle NA values, etc., you can pass a more elaborate function instead of the lambda.
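As a sketch of what such a function might look like: the `TypeError` in the question most likely comes from rows where Category is empty, which pandas reads as float `NaN`. The helper name `count_categories` and the choice to count a missing value as 0 are assumptions for illustration, not something the question specifies.

```python
import numpy as np
import pandas as pd

def count_categories(value):
    """Count comma-separated entries; a missing value counts as 0 (assumed)."""
    if pd.isna(value):
        return 0
    return len(str(value).split(','))

# Hypothetical sample with one missing Category, as an empty CSV field would give.
business = pd.DataFrame({'Category': [
    'French',
    'Adult Entertainment , Lounges , Music Venues',
    np.nan,
    'American (New) , Beer, Wine & Spirits , Gastropubs',
]})

business['# Categories'] = business['Category'].apply(count_categories)
print(business)
```

The last row counts as 4 because the comma inside "Beer, Wine & Spirits" is also treated as a separator, which matches the desired output in the question.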

answered Oct 02 '22 10:10

Joe Germuska
