Pandas: reshaping data

I have a pandas Series which presently looks like this:

14    [Yellow, Pizza, Restaurants]
...
160920                  [Automotive, Auto Parts & Supplies]
160921       [Lighting Fixtures & Equipment, Home Services]
160922                 [Food, Pizza, Candy Stores]
160923           [Hair Removal, Nail Salons, Beauty & Spas]
160924           [Hair Removal, Nail Salons, Beauty & Spas]

And I want to radically reshape it into a dataframe that looks something like this...

      Yellow  Automotive  Pizza
14       1         0        1
...
160920   0         1        0
160921   0         0        0
160922   0         0        1
160923   0         0        0
160924   0         0        0

i.e. an indicator table recording which categories each observation (row) falls into.

I could write for-loop-based code to tackle the problem, but given the large number of rows I need to handle, that would be very slow.

Does anyone know a vectorised solution to this kind of problem? I'd be very grateful.

EDIT: there are 509 categories, which I do have a list of.

asked May 19 '13 17:05

N. McA.

1 Answer

In [8]: from pandas import Series

In [9]: s = Series([list('ABC'),list('DEF'),list('ABEF')])

In [10]: s
Out[10]: 
0       [A, B, C]
1       [D, E, F]
2    [A, B, E, F]
dtype: object

In [11]: s.apply(lambda x: Series(1,index=x)).fillna(0)
Out[11]: 
   A  B  C  D  E  F
0  1  1  1  0  0  0
1  0  0  0  1  1  1
2  1  1  0  0  1  1
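
For larger inputs, applying a function to every row can be slow. A different technique (not the one used above) is to explode the lists and build the indicator columns in one pass. The following is a minimal sketch, assuming a pandas version that provides `Series.explode` (0.25 or later). The `categories` list here is a hypothetical stand-in for the 509-entry list mentioned in the question.

```python
import pandas as pd

s = pd.Series([list("ABC"), list("DEF"), list("ABEF")])

# Hypothetical stand-in for the known list of categories.
# "G" never occurs in the data, to show how unused categories are handled.
categories = ["A", "B", "C", "D", "E", "F", "G"]

# One row per (original index label, category) pair.
exploded = s.explode()

# One indicator column per observed category, then collapse back to one
# row per original index label. max() keeps a 1 if any pair set it.
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Add all-zero columns for categories that never occur and fix the column order.
indicators = indicators.reindex(columns=categories, fill_value=0)

print(indicators)
```

Reindexing against the known category list keeps the column set stable even when some categories are absent from a given batch of rows.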

answered Sep 27 '22 03:09

Jeff

Tags: python, pandas, vectorization, categories
