Pandas: reshaping data

I have a pandas Series which presently looks like this:

14    [Yellow, Pizza, Restaurants]
...
160920                  [Automotive, Auto Parts & Supplies]
160921       [Lighting Fixtures & Equipment, Home Services]
160922                 [Food, Pizza, Candy Stores]
160923           [Hair Removal, Nail Salons, Beauty & Spas]
160924           [Hair Removal, Nail Salons, Beauty & Spas]

And I want to radically reshape it into a dataframe that looks something like this...

      Yellow  Automotive  Pizza
14       1         0        1
…           
160920   0         1        0
160921   0         0        0
160922   0         0        1
160923   0         0        0
160924   0         0        0

i.e. an indicator table recording which categories each observation (row) falls into.

I can write for-loop-based code to tackle the problem, but given the large number of rows I need to handle, that's going to be very slow.

Does anyone know a vectorised solution to this kind of problem? I'd be very grateful.

EDIT: there are 509 categories, which I do have a list of.

asked May 19 '13 17:05 by N. McA.


1 Answer

In [8]: from pandas import Series

In [9]: s = Series([list('ABC'), list('DEF'), list('ABEF')])

In [10]: s
Out[10]: 
0       [A, B, C]
1       [D, E, F]
2    [A, B, E, F]
dtype: object

In [11]: s.apply(lambda x: Series(1, index=x)).fillna(0).astype(int)
Out[11]: 
   A  B  C  D  E  F
0  1  1  1  0  0  0
1  0  0  0  1  1  1
2  1  1  0  0  1  1
answered Sep 27 '22 03:09 by Jeff
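
As an addendum that is not part of Jeff's answer: `s.apply(...)` still builds one small Series per row, which may be slow on a large input. A possible alternative, sketched here on the assumption of a pandas version that has `Series.explode` (0.25 or later), is to flatten the lists first and let `get_dummies` build the indicator columns. The helper name `indicator_frame` and its `categories` parameter are made up for this illustration; `categories` stands in for the list of 509 known categories mentioned in the question.

```python
import pandas as pd


def indicator_frame(s, categories=None):
    """Build a 0/1 DataFrame with one column per category in the lists of `s`.

    `indicator_frame` and `categories` are hypothetical names used only
    for this sketch.
    """
    # One row per (original index label, category) pair; an empty list
    # becomes a single NaN row, so its index label is not lost.
    exploded = s.explode()
    # One indicator column per distinct category; NaN rows stay all zero.
    dummies = pd.get_dummies(exploded, dtype=int)
    # Collapse the repeated index labels back to one row per observation.
    out = dummies.groupby(level=0, sort=False).max()
    if categories is not None:
        # Force a fixed column set: unseen categories become all-zero
        # columns, and categories not in the list are dropped.
        out = out.reindex(columns=list(categories), fill_value=0)
    return out


s = pd.Series([list("ABC"), list("DEF"), list("ABEF")])
out = indicator_frame(s)
fixed = indicator_frame(s, categories=["A", "Z"])
```

Passing the known category list keeps the column set stable even when some categories never occur in the data. If no category name contains the separator character, `s.str.join('|').str.get_dummies()` is another commonly used idiom for the same reshaping.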