Is there a pandas function to transform this data so it show the columns as a,b,c,d,e or whatever is inside the data field and the rows count how many of the letters there are. <pre class="prettyprint"><code>import pandas as pd trans = pd.read_table('output.txt', header=None, index_col=0) print trans >>> 1 2 3 4 0 11 a b c NaN 666 a d e NaN 10101 b c d NaN 1010 a b c d 414147 b c NaN NaN 10101 a b d NaN 1242 d e NaN NaN 101 a b c d 411 c d e NaN 444 a b c NaN </code></pre> instead I want the output to be like this: <pre class="prettyprint"><code> a b c d e 0 11 1 1 1 NaN NaN 666 1 NaN NaN 1 1 </code></pre> The function .stack() almost gets it done but in the wrong format.

You could also use Pandas <code>get_dummies()</code> <pre class="prettyprint"><code>pd.get_dummies(df.unstack().dropna()).groupby(level=1).sum() </code></pre> results in: <pre class="prettyprint"><code> a b c d e 0 11 1 1 1 0 0 666 1 0 0 1 1 10101 0 1 1 1 0 1010 1 1 1 1 0 414147 0 1 1 0 0 10101 1 1 0 1 0 1242 0 0 0 1 1 101 1 1 1 1 0 411 0 0 1 1 1 444 1 1 1 0 0 </code></pre> You could replace the zeros with NaN's in you want to. Its a bit obscure in one line. <code>df.unstack().dropna()</code> basically flattens your DataFrame to a series and drops al NaN's. The <code>get_dummies</code> gives a table of all the occurrences of the letters, but for each level in the unstack DataFrame. The grouping and sum then combine the index to the original shape.

Transforming Pandas Dataframe

Is there a pandas function to transform this data so it show the columns as a,b,c,d,e or whatever is inside the data field and the rows count how many of the letters there are.

import pandas as pd

trans = pd.read_table('output.txt', header=None, index_col=0)

print trans
>>> 
        1  2    3    4
0                     
11      a  b    c  NaN
666     a  d    e  NaN
10101   b  c    d  NaN
1010    a  b    c    d
414147  b  c  NaN  NaN
10101   a  b    d  NaN
1242    d  e  NaN  NaN
101     a  b    c    d
411     c  d    e  NaN
444     a  b    c  NaN

instead I want the output to be like this:

        a  b    c     d   e
0                     
11      1  1    1   NaN  NaN
666     1  NaN  NaN   1    1

The function .stack() almost gets it done but in the wrong format.

What does transform () do in Python?

The transform() method allows you to execute a function for each value of the DataFrame.

What is transform () function used for?

The transform() function is used to call function on self producing a Series with transformed values and that has the same axis length as self. Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series. apply.

What the difference between pandas apply and transform?

transform() can take a function, a string function, a list of functions, and a dict. However, apply() is only allowed a function. apply() works with multiple Series at a time. But, transform() is only allowed to work with a single Series at a time.

You could also use Pandas get_dummies()

pd.get_dummies(df.unstack().dropna()).groupby(level=1).sum()

results in:

        a  b  c  d  e
0                    
11      1  1  1  0  0
666     1  0  0  1  1
10101   0  1  1  1  0
1010    1  1  1  1  0
414147  0  1  1  0  0
10101   1  1  0  1  0
1242    0  0  0  1  1
101     1  1  1  1  0
411     0  0  1  1  1
444     1  1  1  0  0

You could replace the zeros with NaN's in you want to.

Its a bit obscure in one line. df.unstack().dropna() basically flattens your DataFrame to a series and drops al NaN's. The get_dummies gives a table of all the occurrences of the letters, but for each level in the unstack DataFrame. The grouping and sum then combine the index to the original shape.

Something like this may be:

>>> st = pd.DataFrame(trans.stack()).reset_index(level=0)
>>> st.columns = ['i','c']
>>> st.pivot_table(rows='i', cols='c', aggfunc=len)
c        a   b   c   d   e
i                         
11       1   1   1 NaN NaN
101      1   1   1   1 NaN
411    NaN NaN   1   1   1
444      1   1   1 NaN NaN
666      1 NaN NaN   1   1
1010     1   1   1   1 NaN
1242   NaN NaN NaN   1   1
10101    1   2   1   2 NaN
414147 NaN   1   1 NaN NaN

Transforming Pandas Dataframe

Tags:

python

pandas

dataframe

user3084006

People also ask

2 Answers

Rutger Kassies

Roman Pekar

Recent Activity

Donate For Us

Transforming Pandas Dataframe

Tags:

python

pandas

dataframe

user3084006

People also ask

2 Answers

Rutger Kassies

Roman Pekar

Related questions

Recent Activity

Donate For Us