Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error in parsing, update multiple columns in 1 line

Input to pd.read_clipboard()

Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1⁠⁠⁠⁠

Code:

import pandas as pd

df = pd.read_clipboard(header=None, sep=',')
df.columns = ['Medicine','power30c','power200c']
df.power30c=df.power30c.apply(lambda x: x[-1])
df.power200c=df.power200c.apply(lambda x: x[-1])
print df

Output:

       Medicine power30c power200c
0     Ratanhia         2         2
1         Aloe         2         2
2  Nitric Acid         2         2
3   Sedum Acre         2         2
4      Paeonia         2         2
5      Sulphur         2         2
6    Hamamelis         1         1
7     Aesculus         1         �

Questions:

  1. Why that in the last row?
  2. How to modify more than 1 column in 1 line?
df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x[-1])

Throws error:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements

Python Version: 2.7, Pandas: 0.19, IPython: 4

like image 909
Mohammad Yusuf Avatar asked Jan 06 '17 08:01

Mohammad Yusuf


4 Answers

You need parameter skipinitialspace:

df = pd.read_clipboard(sep=',',
                       names=['Medicine','power30c','power200c'], 
                       skipinitialspace=True)
print (df)
       Medicine power30c power200c
0     Ratanhia    30c x2   200c x2
1         Aloe    30c x2   200c x2
2  Nitric Acid    30c x2  200c x 2
3   Sedum Acre   200c x2    30c x2
4      Paeonia   200c x2    30c x2
5      Sulphur   200c x2    30c x2
6    Hamamelis    30c x1   200c x1
7     Aesculus    30c x1   200c x1

And then indexing with str:

df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x.str[-1])
print (df)
       Medicine power30c power200c
0     Ratanhia         2         2
1         Aloe         2         2
2  Nitric Acid         2         2
3   Sedum Acre         2         2
4      Paeonia         2         2
5      Sulphur         2         2
6    Hamamelis         1         1
7     Aesculus         1         1
like image 132
jezrael Avatar answered Nov 19 '22 02:11

jezrael


There is some characters at the end of the text you posted. If you do the copy from here it works:

Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1

When I use the text you posted, you can see the extra characters:

In [88]: df.power200c[6]
Out[88]: '200c x1'
In [89]: df.power200c[7]
Out[89]: '200c x1\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0'
like image 41
gntoni Avatar answered Nov 19 '22 02:11

gntoni


This is a syntax error

    Ratanhia ,30c x2, 200c x2
    Aloe ,30c x2, 200c x2
    Nitric Acid ,30c x2, 200c x2
    Sedum Acre ,200c x2, 30c x2
    Paeonia ,200c x2, 30c x2
    Sulphur ,200c x2, 30c x2
    Hamamelis ,30c x1, 200c x1
    Aesculus ,30c x1, 200c x1

The last character in your table referring non-utf-8. Please find corrected one above. After solve this length mismatch issue resolved automatically.

like image 1
nuriselcuk Avatar answered Nov 19 '22 04:11

nuriselcuk


I think those solution don't address the fact that you have mixed data in "power*" columns:

       Medicine power30c power200c
0     Ratanhia    30c x2   200c x2
1         Aloe    30c x2   200c x2
2  Nitric Acid    30c x2  200c x 2
3   Sedum Acre   200c x2    30c x2   #  / NOTE: mixed up values
4      Paeonia   200c x2    30c x2   # <  "200c" is in the "power30c" column
5      Sulphur   200c x2    30c x2   #  \ and "30c" is in the "power200c" column
6    Hamamelis    30c x1   200c x1
7     Aesculus    30c x1   200c x1

Here is another solution:

In [34]: df
Out[34]:
       Medicine power30c     power200c
0     Ratanhia    30c x2       200c x2
1         Aloe    30c x2       200c x2
2  Nitric Acid    30c x2      200c x 2
3   Sedum Acre   200c x2        30c x2
4      Paeonia   200c x2        30c x2
5      Sulphur   200c x2        30c x2
6    Hamamelis    30c x1       200c x1
7     Aesculus    30c x1   200c x1⁠⁠⁠⁠

In [35]: (df.set_index('Medicine')
    ...:    .stack()
    ...:    .str.extract(r'(\d+)c\s+x\s*(\d+)', expand=True)
    ...:    .reset_index(level=1, drop=1)
    ...:    .pivot(columns=0, values=1)
    ...:    .add_prefix('power')
    ...:    .add_suffix('c')
    ...:    .reset_index()
    ...: )
    ...:
Out[35]:
0      Medicine power200c power30c
0     Aesculus          1        1
1         Aloe          2        2
2    Hamamelis          1        1
3  Nitric Acid          2        2
4      Paeonia          2        2
5     Ratanhia          2        2
6   Sedum Acre          2        2
7      Sulphur          2        2
like image 1
MaxU - stop WAR against UA Avatar answered Nov 19 '22 03:11

MaxU - stop WAR against UA