Error in parsing, update multiple columns in 1 line

Question

Input to pd.read_clipboard()

Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1⁠⁠⁠⁠

Code:

import pandas as pd

df = pd.read_clipboard(header=None, sep=',')
df.columns = ['Medicine','power30c','power200c']
df.power30c=df.power30c.apply(lambda x: x[-1])
df.power200c=df.power200c.apply(lambda x: x[-1])
print df

Output:

       Medicine power30c power200c
0     Ratanhia         2         2
1         Aloe         2         2
2  Nitric Acid         2         2
3   Sedum Acre         2         2
4      Paeonia         2         2
5      Sulphur         2         2
6    Hamamelis         1         1
7     Aesculus         1         �

Questions:

Why that � in the last row?
How to modify more than 1 column in 1 line?

df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x[-1])

Throws error:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements

Python Version: 2.7, Pandas: 0.19, IPython: 4

jezrael · Accepted Answer

You need parameter skipinitialspace:

df = pd.read_clipboard(sep=',',
                       names=['Medicine','power30c','power200c'], 
                       skipinitialspace=True)
print (df)
       Medicine power30c power200c
0     Ratanhia    30c x2   200c x2
1         Aloe    30c x2   200c x2
2  Nitric Acid    30c x2  200c x 2
3   Sedum Acre   200c x2    30c x2
4      Paeonia   200c x2    30c x2
5      Sulphur   200c x2    30c x2
6    Hamamelis    30c x1   200c x1
7     Aesculus    30c x1   200c x1

And then indexing with str:

df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x.str[-1])
print (df)
       Medicine power30c power200c
0     Ratanhia         2         2
1         Aloe         2         2
2  Nitric Acid         2         2
3   Sedum Acre         2         2
4      Paeonia         2         2
5      Sulphur         2         2
6    Hamamelis         1         1
7     Aesculus         1         1

gntoni · Answer

There is some characters at the end of the text you posted. If you do the copy from here it works:

Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1

When I use the text you posted, you can see the extra characters:

In [88]: df.power200c[6]
Out[88]: '200c x1'
In [89]: df.power200c[7]
Out[89]: '200c x1\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0'

nuriselcuk · Answer

This is a syntax error

    Ratanhia ,30c x2, 200c x2
    Aloe ,30c x2, 200c x2
    Nitric Acid ,30c x2, 200c x2
    Sedum Acre ,200c x2, 30c x2
    Paeonia ,200c x2, 30c x2
    Sulphur ,200c x2, 30c x2
    Hamamelis ,30c x1, 200c x1
    Aesculus ,30c x1, 200c x1

The last character in your table referring non-utf-8. Please find corrected one above. After solve this length mismatch issue resolved automatically.

MaxU - stop WAR against UA · Answer

I think those solution don't address the fact that you have mixed data in "power*" columns:

       Medicine power30c power200c
0     Ratanhia    30c x2   200c x2
1         Aloe    30c x2   200c x2
2  Nitric Acid    30c x2  200c x 2
3   Sedum Acre   200c x2    30c x2   #  / NOTE: mixed up values
4      Paeonia   200c x2    30c x2   # <  "200c" is in the "power30c" column
5      Sulphur   200c x2    30c x2   #  \ and "30c" is in the "power200c" column
6    Hamamelis    30c x1   200c x1
7     Aesculus    30c x1   200c x1

Here is another solution:

In [34]: df
Out[34]:
       Medicine power30c     power200c
0     Ratanhia    30c x2       200c x2
1         Aloe    30c x2       200c x2
2  Nitric Acid    30c x2      200c x 2
3   Sedum Acre   200c x2        30c x2
4      Paeonia   200c x2        30c x2
5      Sulphur   200c x2        30c x2
6    Hamamelis    30c x1       200c x1
7     Aesculus    30c x1   200c x1⁠⁠⁠⁠

In [35]: (df.set_index('Medicine')
    ...:    .stack()
    ...:    .str.extract(r'(\d+)c\s+x\s*(\d+)', expand=True)
    ...:    .reset_index(level=1, drop=1)
    ...:    .pivot(columns=0, values=1)
    ...:    .add_prefix('power')
    ...:    .add_suffix('c')
    ...:    .reset_index()
    ...: )
    ...:
Out[35]:
0      Medicine power200c power30c
0     Aesculus          1        1
1         Aloe          2        2
2    Hamamelis          1        1
3  Nitric Acid          2        2
4      Paeonia          2        2
5     Ratanhia          2        2
6   Sedum Acre          2        2
7      Sulphur          2        2

Error in parsing, update multiple columns in 1 line

Tags:

python

pandas

python-2.7

Mohammad Yusuf

4 Answers

jezrael

gntoni

nuriselcuk

MaxU - stop WAR against UA

Recent Activity

Donate For Us

Error in parsing, update multiple columns in 1 line

Tags:

python

pandas

python-2.7

Mohammad Yusuf

4 Answers

jezrael

gntoni

nuriselcuk

MaxU - stop WAR against UA

Related questions

Recent Activity

Donate For Us