Input to pd.read_clipboard()
Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1
Code:
import pandas as pd
df = pd.read_clipboard(header=None, sep=',')
df.columns = ['Medicine','power30c','power200c']
df.power30c=df.power30c.apply(lambda x: x[-1])
df.power200c=df.power200c.apply(lambda x: x[-1])
print df
Output:
Medicine power30c power200c
0 Ratanhia 2 2
1 Aloe 2 2
2 Nitric Acid 2 2
3 Sedum Acre 2 2
4 Paeonia 2 2
5 Sulphur 2 2
6 Hamamelis 1 1
7 Aesculus 1 �
Questions:
�
in the last row?df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x[-1])
Throws error:
ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements
Python Version: 2.7, Pandas: 0.19, IPython: 4
You need parameter skipinitialspace
:
df = pd.read_clipboard(sep=',',
names=['Medicine','power30c','power200c'],
skipinitialspace=True)
print (df)
Medicine power30c power200c
0 Ratanhia 30c x2 200c x2
1 Aloe 30c x2 200c x2
2 Nitric Acid 30c x2 200c x 2
3 Sedum Acre 200c x2 30c x2
4 Paeonia 200c x2 30c x2
5 Sulphur 200c x2 30c x2
6 Hamamelis 30c x1 200c x1
7 Aesculus 30c x1 200c x1
And then indexing with str:
df[['power30c','power200c']] = df[['power30c','power200c']].apply(lambda x: x.str[-1])
print (df)
Medicine power30c power200c
0 Ratanhia 2 2
1 Aloe 2 2
2 Nitric Acid 2 2
3 Sedum Acre 2 2
4 Paeonia 2 2
5 Sulphur 2 2
6 Hamamelis 1 1
7 Aesculus 1 1
There is some characters at the end of the text you posted. If you do the copy from here it works:
Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x 2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1
When I use the text you posted, you can see the extra characters:
In [88]: df.power200c[6]
Out[88]: '200c x1'
In [89]: df.power200c[7]
Out[89]: '200c x1\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0\xe2\x81\xa0'
This is a syntax error
Ratanhia ,30c x2, 200c x2
Aloe ,30c x2, 200c x2
Nitric Acid ,30c x2, 200c x2
Sedum Acre ,200c x2, 30c x2
Paeonia ,200c x2, 30c x2
Sulphur ,200c x2, 30c x2
Hamamelis ,30c x1, 200c x1
Aesculus ,30c x1, 200c x1
The last character in your table referring non-utf-8. Please find corrected one above. After solve this length mismatch issue resolved automatically.
I think those solution don't address the fact that you have mixed data in "power*" columns:
Medicine power30c power200c
0 Ratanhia 30c x2 200c x2
1 Aloe 30c x2 200c x2
2 Nitric Acid 30c x2 200c x 2
3 Sedum Acre 200c x2 30c x2 # / NOTE: mixed up values
4 Paeonia 200c x2 30c x2 # < "200c" is in the "power30c" column
5 Sulphur 200c x2 30c x2 # \ and "30c" is in the "power200c" column
6 Hamamelis 30c x1 200c x1
7 Aesculus 30c x1 200c x1
Here is another solution:
In [34]: df
Out[34]:
Medicine power30c power200c
0 Ratanhia 30c x2 200c x2
1 Aloe 30c x2 200c x2
2 Nitric Acid 30c x2 200c x 2
3 Sedum Acre 200c x2 30c x2
4 Paeonia 200c x2 30c x2
5 Sulphur 200c x2 30c x2
6 Hamamelis 30c x1 200c x1
7 Aesculus 30c x1 200c x1
In [35]: (df.set_index('Medicine')
...: .stack()
...: .str.extract(r'(\d+)c\s+x\s*(\d+)', expand=True)
...: .reset_index(level=1, drop=1)
...: .pivot(columns=0, values=1)
...: .add_prefix('power')
...: .add_suffix('c')
...: .reset_index()
...: )
...:
Out[35]:
0 Medicine power200c power30c
0 Aesculus 1 1
1 Aloe 2 2
2 Hamamelis 1 1
3 Nitric Acid 2 2
4 Paeonia 2 2
5 Ratanhia 2 2
6 Sedum Acre 2 2
7 Sulphur 2 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With