Convert array of string (category) to array of int from a pandas dataframe

Tags:

python

pandas

numpy

I am trying to do something very similar to that previous question, but I get an error. I have a pandas DataFrame containing features and a label, and I need to convert them so that I can pass the features and the label variable to a machine learning object:

import pandas
import milk
from scikits.statsmodels.tools import categorical

then I have:

trainedData=bigdata[bigdata['meta']<15]
untrained=bigdata[bigdata['meta']>=15]
#print trainedData
#extract two columns from trainedData
#convert to numpy array
features=trainedData.ix[:,['ratio','area']].as_matrix(['ratio','area'])
un_features=untrained.ix[:,['ratio','area']].as_matrix(['ratio','area'])
print 'features'
print features[:5]
##label is a string:single, touching,nuclei,dust
print 'labels'

labels=trainedData.ix[:,['type']].as_matrix(['type'])
print labels[:5]
#convert single to 0, touching to 1, nuclei to 2, dusts to 3
#
tmp=categorical(labels,drop=True)
targets=categorical(labels,drop=True).argmax(1)
print targets

The console output first shows:

features
[[ 0.38846334  0.97681855]
[ 3.8318634   0.5724734 ]
[ 0.67710876  1.01816444]
[ 1.12024943  0.91508699]
[ 7.51749674  1.00156707]]
labels
[[single]
[touching]
[single]
[single]
[nuclei]]

I then get the following error:

Traceback (most recent call last):
File "/home/claire/Applications/ProjetPython/projet particule et objet/karyotyper/DAPI-Trainer02-MILK.py", line 83, in <module>
tmp=categorical(labels,drop=True)
File "/usr/local/lib/python2.6/dist-packages/scikits.statsmodels-0.3.0rc1-py2.6.egg/scikits/statsmodels/tools/tools.py", line 206, in categorical
tmp_dummy = (tmp_arr[:,None]==data).astype(float)
AttributeError: 'bool' object has no attribute 'astype'

Is it possible to convert the categorical variable 'type' within the dataframe to an int type? 'type' can take the values 'single', 'touching', 'nuclei', 'dusts', and I need to convert these to int values such as 0, 1, 2, 3.

405

asked Oct 18 '11 20:10

Jean-Pat

2 Answers

The previous answers are outdated, so here is a solution for mapping strings to numbers that works with version 0.18.1 of Pandas.

For a Series:

In [1]: import pandas as pd
In [2]: s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
                       'touching', 'single', 'nuclei'])
In [3]: s_enc = pd.factorize(s)
In [4]: s_enc[0]
Out[4]: array([0, 1, 2, 3, 1, 0, 2])
In [5]: s_enc[1]
Out[5]: Index([u'single', u'touching', u'nuclei', u'dusts'], dtype='object')
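As the output above shows, `pd.factorize` numbers the labels in order of first appearance. A minimal sketch, assuming a reasonably recent pandas, of two related things: the `sort=True` option, which numbers the labels in sorted order instead, and recovering the original strings from the codes. The variable names are only illustrative:

```python
import pandas as pd

s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
               'touching', 'single', 'nuclei'])

# Default: codes follow the order in which each label first appears.
codes, uniques = pd.factorize(s)

# sort=True: codes follow the sorted order of the unique labels.
sorted_codes, sorted_uniques = pd.factorize(s, sort=True)

# Indexing the uniques with the codes gives back the original labels.
decoded = uniques.take(codes)

print(list(codes))
print(list(sorted_codes))
print(list(sorted_uniques))
```

With `sort=True` the codes do not depend on row order, which can help if the same encoding has to be reproduced on differently ordered data.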

For a DataFrame:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'labels': ['single', 'touching', 'nuclei', 
                       'dusts', 'touching', 'single', 'nuclei']})
In [3]: catenc = pd.factorize(df['labels'])
In [4]: catenc
Out[4]: (array([0, 1, 2, 3, 1, 0, 2]), 
        Index([u'single', u'touching', u'nuclei', u'dusts'],
        dtype='object'))
In [5]: df['labels_enc'] = catenc[0]
In [6]: df
Out[6]:
         labels  labels_enc
    0    single           0
    1  touching           1
    2    nuclei           2
    3     dusts           3
    4  touching           1
    5    single           0
    6    nuclei           2
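`pd.factorize` assigns codes in order of first appearance, so the numbers depend on the data. If the codes must be fixed in advance, as in the question (single to 0, touching to 1, nuclei to 2, dusts to 3), one option is an explicit dictionary with `Series.map`. A minimal sketch, assuming a recent pandas; the small frame and the name `type_codes` are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in data, using the first five labels from the question.
df = pd.DataFrame({'labels': ['single', 'touching', 'single',
                              'single', 'nuclei']})

# Fixed mapping taken from the question. Any label missing from this
# dictionary would be mapped to NaN.
type_codes = {'single': 0, 'touching': 1, 'nuclei': 2, 'dusts': 3}

df['labels_enc'] = df['labels'].map(type_codes)

# 1-D integer array, usable as a target vector.
targets = df['labels_enc'].to_numpy()
print(targets)
```

Because unmapped labels turn into NaN, it is worth checking `df['labels_enc'].isna().any()` before passing the targets on.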

190

answered Oct 07 '22 20:10

tomp

If you have a vector of strings or other objects and you want to give it categorical labels, you can use the Factor class (available in the pandas namespace):

In [1]: s = Series(['single', 'touching', 'nuclei', 'dusts', 'touching', 'single', 'nuclei'])

In [2]: s
Out[2]: 
0    single
1    touching
2    nuclei
3    dusts
4    touching
5    single
6    nuclei
Name: None, Length: 7

In [4]: Factor(s)
Out[4]: 
Factor:
array([single, touching, nuclei, dusts, touching, single, nuclei], dtype=object)
Levels (4): [dusts nuclei single touching]

The factor has attributes labels and levels:

In [7]: f = Factor(s)

In [8]: f.labels
Out[8]: array([2, 3, 1, 0, 3, 2, 1], dtype=int32)

In [9]: f.levels
Out[9]: Index([dusts, nuclei, single, touching], dtype=object)

This is intended for 1D vectors, so I'm not sure whether it can be applied directly to your problem, but have a look.
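Note for readers on newer pandas: the `Factor` class no longer exists there. Its role is filled by `pd.Categorical`, with `codes` and `categories` in place of `labels` and `levels`. A minimal sketch under that assumption, reusing the same example data:

```python
import pandas as pd

s = pd.Series(['single', 'touching', 'nuclei', 'dusts',
               'touching', 'single', 'nuclei'])

# Categories are the sorted unique values by default.
c = pd.Categorical(s)

codes = c.codes            # integer code for each element
categories = c.categories  # Index of the distinct labels

print(list(codes))
print(list(categories))
```

The codes match the `f.labels` output shown above, since both number the labels in sorted order.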

BTW I recommend that you ask these questions on the statsmodels and / or scikit-learn mailing list since most of us are not frequent SO users.

answered Oct 07 '22 22:10

Wes McKinney
