How to get original values after using factorize() in Python?

Tags:

python

pandas

random-forest

prediction

I'm a beginner trying to build a predictive model with a Random Forest in Python, using separate train and test datasets. train["ALLOW/BLOCK"] takes one of 4 expected values (all strings), and test["ALLOW/BLOCK"] is what needs to be predicted. I encoded the training labels with pd.factorize:

y, _ = pd.factorize(train["ALLOW/BLOCK"])

y
Out[293]: array([0, 1, 0, ..., 1, 0, 2], dtype=int64)

I then used predict to make the predictions:

clf.predict(test[features])

clf.predict(test[features])[0:10]
Out[294]: array([0, 0, 0, 0, 0, 2, 2, 0, 0, 0], dtype=int64)

How can I get the original string values back instead of the numeric codes? Also, does the following code actually compare the actual and predicted values?

z, _ = pd.factorize(test["ALLOW/BLOCK"])

z==clf.predict(test[features])
Out[296]: array([ True, False, False, ..., False, False, False], dtype=bool)


asked Sep 09 '17 19:09

Parvathy Sarat

1 Answer

First, keep the labels (the uniques) that pd.factorize returns as its second output, instead of discarding them with _:

y, label = pd.factorize(train["ALLOW/BLOCK"])

Then, once you have the numeric predictions, you can get the corresponding original labels with label[pred]:

pred = clf.predict(test[features])
pred_label = label[pred]

pred_label now holds the predictions as the original string values.
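A minimal, self-contained sketch of the round trip. It assumes hypothetical toy labels (the four class names are made up) in place of the real train["ALLOW/BLOCK"] column, and a hard-coded array standing in for the output of clf.predict, so no model is trained here:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train["ALLOW/BLOCK"]; the class names are made up
train_labels = pd.Series(["ALLOW", "BLOCK", "ALLOW", "AUDIT", "BLOCK", "WARN"])

# Keep both outputs: integer codes (for fitting) and uniques (for decoding)
y, label = pd.factorize(train_labels)
# Codes follow order of first appearance: ALLOW=0, BLOCK=1, AUDIT=2, WARN=3

# Stand-in for clf.predict(test[features]): codes in the same space as y
pred = np.array([0, 0, 2, 1, 3])

# Indexing the uniques with the predicted codes recovers the original strings
pred_label = label[pred]
print(list(pred_label))  # ['ALLOW', 'ALLOW', 'AUDIT', 'BLOCK', 'WARN']
```

This works for the real predictions too, because a classifier fitted on y can only predict codes that occur in y.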

As for your second question: no, you should not factorize the test labels separately, since the resulting code-to-label mapping will very likely differ from the one used for training. Consider the following example:

pd.factorize(['a', 'b', 'c'])
# (array([0, 1, 2]), array(['a', 'b', 'c'], dtype=object))

pd.factorize(['c', 'a', 'b'])
# (array([0, 1, 2]), array(['c', 'a', 'b'], dtype=object))

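The codes are identical in both calls, but they refer to different labels: the mapping depends on the order in which the values first appear. So in your snippet, z == clf.predict(test[features]) can report a mismatch even where the actual and predicted strings agree, and vice versa.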
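Two ways to make that comparison meaningful, sketched with hypothetical toy data and a hard-coded array standing in for clf.predict: encode the test labels with the training uniques via Index.get_indexer, or decode the predictions and compare the strings. The separately factorized version is included only to show how it goes wrong.

```python
import numpy as np
import pandas as pd

# Hypothetical label columns standing in for train/test["ALLOW/BLOCK"]
train_labels = pd.Series(["ALLOW", "BLOCK", "ALLOW", "AUDIT"])
test_labels = pd.Series(["AUDIT", "ALLOW", "BLOCK", "BLOCK"])

y, label = pd.factorize(train_labels)  # label holds ['ALLOW', 'BLOCK', 'AUDIT']

# Stand-in for clf.predict(test[features]), i.e. codes from the training mapping
pred = np.array([2, 0, 1, 0])

# Wrong: factorizing the test column on its own maps 'AUDIT' -> 0,
# whereas in training 'ALLOW' -> 0
z_wrong, _ = pd.factorize(test_labels)
acc_wrong = (z_wrong == pred).mean()

# Right: encode test labels with the training uniques (-1 marks unseen labels)
z = label.get_indexer(test_labels)
acc_codes = (z == pred).mean()

# Equivalent: decode the predictions and compare the strings directly
acc_strings = (np.asarray(label[pred]) == test_labels.to_numpy()).mean()

print(acc_wrong, acc_codes, acc_strings)  # 0.0 0.75 0.75
```

If the test set contains a value that never appears in training, get_indexer returns -1 for it, so it can never match a prediction; the string comparison treats it the same way.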

answered Oct 18 '22 20:10

Psidom
