Labels of datasets imported with sklearn.datasets.load_files

Tags: python, scikit-learn

I'm wondering how to match the labels produced by an SVM classifier with the ones in my dataset. Then I realized that the problem starts at the beginning: when I load the dataset, I get an object which in my case has the following properties:

.data = the news text
.target_names = the labels used in the dataset, e.g. ["positive", "negative"]
.target = an array with one number per news item, encoding its label.

But I'm wondering whether the order of the target_names can differ across datasets (with the same tags but different news), and whether the order of the .data elements influences that.

Is there an easy way to know which label a number in the .target array stands for? (I mean, what does 0 or 1 represent in such an array?)

Best,


asked Apr 10 '19 16:04

gal007

1 Answer

A number k in .target corresponds to the label .target_names[k], so the label of document i is .target_names[.target[i]]. With the list from your example, .target_names[1] is "negative".
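To make the lookup concrete, here is a minimal sketch that builds a tiny throwaway corpus and reads the mapping back from the loaded dataset. The folder names and texts are made up for illustration. Note that the codes it prints follow the alphabetical order of the folder names, not the order in which the folders were created.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files

# Hypothetical toy corpus: one sub-folder per label, one file per document.
samples = {
    "positive": ["great news today", "markets rally"],
    "negative": ["terrible results", "shares plunge"],
}

with tempfile.TemporaryDirectory() as tmp:
    for label, texts in samples.items():
        folder = Path(tmp) / label
        folder.mkdir()
        for n, text in enumerate(texts):
            (folder / f"{n}.txt").write_text(text, encoding="utf-8")
    # The file contents are read into memory here, so the directory
    # can be deleted once the call returns.
    dataset = load_files(tmp, encoding="utf-8")

# Integer code -> label name, read from this dataset rather than assumed.
code_to_name = dict(enumerate(dataset.target_names))
print(code_to_name)

# Label name of every document, in the (shuffled) order of dataset.data.
doc_labels = [dataset.target_names[k] for k in dataset.target]
for text, name in zip(dataset.data, doc_labels):
    print(name, "<-", text)
```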

The order of the target names will be the same across different datasets, as long as the folder names (the tags) are exactly the same, and the order of the .data elements has no influence on it. This is because sklearn.datasets.load_files() derives the labels from the sorted folder names, as the source code (v0.20.x) shows:

[...]
folders = [f for f in sorted(listdir(container_path))
           if isdir(join(container_path, f))]

if categories is not None:
    folders = [f for f in folders if f in categories]

for label, folder in enumerate(folders):
    target_names.append(folder)
[...]
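The effect of that sorting can be checked without scikit-learn. The sketch below re-implements only the folder enumeration from the excerpt (folder_labels is a hypothetical helper, not part of the library) and shows that two directories whose sub-folders were created in different orders end up with the same codes.

```python
import os
import tempfile


def folder_labels(container_path):
    """Mimic the excerpt: label codes follow the sorted sub-folder names."""
    folders = [f for f in sorted(os.listdir(container_path))
               if os.path.isdir(os.path.join(container_path, f))]
    return dict(enumerate(folders))


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    # Same folder names, created in opposite orders.
    for name in ("positive", "negative"):
        os.mkdir(os.path.join(a, name))
    for name in ("negative", "positive"):
        os.mkdir(os.path.join(b, name))
    labels_a = folder_labels(a)
    labels_b = folder_labels(b)

print(labels_a)              # {0: 'negative', 1: 'positive'}
print(labels_a == labels_b)  # True
```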

I'd still suggest always retrieving the label from the target_names of the current dataset, to be on the safe side (implementations may change over time, etc.).
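For the original problem of matching classifier output to labels, the same lookup applies to predictions. This is only a sketch under assumptions: the toy corpus, the TfidfVectorizer plus LinearSVC pipeline, and the new documents are all made up for illustration, and the predictions on such a tiny corpus carry no meaning.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training corpus, laid out the way load_files expects.
corpus = {
    "positive": ["profits soar", "strong growth reported"],
    "negative": ["record losses", "growth stalls badly"],
}

with tempfile.TemporaryDirectory() as tmp:
    for label, texts in corpus.items():
        folder = Path(tmp) / label
        folder.mkdir()
        for n, text in enumerate(texts):
            (folder / f"{n}.txt").write_text(text, encoding="utf-8")
    train = load_files(tmp, encoding="utf-8")

# Assumed model choice: TF-IDF features fed into a linear SVM.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train.data, train.target)

new_docs = ["profits and growth", "losses reported"]
predicted_codes = model.predict(new_docs)

# Translate the integer predictions with the target_names of the dataset
# the model was trained on, instead of hard-coding 0 and 1.
predicted_names = [train.target_names[k] for k in predicted_codes]
for doc, name in zip(new_docs, predicted_names):
    print(name, "<-", doc)
```

Keeping the lookup tied to the training dataset means the code keeps working if a folder is renamed or a third label is added later.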


answered Oct 14 '22 01:10

rvf
