
Using scikit-learn to train an NLP log-linear model for NER

I wonder how to use sklearn.linear_model.LogisticRegression to train an NLP log-linear model for named-entity recognition (NER).

A typical log-linear model defines a conditional probability as follows:

$$p(y \mid x; v) = \frac{\exp(v \cdot f(x, y))}{\sum_{y'} \exp(v \cdot f(x, y'))}$$

with:

  • x: the current word
  • y: the class of a word being considered
  • f: the feature vector function, which maps a word x and a class y to a vector of scalars.
  • v: the feature weight vector

Can sklearn.linear_model.LogisticRegression train such a model?

The issue is that the features depend on the class: f takes both the word x and the candidate class y, whereas scikit-learn estimators expect a feature vector that is a function of x alone.
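
For concreteness, here is a minimal sketch in plain numpy of how such a model scores classes with a class-dependent f (the tag set, feature templates, and weights below are made up for illustration, not from any library):

import numpy as np

CLASSES = ['PER', 'LOC', 'O']  # toy tag set

def f(x, y):
    # Toy class-dependent feature vector f(x, y): word-level features
    # conjoined with the candidate class y (one block per class; only
    # the block belonging to y is non-zero).
    word_feats = np.array([x[0].isupper(), x.endswith('ton'), len(x) > 6],
                          dtype=float)
    vec = np.zeros(len(word_feats) * len(CLASSES))
    i = CLASSES.index(y)
    vec[i * len(word_feats):(i + 1) * len(word_feats)] = word_feats
    return vec

v = np.random.randn(9)  # feature weight vector (untrained, random)

def p(y, x):
    # p(y | x; v) = exp(v . f(x, y)) / sum_y' exp(v . f(x, y'))
    scores = np.array([v @ f(x, y_prime) for y_prime in CLASSES])
    scores -= scores.max()  # for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[CLASSES.index(y)]

print(p('LOC', 'Washington'))

Note that with this block structure, v · f(x, y) reduces to a class-specific slice of v dotted with the word features, which is exactly the parameterization that multinomial logistic regression learns.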

asked Oct 20 '15 by Franck Dernoncourt



1 Answer

In scikit-learn 0.16 and higher, you can use the multinomial option for sklearn.linear_model.LogisticRegression to train a log-linear model (a.k.a. MaxEnt classifier, multiclass logistic regression). Currently the multinomial option is supported only by the ‘lbfgs’ and ‘newton-cg’ solvers.

Example with the Iris data set (4 features, 3 classes, 150 samples):

#!/usr/bin/python
# -*- coding: utf-8 -*-

from __future__ import print_function
from __future__ import division

from sklearn import linear_model, datasets
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Import data 
iris = datasets.load_iris()
X = iris.data # features
y_true = iris.target # labels

# Look at the size of the feature matrix and the label vector:
print('iris.data.shape: {0}'.format(iris.data.shape))
print('iris.target.shape: {0}\n'.format(iris.target.shape))

#  Instantiate a MaxEnt model
logreg = linear_model.LogisticRegression(C=1e5, multi_class='multinomial', solver='lbfgs')

# Train the model
logreg.fit(X, y_true)
print('logreg.coef_: \n{0}\n'.format(logreg.coef_))
print('logreg.intercept_: \n{0}'.format(logreg.intercept_))

# Use the model to make predictions
y_pred = logreg.predict(X)
print('\ny_pred: \n{0}'.format(y_pred))

# Assess the quality of the predictions
print('\nconfusion_matrix(y_true, y_pred):\n{0}\n'.format(confusion_matrix(y_true, y_pred)))
print('classification_report(y_true, y_pred): \n{0}'.format(classification_report(y_true, y_pred)))
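
The Iris example above uses numeric features directly. For NER you would first map per-token features to vectors; here is a minimal sketch using DictVectorizer, with a made-up feature template and a toy sentence (the feature names and labels are illustrative, not from any standard):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy sentence with per-token NER labels
tokens = ['Franck', 'lives', 'in', 'Paris']
labels = ['PER', 'O', 'O', 'LOC']

def token_features(sent, i):
    # Made-up feature template: current word plus its neighbors
    return {
        'word.lower': sent[i].lower(),
        'word.istitle': sent[i].istitle(),
        'prev.lower': sent[i - 1].lower() if i > 0 else '<BOS>',
        'next.lower': sent[i + 1].lower() if i < len(sent) - 1 else '<EOS>',
    }

feats = [token_features(tokens, i) for i in range(len(tokens))]

vectorizer = DictVectorizer()
X_ner = vectorizer.fit_transform(feats)  # sparse one-hot feature matrix

ner_clf = LogisticRegression(multi_class='multinomial', solver='lbfgs')
ner_clf.fit(X_ner, labels)
print(ner_clf.predict(vectorizer.transform([token_features(tokens, 3)])))

A real NER system would add richer features (prefixes/suffixes, word shapes, gazetteers), use BIO-encoded labels, and evaluate on held-out sentences rather than the training tokens.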

The multinomial option for sklearn.linear_model.LogisticRegression was introduced in version 0.16:

  • Add multi_class="multinomial" option in linear_model.LogisticRegression to implement a logistic regression solver that minimizes the cross-entropy or multinomial loss instead of the default one-vs-rest setting. Supports lbfgs and newton-cg solvers. By Lars Buitinck and Manoj Kumar. Solver option newton-cg by Simon Wu.
answered Oct 12 '22 by Franck Dernoncourt