
Differences between lexical features and orthographic features in NLP?

Tags:

nlp

Features are used for model training and testing. What are the differences between lexical features and orthographic features in Natural Language Processing? Examples preferred.

DehengYe asked Oct 22 '15 13:10




1 Answer

I am not aware of such a distinction. Most of the time, when people talk about lexical features, they mean using the word itself, as opposed to using only other features such as its part of speech.

Here is an example of a paper that means "whole-word orthography" when it says "lexical features".

One could venture that "orthographic" means something more abstract than the sequence of characters itself, for example whether the sequence is capitalized, titlecased, camelcased, etc. But we already have the useful and clearly understood term "shape features" for that.

As such, I would recommend distinguishing features like this:

lexical features: whole word, prefix/suffix (various lengths possible), stemmed word, lemmatized word

shape features: uppercase, titlecase, camelcase, lowercase

grammatical and syntactic features: POS, part of a noun phrase, head of a verb phrase, complement of a prepositional phrase, etc.

This is not an exhaustive list of possible features and feature categories, but it might help you categorize linguistic features in a clearer and more widely accepted way.
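To make the first two categories concrete, here is a minimal sketch in plain Python of how a single token might be turned into lexical and shape features. The function names (`lexical_features`, `shape_features`), the feature keys, and the camelcase regex are all illustrative assumptions, not a standard API. Stemmed or lemmatized forms and the grammatical features would normally come from an NLP library, so they are left out here.

```python
import re


def lexical_features(token, max_affix=3):
    """Lexical features: built from the word's actual characters."""
    feats = {"word": token.lower()}
    # Prefixes and suffixes of several lengths (only when shorter than the token).
    for n in range(1, max_affix + 1):
        if len(token) > n:
            feats[f"prefix{n}"] = token[:n].lower()
            feats[f"suffix{n}"] = token[-n:].lower()
    return feats


def shape_features(token):
    """Shape features: describe the casing pattern, not the characters."""
    # Coarse word shape: uppercase -> X, lowercase -> x, digit -> d.
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return {
        "is_upper": token.isupper(),
        "is_title": token.istitle(),
        "is_lower": token.islower(),
        # Assumed definition of camelcase: starts with lowercase letters,
        # followed by one or more capitalized chunks (e.g. "iPhone").
        "is_camel": bool(re.fullmatch(r"[a-z]+(?:[A-Z][a-z0-9]*)+", token)),
        "shape": shape,
    }


if __name__ == "__main__":
    for tok in ["Running", "NLP", "iPhone", "paris"]:
        print(tok, lexical_features(tok), shape_features(tok))
```

Note that two tokens such as "Paris" and "Tokyo" share no lexical features in this sketch but have identical shape features, which is the practical point of keeping the two categories apart.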

HugoMailhot answered Nov 20 '22 18:11