Features are used for model training and testing. What are the differences between lexical features and orthographic features in Natural Language Processing? Examples preferred.
In the analysis of lexical features, the data found takes the form of nouns, verbs, adjectives, adverbs, compound nouns, and word families. The analysis of grammatical features, by contrast, yields adjective markers and sentence structures such as the simple present, simple future, and present perfect tenses.
The lexical features are unigrams, bigrams, and the surface form of the target word, while the syntactic features are part of speech tags and various components from a parse tree.
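As a rough illustration of the lexical side of that description, here is a minimal sketch that builds unigram, bigram, and target-word features from a tokenized sentence. The function names (`ngrams`, `lexical_ngram_features`) and the feature-key format are hypothetical, chosen only for this example; the syntactic features (POS tags, parse-tree components) would need a tagger or parser and are not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_ngram_features(tokens, target_index):
    """Count lowercased unigrams and bigrams, plus the target's surface form.

    The key prefixes "ngram=" and "target=" are an arbitrary convention
    assumed for this sketch, not a standard.
    """
    lowered = [t.lower() for t in tokens]
    feats = Counter()
    for gram in ngrams(lowered, 1) + ngrams(lowered, 2):
        feats["ngram=" + " ".join(gram)] += 1
    # The surface form keeps the original casing of the target word.
    feats["target=" + tokens[target_index]] = 1
    return feats


if __name__ == "__main__":
    sentence = ["The", "bank", "raised", "rates"]
    for key, count in sorted(lexical_ngram_features(sentence, 1).items()):
        print(key, count)
```

A sparse dictionary like this can be handed to most classical classifiers after vectorization; whether to lowercase the n-grams is a design choice, not a requirement.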
The lexical unit can be: (1) a single word, (2) the habitual co-occurrence of two words, and (3) a frequent recurrent uninterrupted string of words. The second and third notions correspond to the definition of a collocation or a multi-word unit. It is common to consider a single word as a lexical unit.
I am not aware of such a distinction. Most of the time, when people talk about lexical features, they mean using the word itself, in contrast to using only other features such as its part of speech.
Here is an example of a paper that means "whole-word orthography" when it says lexical features.
One could venture that orthographic could mean something more abstract than the sequence of characters themselves, for example whether the sequence is capitalized / titlecased / camelcased / etc. But we already have the useful and clearly understood shape feature denomination for that.
As such, I would recommend distinguishing features like this:
lexical features: whole word, prefix/suffix (various lengths possible), stemmed word, lemmatized word
shape features: uppercase, titlecase, camelcase, lowercase
grammatical and syntactic features: POS, part of a noun phrase, head of a verb phrase, complement of a prepositional phrase, etc.
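To make the first two categories concrete, here is a minimal sketch in plain Python that extracts lexical features (whole word, prefixes, suffixes) and a shape feature (casing pattern) for a single token. The function names, the feature keys, and the rule used to detect camelcase are assumptions made for this example. Stemmed and lemmatized forms, as well as the grammatical and syntactic features, need an external library (a stemmer, tagger, or parser) and are left out.

```python
import re


def lexical_features(token, max_affix=3):
    """Whole word plus prefixes/suffixes of length 1..max_affix (lowercased)."""
    feats = {"word": token.lower()}
    for n in range(1, max_affix + 1):
        if len(token) > n:  # skip affixes that would cover the whole token
            feats[f"prefix{n}"] = token[:n].lower()
            feats[f"suffix{n}"] = token[-n:].lower()
    return feats


def shape_category(token):
    """Classify the casing pattern of a token (a coarse shape feature)."""
    if token.islower():
        return "lowercase"
    if token.isupper():
        return "uppercase"
    if token.istitle():
        return "titlecase"
    # Assumed rule: letters only, with an uppercase letter directly
    # following a lowercase one (e.g. "iPhone", "CamelCase").
    if re.fullmatch(r"[A-Za-z]+", token) and re.search(r"[a-z][A-Z]", token):
        return "camelcase"
    return "other"


def token_features(token):
    """Combine lexical and shape features into one dictionary."""
    return {**lexical_features(token), "shape": shape_category(token)}


if __name__ == "__main__":
    for tok in ["running", "Paris", "NASA", "iPhone"]:
        print(tok, token_features(tok))
```

Note that the lexical features are lowercased here so that casing information lives only in the shape feature; that separation is one reasonable design, not the only one.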
This is not an exhaustive list of possible features and feature categories, but it might help you categorize linguistic features in a clearer and more widely accepted way.