RDF representation of sentences

Tags: artificial-intelligence, nlp, rdf

I need to represent sentences in RDF format.

In other words "John likes coke" would be automatically represented as:

Subject : John
Predicate : Likes
Object : Coke

Does anyone know where I should start? Are there any programs which can do this automatically or would I need to do everything from scratch?

asked Apr 24 '10 19:04

Lilz

2 Answers

It looks like you want the typed dependencies of a sentence, e.g. for John likes coke:

 nsubj(likes-2, John-1)
 dobj(likes-2, coke-3)

I'm not aware of any dependency parser that directly produces RDF. However, many of them produce parses in a standardized tab-delimited representation known as CoNLL-X, and it shouldn't be too hard to convert from CoNLL-X to RDF.
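
As a rough idea of what that conversion could look like, the Python sketch below reads a single CoNLL-X sentence (ten tab-separated columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and writes N-Triples. The sample parse is hand-written for illustration rather than real parser output, the `http://example.org/` namespace is a placeholder, and the mapping from nsubj/dobj to subject/object is my own assumption about the RDF you want.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # placeholder namespace, pick your own

# Hand-written CoNLL-X rows for "John likes coke" (illustrative only).
CONLL = (
    "1\tJohn\t_\tNNP\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tlikes\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
    "3\tcoke\t_\tNN\tNN\t_\t2\tdobj\t_\t_\n"
)

def read_conllx(text):
    """Parse one CoNLL-X sentence into a list of token dicts."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences; this sketch reads just one
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens

def sentence_to_ntriples(tokens):
    """Emit one N-Triples line per head that has both an nsubj and a dobj."""
    by_id = {t["id"]: t for t in tokens}
    subj = {t["head"]: t for t in tokens if t["deprel"] == "nsubj"}
    obj = {t["head"]: t for t in tokens if t["deprel"] == "dobj"}
    lines = []
    for head_id in sorted(subj.keys() & obj.keys()):
        s = quote(subj[head_id]["form"])
        p = quote(by_id[head_id]["form"])
        o = quote(obj[head_id]["form"])
        lines.append("<{0}{1}> <{0}{2}> <{0}{3}> .".format(BASE, s, p, o))
    return lines

ntriples = sentence_to_ntriples(read_conllx(CONLL))
print("\n".join(ntriples))
```

Using word forms directly as URIs is the simplest possible choice. In practice you would probably want lemmas or a lookup into an existing vocabulary instead.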

Open Source Dependency parsers

There are a number of parsers to choose from that extract typed dependencies, including the following state-of-the-art open source options:

Stanford Parser - see online demo.
MaltParser
MSTParser

The Stanford Parser includes a pre-trained model for parsing English. To get typed dependencies you'll need to use the flag -outputFormat typedDependencies.

For the MaltParser you can download an English model here.

The MSTParser includes a small 200-sentence English training set that you can use to create your own English parsing model. However, training on so little data will hurt the accuracy of the resulting parser. So, if you decide to use this parser, you are probably better off using the pretrained model available here.

All of the pretrained models linked above produce parses according to the Stanford Dependency formalism (ACL paper, and manual).

Of these three, the Stanford Parser is the most accurate. The MaltParser is the fastest, with some configurations of this package being able to parse 1800 sentences in only 8 seconds.

answered Oct 06 '22 23:10

dmcer

One option is to use output from Link Parser, available under a GPL-compatible license. You can define a translation layer between these outputs and your RDF nodes as needed.

Check out this demo on your "John likes coke" example!

answered Oct 06 '22 23:10

Bosh
