Convert Natural Language Questions to SQL Queries [closed]

Tags:

I want to convert Natural Language questions to SQL queries using opennlp Java library, i.e.

Who won women figure skating in Sochi?

should be converted to

select name from winners where event='skating_woman'

Anybody know which classes will be useful and how to achieve this?

Also pasting code which I tried.
I have converted question into statements and later into tokens.

/////////1st part

String paragraph = "Did Matt win the men slalom?";
InputStream is1 = new FileInputStream("bins/en-sent.bin");
SentenceModel model1 = new SentenceModel(is1);
SentenceDetectorME sdetector = new SentenceDetectorME(model1);
String sentences[] = sdetector.sentDetect(paragraph);
System.out.println(sentences[0]);
is1.close();

////////2nd part
InputStream is2 = new FileInputStream("bins/en-token.bin");
TokenizerModel model2 = new TokenizerModel(is2);
Tokenizer tokenizer = new TokenizerME(model2);
String tokens[] = tokenizer.tokenize(sentences[0]);
for (String a : tokens)
    System.out.println(a);
is2.close();

428

asked Apr 10 '14 08:04

Nevil Patel

3 Answers

Simple answer to this is that you "Could" do it - but its impractical and close to impossible.. Simple reason is that you can have many similar Questions that can mean exactly the same. With SQL query you have specific syntax you have to keep in order to get data... with standard language you have potentially 100's of different syntax to get same thing. Mapping this into SQL language is very impractical and close to impossible.

Yes you could force the user to use specific words or specific question format so your program can interpret them as SQL queries - but than again this is breaking the paradigm of what you want to achieve isn't it

Edit:

Since you are soooo desperate ;) you could "potentially" do something like this

Lets imagine a very simple Query

SELECT * FROM USERS WHERE AGE = '20';

What are the possible Human language questions?

Can you show me all people with age of 20?
Display humans with the age of 20
How many people in this group have age of 20?
How many people in this table have 20 years ?
Do we have any people aged 20 ?
Search for all people with age of twenty

What you could then do is create some sort of Map<key,value> where ie.

Key = USERS;
Value = people, humans,

And another Map

Key = SELECT;
Value = Can you show me, Display, Do we have, Search, How many;

And so on so forth - This will create a complex map with all possible phrases that can mean the same thing and correspond to a given SQL query syntax element. This may not be the best solution but it's what I would probably do - at least this is what I would start with

177

answered Oct 30 '22 14:10

Maciej Cygan

I don't have time to post code right now, but please take a look at the opennlp sentence chunker and document categorizer. I think you could -very creatively- use the doccat to establish your "key" and the sentence chunker to establish noun and verb phrases (not tokens, but actualy multi word phrases) and combine the results. So at query time, you would categorize the sentence to establish a key, then chunk the sentence, then do a query that joins to the keys table and then fuzzily (full text index maybe) the phrases table. Just a thought, if it's interesting I'll post code as an edit. you would have to build the doccat model using samples.

EDIT

Here is how to get a probability dist over a set of categories using the opennlp document categorizer, youll need to supply a properties file that has the path to the doccat model:

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;


/**
 *
 * @author Owner
 */
public class SentimentFinder {

  private DoccatModel doccatModel;
  private DocumentCategorizerME documentCategorizerME;
  Properties props =null;

  public void init() {
    try {
      if (doccatModel == null) {
        doccatModel = new DoccatModel(new File(props.getProperty("opennlp.sentiment.model.generic")));
        documentCategorizerME = new DocumentCategorizerME(doccatModel);

      }
    } catch (IOException ex) {
      ex.printStackTrace();
    }
  }

  /**
   * Classifies text via a maxent model. Try to keep chunks of text small, or
   * typically there will be all low scores with little difference.
   *
   * @param text the string to be classified
   * @return
   */
  public Map<String, Double> probDist(String text) {
    Map<String, Double> probDist = new HashMap<String, Double>();
    if (doccatModel == null) {
      init();
    }
    double[] categorize = documentCategorizerME.categorize(text);
    int catSize = documentCategorizerME.getNumberOfCategories();
    for (int i = 0; i < catSize; i++) {
      String category = documentCategorizerME.getCategory(i);
      probDist.put(category, categorize[documentCategorizerME.getIndex(category)]);
    }
    return probDist;

  }
}

And here's how to chunk the sentence with a sentence chunker and get noun phrases

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

/**
 *
 * Extracts noun phrases from a sentence. To create sentences using OpenNLP use
 * the SentenceDetector classes.
 */
public class OpenNLPNounPhraseExtractor {

  static final int N = 2;

  public static void main(String[] args) {

    try {
      HashMap<String, Integer> termFrequencies = new HashMap<>();
      String modelPath = "c:\\temp\\opennlpmodels\\";
      TokenizerModel tm = new TokenizerModel(new FileInputStream(new File(modelPath + "en-token.zip")));
      TokenizerME wordBreaker = new TokenizerME(tm);
      POSModel pm = new POSModel(new FileInputStream(new File(modelPath + "en-pos-maxent.zip")));
      POSTaggerME posme = new POSTaggerME(pm);
      InputStream modelIn = new FileInputStream(modelPath + "en-chunker.zip");
      ChunkerModel chunkerModel = new ChunkerModel(modelIn);
      ChunkerME chunkerME = new ChunkerME(chunkerModel);
      //this is your sentence
      String sentence = "Barack Hussein Obama II  is the 44th awesome President of the United States, and the first African American to hold the office.";
      //words is the tokenized sentence
      String[] words = wordBreaker.tokenize(sentence);
      //posTags are the parts of speech of every word in the sentence (The chunker needs this info of course)
      String[] posTags = posme.tag(words);
      //chunks are the start end "spans" indices to the chunks in the words array
      Span[] chunks = chunkerME.chunkAsSpans(words, posTags);
      //chunkStrings are the actual chunks
      String[] chunkStrings = Span.spansToStrings(chunks, words);
      for (int i = 0; i < chunks.length; i++) {
        String np = chunkStrings[i];
        if (chunks[i].getType().equals("NP")) {
          if (termFrequencies.containsKey(np)) {
            termFrequencies.put(np, termFrequencies.get(np) + 1);
          } else {
            termFrequencies.put(np, 1);
          }
        }
      }
      System.out.println(termFrequencies);

    } catch (IOException e) {
    }
  }

}

so what I was thinking is to classify the input text, and extract and store noun phrases, you could then, at query time, classify the input, get a category, then do something like this in SQL

select * from categories a inner join nounphrases b on a.id = b.catid where catname = @thecatIjustgotfromtheclassifier and contains(text,'search term')

or something like that

answered Oct 30 '22 12:10

Mark Giaconia

Check out Kueri.me

This product provides technology which powers a search-box which converts English to SQL, It's built to construct an Answer Engine to relational databases.

It can recognizes various filters, groupings, and other analytics elements, so it can answer questions like

Show all orders from France in the last 4 months
Show top 5 orders by total sum in the last week
What's the average number of orders per customer

and so on.

You do have to configure the system to recognize your tables, fields and jargon.

Note : I work at kueri.me

answered Oct 30 '22 13:10

Yossi Vainshtein

Related questions
                            
                                Efficient algo to find number of integers in a sorted array that are within a certain range in O(log(N)) time?
                            
                                Why does *= not give any errors when implicitly casting a float to an int? [duplicate]
                            
                                HTTP Request using Netty
                            
                                How linkedhashmap maintains insertion order
                            
                                How to detect whether String.substring copies the character data
                            
                                How to edit an array from another class in Java [duplicate]
                            
                                Find a string (or a line) in a txt File Java
                            
                                java aes 256 java.security.InvalidKeyException: Illegal key size after installation the policy [duplicate]
                            
                                Getting certificates from PKCS11 Smartcard without PIN/password
                            
                                How do I build a GenericDao using QueryDSL?
                            
                                Finding the Bytecode Size of a Java Method
                            
                                Java equivalent of .NET RSACryptoServiceProvider with SHA-1
                            
                                How to convert a JPA OneToMany relationship to DTO
                            
                                Is it possible to bind the non-empty state of an ObservableList inside an ObjectProperty with the Bindings API?
                            
                                Assigning floating value to a double value
                            
                                Maven 3 does not update snapshot dependency from local repository
                            
                                Huge XML file to text files
                            
                                Java Method Unbounded Type or Class Return
                            
                                Gradle: execute Groovy interactive shell with project classpath
                            
                                Send Image in JSON request

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Convert Natural Language Questions to SQL Queries [closed]

Tags:

java

desktop-application

opennlp