
Analyse the sentences and extract person name, organization and location with the help of NLP

I need to solve the following using NLP. Can you give me pointers on how to achieve this using the OpenNLP API?

a. How to find out if a sentence implies a certain action in the past, present or future.

(e.g.) I was very sad last week - past
       I feel like hitting my neighbor - present
       I am planning to go to New York next week - future

b. How to find the word which corresponds to a person or company or country

(e.g.) John is planning to specialize in Electrical Engineering in UC Berkeley and pursue a career with IBM.

Person = John

Company = IBM

Location = Berkeley

Thanks

asked Aug 01 '13 09:08 by SST



1 Answer

I can provide a solution for part (b).

Here is the code:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.Span;

    public class tikaOpenIntro {

        // Tokens of the most recently tokenized sentence.
        // Starts empty so the finders still work if tokenization fails.
        public String Tokens[] = new String[0];

        public static void main(String[] args) {

            tikaOpenIntro toi = new tikaOpenIntro();

            String cnt = "John is planning to specialize in Electrical Engineering in UC Berkeley and pursue a career with IBM.";

            // The name finders work on tokens, so tokenize first.
            toi.tokenization(cnt);

            String names = toi.namefind(toi.Tokens);
            String org = toi.orgfind(toi.Tokens);

            System.out.println("person name is : " + names);
            System.out.println("organization name is: " + org);
        }

        // Finds person names using the pre-trained person model.
        public String namefind(String cnt[]) {
            return find("/home/rahul/opennlp/model/en-ner-person.bin", cnt);
        }

        // Finds organization names using the pre-trained organization model.
        public String orgfind(String cnt[]) {
            return find("/home/rahul/opennlp/model/en-ner-organization.bin", cnt);
        }

        // Loads the model at modelPath, runs it over the tokens, and returns
        // the detected names, one per line.
        private String find(String modelPath, String cnt[]) {
            StringBuilder fd = new StringBuilder();
            // try-with-resources closes the model stream even on failure.
            try (InputStream is = new FileInputStream(modelPath)) {
                TokenNameFinderModel tnf = new TokenNameFinderModel(is);
                NameFinderME nf = new NameFinderME(tnf);

                Span sp[] = nf.find(cnt);

                for (String name : Span.spansToStrings(sp, cnt)) {
                    fd.append(name).append("\n");
                }
            } catch (IOException e) {
                // Also covers FileNotFoundException and InvalidFormatException,
                // which are both subclasses of IOException.
                e.printStackTrace();
            }
            return fd.toString();
        }

        // Splits the sentence into tokens and stores them in Tokens.
        public void tokenization(String tokens) {
            try (InputStream is = new FileInputStream("/home/rahul/opennlp/model/en-token.bin")) {
                TokenizerModel tm = new TokenizerModel(is);
                Tokenizer tz = new TokenizerME(tm);
                Tokens = tz.tokenize(tokens);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

    }

If you also want locations, load the location model (`en-ner-location.bin`) in the same way. The pre-trained models are available from the OpenNLP site on SourceForge; you can download them and use them.
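Since the person, organization, and location finders differ only in the model file they load, the per-type paths can be built in one place. The following is a minimal sketch in plain Java; the class name `NerModelPaths`, the method `modelPaths`, and the directory name `models` are hypothetical, and it assumes the downloaded files keep their `en-ner-<type>.bin` names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class NerModelPaths {

    // Entity types for which a pre-trained English model is assumed to be downloaded.
    static final String[] TYPES = {"person", "organization", "location"};

    // Maps each entity type to its model file inside modelDir (hypothetical helper).
    static Map<String, Path> modelPaths(String modelDir) {
        Map<String, Path> paths = new LinkedHashMap<>();
        for (String type : TYPES) {
            paths.put(type, Paths.get(modelDir, "en-ner-" + type + ".bin"));
        }
        return paths;
    }

    public static void main(String[] args) {
        // "models" is a placeholder; point it at the real download directory.
        for (Map.Entry<String, Path> e : modelPaths("models").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each path can then be opened with a `FileInputStream` and passed to `TokenNameFinderModel`, just as the person and organization models are loaded in the code above.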

I am not sure how accurate the extraction of names, locations, and organizations will be, but it recognizes almost all of them.
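If accuracy is a concern, one option is to discard low-confidence matches. In OpenNLP, `NameFinderME.probs(Span[])` returns one probability per span found by the preceding `find` call. The sketch below shows only the filtering step in plain Java; the class `ConfidenceFilter`, the method `keepConfident`, the 0.5 threshold, and the sample probabilities are all made up for illustration and are not real model output.

```java
import java.util.ArrayList;
import java.util.List;

class ConfidenceFilter {

    // Keeps the names whose probability is at least the threshold.
    // names[i] and probs[i] must describe the same detected span.
    static List<String> keepConfident(String[] names, double[] probs, double threshold) {
        if (names.length != probs.length) {
            throw new IllegalArgumentException("names and probs must have the same length");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (probs[i] >= threshold) {
                kept.add(names[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative values only. With OpenNLP these would come from
        // Span.spansToStrings(sp, tokens) and nf.probs(sp).
        String[] names = {"John", "Electrical Engineering"};
        double[] probs = {0.93, 0.41};
        System.out.println(keepConfident(names, probs, 0.5)); // prints [John]
    }
}
```

A suitable threshold depends on the model and the text, so it is worth checking on a few sample sentences rather than taking 0.5 as given.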

If you don't find OpenNLP sufficient, use the Stanford NLP tools (Stanford NER) for named entity recognition.

answered Sep 27 '22 18:09 by Rahul Kulhari