Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two English strings for similarities

So here is my problem. I have two paragraphs of text and I need to see if they are similar. Not in the sense of string metrics but in meaning. The following two paragraphs are related but I need to find out if they cover the 'same' topic. Any help or direction to solving this problem would be greatly appreciated.

Fossil fuels are fuels formed by natural processes such as anaerobic decomposition of buried dead organisms. The age of the organisms and their resulting fossil fuels is typically millions of years, and sometimes exceeds 650 million years. The fossil fuels, which contain high percentages of carbon, include coal, petroleum, and natural gas. Fossil fuels range from volatile materials with low carbon:hydrogen ratios like methane, to liquid petroleum to nonvolatile materials composed of almost pure carbon, like anthracite coal. Methane can be found in hydrocarbon fields, alone, associated with oil, or in the form of methane clathrates. It is generally accepted that they formed from the fossilized remains of dead plants by exposure to heat and pressure in the Earth's crust over millions of years. This biogenic theory was first introduced by Georg Agricola in 1556 and later by Mikhail Lomonosov in the 18th century.

Second:

Fossil fuel reforming is a method of producing hydrogen or other useful products from fossil fuels such as natural gas. This is achieved in a processing device called a reformer which reacts steam at high temperature with the fossil fuel. The steam methane reformer is widely used in industry to make hydrogen. There is also interest in the development of much smaller units based on similar technology to produce hydrogen as a feedstock for fuel cells. Small-scale steam reforming units to supply fuel cells are currently the subject of research and development, typically involving the reforming of methanol or natural gas but other fuels are also being considered such as propane, gasoline, autogas, diesel fuel, and ethanol.

like image 724
Jarod D Avatar asked Aug 17 '11 00:08

Jarod D


People also ask

How do you compare two strings are similar?

Use the == and != operators to compare two strings for equality. Use the is operator to check if two strings are the same instance. Use the < , > , <= , and >= operators to compare strings alphabetically.

How do you determine if strings are similar?

There are many ways to determine the similarity of two strings. One of the most common is that of the edit distance of which the Levenshtein distance is an example of (and there are several variations and other approaches - take a glance at Category:String similarity measures on Wikipedia).

Can you use == to compare two strings?

You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not. On the other hand, equals() method compares whether the value of the strings is equal, and not the object itself.

Which method is used to compare two strings?

The compareTo() method compares two strings lexicographically. The comparison is based on the Unicode value of each character in the strings. The method returns 0 if the string is equal to the other string.


1 Answers

That's a tall order. If I were you, I'd start reading up on Natural Language Processing. NLP is a fairly large field -- I would recommend looking specifically at the things mentioned in the Wikipedia Text Analytics article's "Processes" section.

I think if you make use of information retrieval, named entity recognition, and sentiment analysis, you should be well on your way.

like image 168
Benson Avatar answered Sep 28 '22 16:09

Benson