
Classify or keyword match a natural language string or phrase

This is my first post on StackOverflow, so apologies if it's lacking the right information.

Scenario.

I'm in the process of moving away from the Google Weather API to the BOM (Australia) weather service. I've managed to get the weather data from BOM just fine using StreamReaders etc., but what I'm stuck on is the image icon that matches the daily forecast.

What I did with the old Google Weather API was quite brutal, yet it did the trick. The Google Weather API only returned a handful of different forecast types, so I could jam each one together into a string that I could in turn use in an image URL.

Example of what I did with the Google Weather API...

imageDay1.ImageUrl = "images/weather/" + lbWeatherDay1Cond.Text.Replace(" ", string.Empty) + ".png";

"Mostly sunny" = mostlysunny.png

"Sunny" = sunny.png

"Chance of Rain" = chanceofrain.png

"Showers" = showers.png

"Partly cloudy" = partlycloudy.png

There were only, say, 15 different possible options for the daily forecast.

The problem I have now with BOM (the Australian weather service) is forecasts like these...

Possible morning shower

Shower or two, clearing later

There are thousands more... there is no standard.

What I'm hoping is that some of the great minds on here can suggest a way to derive the image name from a keyword within the forecast string. Something like "Showers" for "Showers.png", or something a little more complex that recognises "Chance of Showers" as "Chanceshowers.jpg" while keeping "Shower or two" as "Showers.png".

I'm open to any ideas or solutions (hopefully in C#), as long as they're very lightweight (the process has to be repeated for each day of the 5-day forecast) and can capture almost any scenario...

At the moment, I'm carrying on with String.Replace after String.Replace after String.Replace... It will do for now, but I can't roll it into production like this.

Cheers all!

Trent

Trent Steenholdt asked Sep 20 '12 11:09


1 Answer

I noticed in the comments you're trying out the regex lookup table, which just might do well enough to solve the problem. However, I'm going to expand on what Adriano mentioned about a more robust Bayesian solution.
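For reference, a regex lookup table along those lines might look something like the sketch below. This is only an illustration under assumptions: the class name `ForecastIcons`, the method `IconFor`, the patterns, the icon file names, and the `unknown.png` fallback are all hypothetical, not taken from the comments. The rules are checked top to bottom, so more specific patterns have to come before more general ones.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical ordered lookup table: the first matching pattern wins.
public static class ForecastIcons
{
    // Assumed patterns and icon names, for illustration only.
    // Specific rules ("chance of showers") must precede general ones ("showers").
    private static readonly (Regex Pattern, string Icon)[] Rules =
    {
        (new Regex(@"\b(chance|possible)\b.*\bshowers?\b", RegexOptions.IgnoreCase), "chanceshowers.png"),
        (new Regex(@"\bshowers?\b", RegexOptions.IgnoreCase), "showers.png"),
        (new Regex(@"\b(mostly|mainly)\s+sunny\b", RegexOptions.IgnoreCase), "mostlysunny.png"),
        (new Regex(@"\bpartly\s+cloudy\b", RegexOptions.IgnoreCase), "partlycloudy.png"),
        (new Regex(@"\bsunny\b", RegexOptions.IgnoreCase), "sunny.png"),
    };

    // Returns the icon of the first rule that matches, or the fallback if none does.
    public static string IconFor(string forecast, string fallback = "unknown.png")
    {
        foreach (var (pattern, icon) in Rules)
        {
            if (pattern.IsMatch(forecast))
                return icon;
        }
        return fallback;
    }
}
```

With these assumed rules, `ForecastIcons.IconFor("Shower or two, clearing later")` falls through to the general shower rule, while `ForecastIcons.IconFor("Possible morning shower")` is caught by the more specific rule above it. The weakness is that every new phrasing BOM produces may need a new hand-written rule.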

This problem falls under machine learning and AI. It involves some natural language processing, similar to how Google tries to interpret what users ask it, or how email spam filters work.

A simple and interesting system is described by Sebastian Thrun in the following videos, which were part of an online course. The series begins by describing a basic method by which an algorithm can learn to classify a collection of words (such as those in an email) as "Spam" or "Not Spam".

(Most of the videos are really short.)

  1. Spam Detection - Quiz Answer
  2. Probability of Spam - Quiz Answer
  3. Maximum Likelihood - Quiz Answer
  4. Relationship to Bayes Networks - Quiz Answer
  5. Classification Quiz - Quiz Answer
  6. Classification 2 Quiz - Quiz Answer
  7. Classification 3 Quiz, a contrived example
  8. Quiz Answer & Laplace Smoothing - Quiz Answer
  9. Smoothed Classification Quiz - Quiz Answer
  10. Final Quiz - Quiz Answer

This Bayesian method is robust against dynamic input and learns reasonably quickly. Once it has consumed enough training data, you only need to save a lookup table of probabilities and do a series of arithmetic computations at runtime.

With this foundation, you could extend the same method to multiple classes, e.g. one for each weather image.
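As a rough illustration of that idea, here is a minimal sketch of a multi-class naive Bayes classifier with Laplace (add-k) smoothing, where each icon file name is a class. Everything here is an assumption made for the example: the class name `ForecastClassifier`, the `Train` and `Classify` methods, the simple tokenizer, and the default smoothing value of 1. It is not the exact formulation from the videos, just one common way to write the technique. Scores are summed as log-probabilities to avoid underflow when multiplying many small numbers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical multinomial naive Bayes classifier with Laplace (add-k) smoothing.
// Each class is an icon file name; each training example is a forecast string.
public class ForecastClassifier
{
    private readonly Dictionary<string, Dictionary<string, int>> wordCounts =
        new Dictionary<string, Dictionary<string, int>>();           // per-class word counts
    private readonly Dictionary<string, int> docCounts = new Dictionary<string, int>();   // examples per class
    private readonly Dictionary<string, int> totalWords = new Dictionary<string, int>();  // words per class
    private readonly HashSet<string> vocabulary = new HashSet<string>();
    private readonly double k;   // smoothing constant
    private int totalDocs;

    public ForecastClassifier(double smoothing = 1.0) { k = smoothing; }

    // Assumed tokenizer: lower-case, split on spaces and basic punctuation.
    private static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ', ',', '.', ';', ':' },
                                      StringSplitOptions.RemoveEmptyEntries);

    public void Train(string forecast, string icon)
    {
        if (!wordCounts.ContainsKey(icon))
        {
            wordCounts[icon] = new Dictionary<string, int>();
            docCounts[icon] = 0;
            totalWords[icon] = 0;
        }
        docCounts[icon]++;
        totalDocs++;
        foreach (var word in Tokenize(forecast))
        {
            wordCounts[icon].TryGetValue(word, out var n);   // n is 0 if the word is new
            wordCounts[icon][word] = n + 1;
            totalWords[icon]++;
            vocabulary.Add(word);
        }
    }

    public string Classify(string forecast)
    {
        if (totalDocs == 0)
            throw new InvalidOperationException("Train the classifier first.");

        var words = Tokenize(forecast);
        string best = null;
        var bestScore = double.NegativeInfinity;
        foreach (var icon in docCounts.Keys)
        {
            // Smoothed log prior: log P(icon)
            var score = Math.Log((docCounts[icon] + k) / (totalDocs + k * docCounts.Count));
            foreach (var word in words)
            {
                // Smoothed log likelihood: log P(word | icon); unseen words get count 0
                wordCounts[icon].TryGetValue(word, out var n);
                score += Math.Log((n + k) / (totalWords[icon] + k * vocabulary.Count));
            }
            if (score > bestScore) { bestScore = score; best = icon; }
        }
        return best;
    }
}
```

To use it, you would call `Train` once per labelled forecast (for example `Train("Shower or two", "showers.png")`) and then `Classify` on each new forecast string. In production you would train offline and persist only the counts, which matches the "lookup table plus arithmetic" runtime cost described above. The quality of the result depends entirely on how many labelled forecasts you feed it.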

Kache answered Oct 01 '22 18:10