Just wondering if there is a set of design patterns for complex string manipulation?
Basically, the problem I am trying to solve is that I need to read in strings like the following:
"[name_of_kicker] looks to make a clearance kick, but is under some real pressure from the [name_of_defending_team] players. He gets a [length_of_kick] kick away, but it drifts into touch on the full."
or
"[name_of_kicker] receives the ball from [name_of_passer] and launches the bomb. [name_of_kicker] has really made good contact, it's given a couple of [name_of_attacking_team] chasers ample time to get under the ball as it comes down."
Then I need to replace each "tag" with a possible value and check whether the resulting string is equal to another string.
For example, any tag that represents a player needs to be replaceable with any one of 22 string values that represent players. But I also need to make sure I loop through every combination of players for the various tags that may appear in a string. Note that the tags in the two samples above are not the only possible ones; countless others could come up in any sentence.
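Finding which tags a given base string actually uses takes a single regex scan rather than a loop per tag type. A minimal sketch, assuming every tag is lowercase letters, digits or underscores inside square brackets (as in both samples); `TagFinder` and `distinctTags` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFinder {
    // Assumed tag syntax: [lowercase_words], e.g. [name_of_kicker]
    static final Pattern TAG = Pattern.compile("\\[([a-z0-9_]+)\\]");

    // Returns each tag name once, in order of first appearance.
    static List<String> distinctTags(String template) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = TAG.matcher(template);
        while (m.find()) {
            seen.add(m.group(1)); // group 1 is the name without brackets
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String t = "[name_of_kicker] receives the ball from [name_of_passer] and launches the bomb. "
                + "[name_of_kicker] has really made good contact";
        System.out.println(distinctTags(t)); // prints [name_of_kicker, name_of_passer]
    }
}
```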
I tried writing a load of nested for loops to go through the collection of players etc. and replace the tags each time, but with so many possible tags I kept adding one nested loop inside another. It has become unmanageable, and I suspect it is also inefficient, since I need to process over 1,000 base strings like the samples above and replace different tags with players etc. in each one.
So, are there any string manipulation patterns I could look into, or does anyone have a possible solution to a problem like this?
Firstly, to answer your question.
Just wondering if there is a set of design patterns for complex string manipulation?
Not really. There are some techniques, but they hardly qualify as design patterns. The two techniques that spring to mind are template expansion and pattern matching.
What you are currently doing / proposing to do is a form of template expansion. However, typical templating engines don't support the combinatorial expansion you are trying to do, and, as you anticipate, it would appear to be an inefficient way to solve your problem.
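To make the cost concrete, here is a sketch of that combinatorial expansion written as recursion instead of hand-nested loops. It assumes each tag name maps to a list of candidate values and that a repeated tag (such as [name_of_kicker] in the second sample) takes the same value everywhere; `TemplateExpander` and `expandAll` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TemplateExpander {
    // Hypothetical helper: returns every string obtained by filling each tag in `tags`
    // with one of its candidate values. String.replace swaps all occurrences of the
    // literal "[tag]", so a repeated tag gets the same value everywhere.
    // Assumes every tag has a candidate list and no value contains bracketed tag text.
    static List<String> expandAll(String template, List<String> tags,
                                  Map<String, List<String>> candidates) {
        List<String> out = new ArrayList<>();
        expand(template, tags, 0, candidates, out);
        return out;
    }

    // One recursion level per tag stands in for one hand-written nested loop.
    private static void expand(String partial, List<String> tags, int index,
                               Map<String, List<String>> candidates, List<String> out) {
        if (index == tags.size()) {
            out.add(partial); // all tags filled in
            return;
        }
        String tag = tags.get(index);
        for (String value : candidates.get(tag)) {
            expand(partial.replace("[" + tag + "]", value), tags, index + 1, candidates, out);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> candidates = Map.of(
                "name_of_kicker", List.of("Peter Shilton", "Jackie Charlton"),
                "name_of_passer", List.of("Ronaldino", "Maradonna", "Peter Shilton"));
        List<String> results = expandAll(
                "[name_of_kicker] receives the ball from [name_of_passer]. [name_of_kicker] kicks.",
                List.of("name_of_kicker", "name_of_passer"), candidates);
        System.out.println(results.size()); // prints 6
    }
}
```

The recursion removes the nesting problem but not the cost: the number of strings produced is the product of the candidate-list sizes, so two player tags with 22 candidates each already give 22 × 22 = 484 expansions per base string, before any other tags are considered.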
A better technique would appear to be pattern matching. Let's take your first example, and turn it into a pattern:
"(Ronaldino|Maradonna|Peter Shilton|Jackie Charlton) looks to make a clearance kick, but is under some real pressure from the (Everton|Real Madrid|Adelaide United) players. He gets a ([0-9]+ metre) kick away, but it drifts into touch on the full."
What I've done is insert all of the possible alternatives into the pseudo-template to turn it into a regex. I can now compile this regex to a java.util.regex.Pattern and use it to match against your list of other strings.
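Building that regex by hand for 1,000+ base strings would be tedious, so the conversion can be automated. A minimal sketch, assuming the same [lowercase_words] tag syntax and a map from tag name to candidate values; `TemplateMatcher` and `compile` are hypothetical names. Literal text is escaped with `Pattern.quote`, each tag becomes a capturing group of its alternatives, and a repeated tag becomes a backreference so it must match the same player again:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMatcher {
    private static final Pattern TAG = Pattern.compile("\\[([a-z0-9_]+)\\]");

    // Hypothetical helper: turns a template into one compiled regex.
    static Pattern compile(String template, Map<String, List<String>> candidates) {
        StringBuilder regex = new StringBuilder();
        Map<String, Integer> groupOf = new HashMap<>(); // tag name -> capture group number
        Matcher m = TAG.matcher(template);
        int last = 0;
        while (m.find()) {
            // Quote the literal text so '.', '(' etc. are not treated as regex syntax.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            String tag = m.group(1);
            Integer group = groupOf.get(tag);
            if (group != null) {
                regex.append("\\").append(group); // backreference: same value as first occurrence
            } else {
                groupOf.put(tag, groupOf.size() + 1); // quoted text adds no groups
                StringBuilder alternatives = new StringBuilder();
                for (String value : candidates.get(tag)) {
                    if (alternatives.length() > 0) alternatives.append('|');
                    alternatives.append(Pattern.quote(value));
                }
                regex.append('(').append(alternatives).append(')');
            }
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Map<String, List<String>> candidates = Map.of(
                "name_of_kicker", List.of("Peter Shilton", "Jackie Charlton"),
                "name_of_passer", List.of("Ronaldino", "Maradonna"));
        Pattern p = compile("[name_of_kicker] receives the ball from [name_of_passer]. "
                + "[name_of_kicker] launches the bomb.", candidates);
        Matcher m = p.matcher("Peter Shilton receives the ball from Maradonna. "
                + "Peter Shilton launches the bomb.");
        if (m.matches()) {
            System.out.println(m.group(1) + " / " + m.group(2)); // prints Peter Shilton / Maradonna
        }
    }
}
```

Each base string is compiled once and then tested against the target string, instead of generating every combination. A tag like [length_of_kick] could instead map to a raw regex fragment, as with `([0-9]+ metre)` in the example above; this sketch quotes every value, so that would need a small extension.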
Having said that, if you are trying to do this to "analyse" text, I don't rate your chances of success. I think you would be better off going down the NLP route.
What you're describing looks a bit like what template engines are used for.
Two popular ones for Java are:
But there are many, many more, of course.
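The core step a template engine performs, filling tags from a map of values, can be sketched without any engine. This is a minimal stand-in, not the API of any particular engine; it assumes Java 9+ for `Matcher.replaceAll(Function)` and `Map.of`, and `SimpleTemplate` and `render` are hypothetical names:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTemplate {
    private static final Pattern TAG = Pattern.compile("\\[([a-z0-9_]+)\\]");

    // Replaces each [tag] with its value in one pass; unknown tags are left untouched.
    // quoteReplacement stops '$' or '\' in a value from being read as group references.
    static String render(String template, Map<String, String> values) {
        return TAG.matcher(template).replaceAll(match ->
                Matcher.quoteReplacement(values.getOrDefault(match.group(1), match.group())));
    }

    public static void main(String[] args) {
        String out = render("[name_of_kicker] gets a [length_of_kick] kick away.",
                Map.of("name_of_kicker", "Jackie Charlton"));
        System.out.println(out); // prints Jackie Charlton gets a [length_of_kick] kick away.
    }
}
```

This renders one set of values per call, which is why a templating approach still needs the combinatorial loop on top of it.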