I am working on detecting sentences that start and end with a hashtag. So far, I only have code that finds single hashtagged words, which is part of this mechanism. How can I find sentences in the cases below?
Case 1:
Hello, #how are you# today.
In this case, I want to detect how are you. If there is only a single word, the above case should be ignored.
Case 2:
Hello, #how are you #today.
In this case, only the words #how and #today are found, which I already have working. There are no sentences here because the words don't end with a hashtag.
Code:
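// Requires java.util.ArrayList, java.util.HashSet, java.util.List and java.util.Set.
@Override
public List<String> findHashTags(String text) {
    if (text == null) {
        return new ArrayList<>();
    }
    // Split on single spaces and keep every token that starts with '#'.
    String[] tagSet = text.split(" ");
    // A HashSet removes duplicates; note that it does not sort or preserve order.
    Set<String> sortedTags = new HashSet<>();
    List<String> processedTags = new ArrayList<>();
    for (String tags : tagSet) {
        if (tags.startsWith("#")) {
            sortedTags.add(tags);
        }
    }
    processedTags.addAll(sortedTags);
    return processedTags;
}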
Code using a regex:
// Also requires java.util.regex.Matcher and java.util.regex.Pattern.
@Override
public List<String> findHashTags(String text) {
    if (text == null) {
        return new ArrayList<>();
    }
    Set<String> sortedTags = new HashSet<>();
    List<String> processedTags = new ArrayList<>();
    // First alternative: text enclosed in a pair of '#'; second: a single #word.
    Pattern pattern = Pattern.compile("#\\b.*?\\b#|\\B#\\w+");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()) {
        String outString = matcher.group();
        // Strip the hash signs and any commas from the matched text.
        outString = outString.replace("#", "");
        outString = outString.replace(",", "");
        sortedTags.add(outString);
    }
    processedTags.addAll(sortedTags);
    return processedTags;
}
You may use a regex that matches substrings from a # that is followed by a word char up to the first # that is preceded by a word char, or matches a # that is not preceded by a word char and is followed by one or more word chars.
#\b.*?\b#|\B#\w+
See the regex demo
You may make it a bit more precise if you want to exclude substrings like #_ s# by turning the first \b into (?=\p{L}) or (?=[a-zA-Z]) to require a letter.
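To make the difference between the two variants concrete, here is a minimal sketch that runs both patterns over the same input. The class name `StrictHashDemo`, the helper `findAll`, and the sample string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample input are illustrative only.
class StrictHashDemo {
    // Pattern from the answer: the first alternative only requires a word char after the opening '#'.
    static final Pattern LOOSE = Pattern.compile("#\\b.*?\\b#|\\B#\\w+");
    // Variant: the lookahead requires a letter right after the opening '#'.
    static final Pattern STRICT = Pattern.compile("#(?=\\p{L}).*?\\b#|\\B#\\w+");

    // Collects every match of the given pattern, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "a #_ s# b #how are you# c";
        System.out.println(findAll(LOOSE, s));  // => [#_ s#, #how are you#]
        System.out.println(findAll(STRICT, s)); // => [#_, #how are you#]
    }
}
```

Note that with the stricter variant the enclosed span #_ s# is no longer matched, but the second alternative still picks up #_ as a single tag, since an underscore counts as a word char. If that is unwanted, the second alternative would need a similar letter requirement.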
See a Java demo:
List<String> results = new ArrayList<>();
String s = "Hello, #how are you# today. Hello, #how are you #today.";
Pattern pattern = Pattern.compile("#\\b.*?\\b#|\\B#\\w+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    results.add(matcher.group());
}
System.out.println(results);
// => [#how are you#, #how, #today]
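If sentences and single words need to end up in separate lists, one possible approach is to put each alternative in its own named group and check which group matched. The sketch below assumes that a span between two hashes counts as a sentence only when it contains whitespace, so #how# is treated as a word; that reading of the question is an assumption. The class name `HashTagSplitter` and the group names `span` and `word` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; splits matches into multi-word sentences and single words.
class HashTagSplitter {
    // Same regex as above, with named groups around the text of each alternative.
    static final Pattern TAGS =
            Pattern.compile("#\\b(?<span>.*?)\\b#|\\B#(?<word>\\w+)");

    final List<String> sentences = new ArrayList<>();
    final List<String> words = new ArrayList<>();

    HashTagSplitter(String text) {
        if (text == null) {
            return;
        }
        Matcher m = TAGS.matcher(text);
        while (m.find()) {
            // group("span") is null when the second alternative matched.
            String span = m.group("span");
            if (span == null) {
                words.add(m.group("word"));
            } else if (span.chars().anyMatch(Character::isWhitespace)) {
                sentences.add(span);
            } else {
                // Assumption: a single enclosed word such as #how# is not a sentence.
                words.add(span);
            }
        }
    }

    public static void main(String[] args) {
        HashTagSplitter r = new HashTagSplitter(
                "Hello, #how are you# today. Hello, #how are you #today.");
        System.out.println(r.sentences); // => [how are you]
        System.out.println(r.words);     // => [how, today]
    }
}
```

Using groups also avoids the later replace("#", "") step, because the hash signs sit outside the captured text.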