
Find sentences beginning and ending with a hash

Tags: java, string, regex

I am working on detecting sentences that start and end with a hash sign (#). So far I only have code that finds single hashtag words, which is one part of this mechanism. How can I also find sentences, given the cases below?

Case 1:

Hello, #how are you# today. 

In this case, I want to detect how are you. If there is only a single word, this case should be ignored.

Case 2:

Hello, #how are you #today. 

In this case, only the words #how and #today are found, which I already have working. There are no sentences here, because the words do not end with a hash.

Code:

@Override
public List<String> findHashTags(String text){
    if(text == null){
        return new ArrayList<>();
    }
    String[] tagSet = text.split(" ");
    Set<String> sortedTags = new HashSet<>();
    List<String> processedTags = new ArrayList<>();
    for(String tags : tagSet){
         if(tags.startsWith("#")){
             sortedTags.add(tags);
         }
    }
    processedTags.addAll(sortedTags);
    return processedTags;
}

Updated code:

@Override
public List<String> findHashTags(String text){
    if(text == null){
        return new ArrayList<>();
    }
    Set<String> sortedTags = new HashSet<>();
    List<String> processedTags = new ArrayList<>();
    Pattern pattern = Pattern.compile("#\\b.*?\\b#|\\B#\\w+");
    Matcher matcher = pattern.matcher(text);
    while (matcher.find()){
        String outString = matcher.group();
        outString = outString.replace("#","");
        outString = outString.replace(",","");
        sortedTags.add(outString);
    }
    processedTags.addAll(sortedTags);

    return processedTags;
}
We are Borg asked Mar 01 '18 11:03



1 Answer

You can use a regex with two alternatives. The first matches a substring from a # that is followed by a word character up to the first # that is preceded by a word character. The second matches a # that is not preceded by a word character and is followed by one or more word characters.

#\b.*?\b#|\B#\w+

See the regex demo

You can make it a bit more precise if you want to exclude substrings like #_ s#: turn the first \b into (?=\p{L}) or (?=[a-zA-Z]) to require a letter right after the opening #.
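To make the difference concrete, here is a minimal sketch that runs both patterns on the same input. The class name HashLetterDemo, the helper findAll, and the sample string are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashLetterDemo {
    // Hypothetical helper: collects every full match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "#_ s# and #how are you#";

        // Original pattern: '_' is a word char, so "#_ s#" is accepted.
        System.out.println(findAll("#\\b.*?\\b#|\\B#\\w+", s));
        // => [#_ s#, #how are you#]

        // Variant: the opening # must be followed by a letter.
        System.out.println(findAll("#(?=\\p{L}).*?\\b#|\\B#\\w+", s));
        // => [#_, #how are you#]
    }
}
```

Note that with the variant, #_ is still picked up by the second alternative as a single tag, because \w also matches an underscore. If that is unwanted, the second alternative would need a similar letter check.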

See a Java demo:

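// Imports this snippet needs:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> results = new ArrayList<>();
String s = "Hello, #how are you# today. Hello, #how are you #today.";
// First alternative: #...# span; second alternative: single #word.
Pattern pattern = Pattern.compile("#\\b.*?\\b#|\\B#\\w+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    results.add(matcher.group());
}
System.out.println(results);
// => [#how are you#, #how, #today]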
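If the sentences and the single words need to be told apart, one option is to wrap each alternative in a capturing group and check which group matched. This also drops the # characters without a separate replace call. The sketch below is an assumption about how that could look, not part of the original answer; the class name HashSpanDemo and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashSpanDemo {
    // Group 1: text between a pair of hashes; group 2: a single hashtag word.
    private static final Pattern TAGS =
            Pattern.compile("#\\b(.*?)\\b#|\\B#(\\w+)");

    // Hypothetical helper: returns the contents of the given group
    // for every match in which that group took part.
    static List<String> extract(String text, int group) {
        List<String> out = new ArrayList<>();
        if (text == null) {
            return out;
        }
        Matcher m = TAGS.matcher(text);
        while (m.find()) {
            String hit = m.group(group); // null if the other alternative matched
            if (hit != null) {
                out.add(hit);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Hello, #how are you# today. Hello, #how are you #today.";
        System.out.println(extract(s, 1)); // => [how are you]
        System.out.println(extract(s, 2)); // => [how, today]
    }
}
```

A span such as #word# would land in group 1 as well. Callers who want to ignore single-word spans could skip group 1 results that contain no space.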
Wiktor Stribiżew answered Oct 06 '22 13:10
