Remove spaces between single characters in a string

I am trying to remove duplicate words from a string in Scala.

I wrote a UDF (code below) to remove duplicate words from a string:

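import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Returns "" for null or empty input; otherwise keeps the first occurrence of each whitespace-separated token.
val de_duplicate: UserDefinedFunction = udf((value: String) => {
  if (value == null || value.isEmpty) ""
  else value.split("\\s+").distinct.mkString(" ")
})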

The problem I'm facing is that it also de-duplicates repeated single-character tokens in the string.

For example if the string was:

"test abc abc 123 foo bar f f f"

The output I'm getting is:

"test abc 123 foo bar f"

What I want is to remove only repeated words, not repeated single characters. One workaround I could think of is to remove the spaces between consecutive single-character tokens, so that the example input string would become:

"test abc abc 123 foo bar fff"  

That would solve my problem. I can't figure out the proper regex pattern, but I believe this could be done using a capture group or a look-ahead. I looked at similar questions for other languages but couldn't work out the regex pattern in Scala.

Any help on this would be appreciated!

Vaibhav asked Mar 04 '23 23:03


1 Answer

If you want to remove the spaces between single characters in your input string, you can use the following regex:

println("test abc abc 123 foo bar f f f".replaceAll("(?<= \\w|^\\w|^) (?=\\w |\\w$|$)", ""))

Output:

test abc abc 123 foo bar fff

Demo: https://regex101.com/r/tEKkeP/1

Explanations:

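The regex (?<= \w|^\w|^) (?=\w |\w$|$) matches a space that sits between two single-character tokens. The positive lookbehind requires the space to be preceded by a lone word character (one that itself follows a space or the start of the line), or to be at the start of the line. The positive lookahead requires the space to be followed by a lone word character (one that is itself followed by a space or the end of the line), or to be at the end of the line. In Scala source code the backslashes are doubled, as in the call above.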
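
As a minimal sketch of how this regex could be combined with the de-duplication step from the question: the names CollapseSingles, collapseSingles and deDuplicate are hypothetical, and plain Scala functions are used instead of a Spark UDF so that the snippet runs without Spark on the classpath.

```scala
object CollapseSingles {
  // Regex from the answer: a space preceded by a lone word character
  // (or the start of the string) and followed by a lone word character
  // (or the end of the string).
  private val SingleCharGap = "(?<= \\w|^\\w|^) (?=\\w |\\w$|$)"

  // Hypothetical helper: joins runs of single-character tokens,
  // e.g. "bar f f f" becomes "bar fff".
  def collapseSingles(s: String): String = s.replaceAll(SingleCharGap, "")

  // Hypothetical helper: null-safe variant of the question's logic,
  // applied after the single-character tokens have been joined.
  def deDuplicate(value: String): String =
    if (value == null || value.isEmpty) ""
    else collapseSingles(value).split("\\s+").distinct.mkString(" ")

  def main(args: Array[String]): Unit = {
    // Prints: test abc 123 foo bar fff
    println(deDuplicate("test abc abc 123 foo bar f f f"))
  }
}
```

If Spark is available, the same function could presumably be wrapped with udf(CollapseSingles.deDuplicate _) to get a UserDefinedFunction like the one in the question.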

More inputs:

test abc abc 123 foo bar f f f
f boo
 f boo
boo f
boo f f
too f 

Associated outputs:

test abc abc 123 foo bar fff
f boo
f boo
boo f
boo ff
too f
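
The inputs and outputs listed above can be checked with a small table-driven sketch. The object name SingleCharGapCheck and the value names are hypothetical; the pairs are copied from the two listings, including the leading space in " f boo" and the trailing space in "too f ".

```scala
object SingleCharGapCheck {
  // Same regex as in the answer.
  private val SingleCharGap = "(?<= \\w|^\\w|^) (?=\\w |\\w$|$)"

  // (input, expected output) pairs taken from the listings above.
  val cases: Seq[(String, String)] = Seq(
    "test abc abc 123 foo bar f f f" -> "test abc abc 123 foo bar fff",
    "f boo"   -> "f boo",
    " f boo"  -> "f boo",
    "boo f"   -> "boo f",
    "boo f f" -> "boo ff",
    "too f "  -> "too f"
  )

  // Applies the regex to a single input string.
  def collapse(input: String): String = input.replaceAll(SingleCharGap, "")

  def main(args: Array[String]): Unit =
    cases.foreach { case (input, expected) =>
      val actual = collapse(input)
      println(s"'$input' -> '$actual' (matches listing: ${actual == expected})")
    }
}
```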

Allan answered Mar 11 '23 01:03