
Android/Java Regex to remove extra zeros from sub-strings

I have the following string as input:

"2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"

The final output should be:

"2,3,-4,0,0,.03,2.01,.001,-.03,101"

That is, all leading and trailing zeros are removed, and both positive and negative zero become plain 0.

We could achieve this by splitting the string first and applying a regex to each part, but my string's length is more than 10000.
How can we achieve this using a regex?

Edit:

Analysis of Answers:

I tested all the answers with the string "0.00,-0.00,00.00,-00.00,40.00,-40.00,4.0,-4.0,4.01,-4.01,04.01,-04.01,004.04,-004.04,0004.040,-0004.040,101,.40,-.40,0.40,-0.40". The answer from Wiktor Stribiżew passed every test case (see https://regex101.com/r/tS8hE3/9). The other answers passed most of the cases, but not all.

N Kaushik asked Jan 25 '16 05:01




1 Answer

Updated test case answer

Use the following regex:

String rx = "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b";

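Replace each match with $1$2. In Java, a group that did not take part in the match contributes an empty string to the replacement, so only the group belonging to the alternative that matched is re-inserted.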

The regex matches several alternatives and captures some parts of the string to later re-insert during replacement:

  • -?0+\.(0)+\b - matches an optional - followed by one or more 0s, a ., and then one or more 0s. Because the + is applied to the group (0), the group ends up holding a single 0 (the last one matched). The word boundary at the end requires the last matched 0 to be followed by a non-word character or the end of the string. The replacement restores that 0 through the $1 backreference, so -00.00 and 00.00 both become 0.
  • | - or...
  • \.0+\b - a dot followed by one or more zeros that run up to a , or the end of the string (the string is comma-delimited). Nothing is captured, so the match is simply removed.
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (start of string, or a location after , or -) followed by one or more 0s that are followed by a ., zero or more digits, and a non-0 digit (so we remove the leading zeros of an integer part that consists only of zeros when the decimal part is non-zero).
  • | - or...
  • \b0+(?=[1-9]\d*\.) - a word boundary followed by one or more zeros that are followed by a non-0 digit, any further digits, and a . (so we remove all leading zeros from an integer part that is not equal to 0).
  • | - or...
  • (\.\d*?)0+\b - captures a . plus as few digits as possible into group 2, and then matches one or more zeros that run up to a , or the end of the string (so we get rid of trailing zeros in the decimal part, and $2 restores the dot and the digits before them).
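To see the five alternatives working together, here is a minimal sketch that applies the regex above to the test string from the question's edit. The class name ZeroStripper and the helper strip are hypothetical names introduced for illustration; they are not part of the original answer.

```java
class ZeroStripper {
    // The regex from the answer. Group 1 is (0), group 2 is (\.\d*?).
    static final String RX =
        "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b";

    // Hypothetical helper: runs the replacement over the whole comma-delimited
    // string in one pass, without splitting it first.
    static String strip(String csv) {
        // An unmatched group expands to an empty string in Java replacements,
        // so "$1$2" re-inserts only what the matching alternative captured.
        return csv.replaceAll(RX, "$1$2");
    }

    public static void main(String[] args) {
        String input = "0.00,-0.00,00.00,-00.00,40.00,-40.00,4.0,-4.0,4.01,-4.01,"
            + "04.01,-04.01,004.04,-004.04,0004.040,-0004.040,101,.40,-.40,0.40,-0.40";
        System.out.println(strip(input));
    }
}
```

For that input the sketch should print 0,0,0,0,40,-40,4,-4,4.01,-4.01,4.01,-4.01,4.04,-4.04,4.04,-4.04,101,.4,-.4,.4,-.4.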

Answer before the test cases update

I suggest a very simple and short regex that does what you need:

-0+\.(0)+\b|\.0+\b|\b0+(?=\.\d*[1-9])

Replace with $1.

A short demo:

String re = "-0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])";
String str = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,0.001,-0.03";
String expected = "2,3,-4,0,0,.03,2.01,.001,-.03,101,.001,-.03";
System.out.println(str.replaceAll(re, "$1").equals(expected)); // prints true

Explanation:

  • -0+\.(0)+\b - a minus followed by one or more 0s (0+), a literal dot (\.), and one or more zeros (the group (0)+ keeps just the last 0 matched), followed by a word boundary (the location before , in this context)
  • | - or...
  • \.0+\b - a literal dot (\.) followed by one or more zeros and a word boundary (the location before , in this context)
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (the location after , in this context) followed by one or more zeros that must be followed by a literal dot (\.), then zero or more digits, and then a digit in the 1 to 9 range (so that the decimal part is non-zero).
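Since the question mentions inputs of more than 10000 characters, it may be worth compiling the pattern once and reusing it, because String.replaceAll behaves like Pattern.compile(regex).matcher(str).replaceAll(repl) on each call. The following is only a sketch under that assumption; the class name PrecompiledZeroStripper and the helper strip are hypothetical and do not appear in the original answer.

```java
import java.util.regex.Pattern;

class PrecompiledZeroStripper {
    // The shorter regex from this section, compiled once and reused.
    private static final Pattern EXTRA_ZEROS =
        Pattern.compile("-0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])");

    // Hypothetical helper: same replacement as str.replaceAll(re, "$1"),
    // but without recompiling the pattern for every input string.
    static String strip(String csv) {
        return EXTRA_ZEROS.matcher(csv).replaceAll("$1");
    }

    public static void main(String[] args) {
        String str = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,0.001,-0.03";
        System.out.println(strip(str));
    }
}
```

The result is the same as with str.replaceAll; the only difference is that the compiled Pattern can be shared across many calls.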
Wiktor Stribiżew answered Sep 27 '22 21:09