
How to remove non digits?

Tags: java

private String removeNonDigits(final String value) {         
   if(value == null || value.isEmpty()){
        return "";
   }
   return value.replaceAll("[^0-9]+", "");
}

Is there a better way to do this? Does Apache's StringUtils have a similar method?

Milli asked Dec 01 '22 06:12



2 Answers

Just for fun I ran a benchmark:

import java.util.List;
import java.util.regex.Pattern;

import com.google.common.base.Joiner;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import com.google.common.primitives.Chars;

public final class Main {
    private static final String INPUT = "0a1b2c3d4e";
    private static final int REPS = 10000000;

    // Volatile sink: storing each result here stops the JIT from treating the calls as dead code.
    public static volatile String out;

    public static void main(String[] args) {
        System.err.println(removeNonDigits1(INPUT));
        System.err.println(removeNonDigits2(INPUT));
        System.err.println(removeNonDigits3(INPUT));
        System.err.println(removeNonDigits4(INPUT));
        System.err.println(removeNonDigits5(INPUT));

        long t0 = System.currentTimeMillis();
        for (int i = 0; i < REPS; ++ i) {
            out = removeNonDigits1(INPUT);
        }
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < REPS; ++ i) {
            out = removeNonDigits2(INPUT);
        }
        long t2 = System.currentTimeMillis();
        for (int i = 0; i < REPS; ++ i) {
            out = removeNonDigits3(INPUT);
        }
        long t3 = System.currentTimeMillis();
        for (int i = 0; i < REPS; ++ i) {
            out = removeNonDigits4(INPUT);
        }
        long t4 = System.currentTimeMillis();
        for (int i = 0; i < REPS; ++ i) {
            out = removeNonDigits5(INPUT);
        }
        long t5 = System.currentTimeMillis();
        System.err.printf("removeNonDigits1: %d\n", t1-t0);
        System.err.printf("removeNonDigits2: %d\n", t2-t1);
        System.err.printf("removeNonDigits3: %d\n", t3-t2);
        System.err.printf("removeNonDigits4: %d\n", t4-t3);
        System.err.printf("removeNonDigits5: %d\n", t5-t4);
    }

    private static final String PATTERN_SOURCE = "[^0-9]+";
    private static final Pattern PATTERN = Pattern.compile(PATTERN_SOURCE);

    public static String removeNonDigits1(String input) {
        return input.replaceAll(PATTERN_SOURCE, "");
    }

    public static String removeNonDigits2(String input) {
        return PATTERN.matcher(input).replaceAll("");
    }

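    public static String removeNonDigits3(String input) {
        // Work on a copy of the characters and compact the digits to the front.
        char[] arr = input.toCharArray();
        int j = 0; // next write position
        for (int i = 0; i < arr.length; ++ i) {
            // Note: Character.isDigit accepts any Unicode digit, not only '0'-'9'.
            if (Character.isDigit(arr[i])) {
                arr[j++] = arr[i];
            }
        }
        return new String(arr, 0, j);
    }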

    public static String removeNonDigits4(String input) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < input.length(); ++ i) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static String removeNonDigits5(String input) {
        List<Character> charList = Chars.asList(input.toCharArray());
        Predicate<Character> isDigit =
            new Predicate<Character>() {
                public boolean apply(Character input) {
                    return Character.isDigit(input);
                }
            };
        Iterable<Character> filteredList =
            Iterables.filter(charList, isDigit);
        return Joiner.on("").join(filteredList);
    }
}

And got these results (elapsed milliseconds for 10,000,000 calls each):

removeNonDigits1: 74656
removeNonDigits2: 52235
removeNonDigits3: 4468
removeNonDigits4: 5250
removeNonDigits5: 29610

The amusing part is that removeNonDigits5 (the Google Collections version) was supposed to be an example of a silly, overcomplicated and inefficient solution, yet it's more than twice as fast as the String.replaceAll regex version (removeNonDigits1).
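One caveat when swapping the regex for a loop: Character.isDigit returns true for any Unicode decimal digit, while the pattern [^0-9]+ keeps only the ASCII digits. If exact parity with the regex matters, a minimal sketch along these lines might do (the class name AsciiDigits is only illustrative, and the null check mirrors the one in the question):

```java
final class AsciiDigits {
    private AsciiDigits() {}

    /** Keeps only the ASCII characters '0' through '9', as the regex [^0-9]+ does. */
    static String removeNonDigits(String input) {
        if (input == null || input.isEmpty()) {
            return "";
        }
        StringBuilder result = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ++i) {
            char c = input.charAt(i);
            // Plain range check instead of Character.isDigit, which also accepts non-ASCII digits.
            if (c >= '0' && c <= '9') {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // U+0663 is ARABIC-INDIC DIGIT THREE: a digit for Character.isDigit, but not for [0-9].
        String mixed = "a1\u0663b2";
        System.out.println(removeNonDigits(mixed));      // prints 12
        System.out.println(Character.isDigit('\u0663')); // prints true
    }
}
```

Whether the Unicode-aware or the ASCII-only behaviour is right depends on what the digits are used for afterwards.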

Update: Pre-compiling the regex (removeNonDigits2) increases the speed, but not as much as one might expect: it cuts the time by about 30% in the run above.

Re-using the Matcher gives another slight speedup, but probably not worth sacrificing thread-safety for.
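For reference, one way to re-use a Matcher without sharing it between threads is to keep one per thread. This is only a sketch of the idea, not the code that was benchmarked; the class name ReusedMatcher is made up for illustration and it needs Java 8 or later for ThreadLocal.withInitial:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReusedMatcher {
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]+");

    // A Matcher is mutable, so each thread gets its own instance.
    private static final ThreadLocal<Matcher> MATCHER =
        ThreadLocal.withInitial(() -> NON_DIGITS.matcher(""));

    private ReusedMatcher() {}

    /** Assumes input is non-null; a null check like the one in the question could be added. */
    static String removeNonDigits(String input) {
        // reset(CharSequence) points the existing Matcher at the new input and returns it.
        return MATCHER.get().reset(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeNonDigits("0a1b2c3d4e")); // prints 01234
    }
}
```

The extra ThreadLocal lookup eats into the already slight gain, so this is likely only worth it if profiling points at Matcher allocation.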

finnw answered Dec 05 '22 19:12



Your method seems fine to me - what exactly is it you're looking for when you say "better"? Your method is clear and understandable in its implementation, and will have reasonably good performance.

In particular, unless your application consists of calling this method constantly in a tight loop, I don't think you'd gain anything noticeable from trying to make it more performant. Don't optimise prematurely; profile first and optimise the hotspots.
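If you do want a rough number before deciding, a small timing helper with a warm-up pass is a reasonable first check. The sketch below is an assumption about how one might do it (TimingSketch and timeNanos are hypothetical names); for anything you intend to rely on, a dedicated harness such as JMH or a real profiler is the safer choice:

```java
import java.util.function.UnaryOperator;

final class TimingSketch {
    // Volatile sink so the results of the timed calls are not optimised away.
    static volatile String sink;

    private TimingSketch() {}

    /** Runs fn reps times untimed as a warm-up, then returns the nanoseconds taken by reps more calls. */
    static long timeNanos(UnaryOperator<String> fn, String input, int reps) {
        for (int i = 0; i < reps; ++i) {
            sink = fn.apply(input);
        }
        long start = System.nanoTime();
        for (int i = 0; i < reps; ++i) {
            sink = fn.apply(input);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int reps = 1_000_000;
        long nanos = timeNanos(s -> s.replaceAll("[^0-9]+", ""), "0a1b2c3d4e", reps);
        System.out.printf("replaceAll: about %d ns per call%n", nanos / reps);
    }
}
```

The figures will vary with the JVM, hardware and input, so treat them as a hint about where to look rather than a verdict.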

Andrzej Doyle answered Dec 05 '22 18:12
