
What is the best way to remove multiple occurrences of a character in a string in Java?

Tags:

java

string

regex

I have a string like foo..txt and I want to convert it to foo.txt. There may also be more than two '.' characters. What is the best way to accomplish this?

Edit: The '.' characters may not all be adjacent. They may also occur as below:

foo.bar.txt = foo bar.txt
foo..bar.foo.txt = foo bar.txt

Vaishak Suresh asked May 13 '10 09:05


3 Answers

With replaceAll()! Like this:

string = string.replaceAll("\\.{2,}", ".");

Note that we had to escape the period, since it's a special character in regular expressions (and also escape the backslash, for Java's sake). Also note the {2,}, which means "match if it occurs two or more times".
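
As a minimal sketch of the call above in a runnable form (the class name CollapseDots and the helper collapseDots are hypothetical names chosen for illustration, not part of any library):

```java
public class CollapseDots {
    // Hypothetical helper: collapses every run of two or more dots into a single dot.
    static String collapseDots(String input) {
        // "\\.{2,}" is the regex \.{2,}: a literal dot repeated two or more times.
        return input.replaceAll("\\.{2,}", ".");
    }

    public static void main(String[] args) {
        System.out.println(collapseDots("foo..txt"));        // foo.txt
        System.out.println(collapseDots("foo....bar..txt")); // foo.bar.txt
        System.out.println(collapseDots("foo.bar.txt"));     // foo.bar.txt (single dots are left alone)
    }
}
```

Note that this only collapses adjacent dots; it does not turn the remaining single dots into spaces, which the edited question also seems to ask for.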

Etaoin answered Nov 11 '22 19:11


I believe what you want is to replace all periods in the file name part with spaces, but keep the extension, right?

If so, something like this would be appropriate:

    String[] tests = {
        "foo.bar.txt",       // [foo bar.txt]
        "foo...bar.foo.txt", // [foo bar foo.txt]
        "........",          // [.]
        "...x...dat",        // [x.dat]
        "foo..txt",          // [foo.txt]
        "mmm....yummy...txt" // [mmm yummy.txt]
    };
    for (String test : tests) {
        int k = test.lastIndexOf('.');          
        String s = test.substring(0, k).replaceAll("\\.+", " ").trim()
           + test.substring(k);
        System.out.println("[" + s + "]");
    }

Essentially the way this works is:

  • First, find the lastIndexOf('.') in our string
    • Say this index is k, then we have logically separated our string into:
      • substring(0, k), the prefix part
      • substring(k), the suffix (file extension) part
  • Then we use regex on the prefix part to replaceAll matches of \.+ with " "
    • That is, a literal dot \., repeated one or more times +
    • We also trim() this string to remove leading and trailing spaces
  • The result we want is the transformed prefix concatenated with the original suffix
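
The steps above could be wrapped in a reusable method roughly as follows. This is only a sketch: the class name FileNameDots and the method dotsToSpaces are hypothetical, and returning the input unchanged when it contains no dot is my own assumption (the loop above would throw on such input, because lastIndexOf returns -1).

```java
public class FileNameDots {
    // Hypothetical helper: replaces each run of dots in the name part with one space
    // and keeps the extension (the last dot and everything after it) unchanged.
    static String dotsToSpaces(String fileName) {
        int k = fileName.lastIndexOf('.');
        if (k < 0) {
            // Assumption: a name without any dot has no extension and needs no change.
            return fileName;
        }
        String prefix = fileName.substring(0, k).replaceAll("\\.+", " ").trim();
        return prefix + fileName.substring(k);
    }

    public static void main(String[] args) {
        System.out.println(dotsToSpaces("foo.bar.txt"));       // foo bar.txt
        System.out.println(dotsToSpaces("foo...bar.foo.txt")); // foo bar foo.txt
        System.out.println(dotsToSpaces("README"));            // README
    }
}
```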

Clarifications

  • The pattern is \.+ rather than .+ because the dot . is a regex metacharacter; here we mean a literal period, so it must be escaped as \.
  • As a Java string literal, the pattern is written "\\.+" because \ is itself the escape character in Java string literals. For example, the string literal "\t" contains the tab character. Analogously, the string literal "\\" contains a single backslash character; it has a length() of one.
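
Both points can be checked with a few lines of code. This is a small illustrative sketch; the class name EscapeDemo is hypothetical.

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // The Java literal "\\.+" holds three characters: backslash, dot, plus.
        System.out.println("\\.+".length());         // 3
        // The Java literal "\\" holds one character: a single backslash.
        System.out.println("\\".length());           // 1

        // An escaped dot matches only a literal period...
        System.out.println("a.b".matches("a\\.b"));  // true
        System.out.println("axb".matches("a\\.b"));  // false
        // ...while an unescaped dot matches (almost) any character.
        System.out.println("axb".matches("a.b"));    // true
    }
}
```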

References

  • regular-expressions.info: The Dot Matches (Almost) Any Character; Repetition with Star and Plus
  • String API: lastIndexOf and trim()
  • JLS 3.10.6 Escape Sequences for Character and String Literals
polygenelubricants answered Nov 11 '22 21:11


You've made me read the manuals :) I solved a more general problem: how to replace any run of two or more identical characters with a single occurrence of that character:

String str = "assddffffadfdd..o";
System.out.println (str.replaceAll("(.)\\1+", "$1"));

Output:

asdfadfd.o

If you only need to handle the "filename....ext" case, then I'd prefer something simpler, as in Etaoin's answer, because it is probably faster (though I haven't measured it). My solution, simplified for this specific case, looks like this:

str.replaceAll("(\\.)\\1+", "$1")
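
To make the difference between the two patterns concrete, here is a minimal sketch that runs both on the same inputs. The class and method names are hypothetical and chosen only for illustration.

```java
public class CollapseRepeats {
    // Hypothetical helper: collapses a run of ANY repeated character into one occurrence.
    // (.) captures one character; \1+ matches one or more further copies of it.
    static String collapseRepeats(String input) {
        return input.replaceAll("(.)\\1+", "$1");
    }

    // Hypothetical helper: the same idea restricted to literal dots.
    static String collapseRepeatedDots(String input) {
        return input.replaceAll("(\\.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("assddffffadfdd..o")); // asdfadfd.o
        // The general pattern also collapses the "oo" in "foo":
        System.out.println(collapseRepeats("foo....txt"));        // fo.txt
        // The dot-only pattern leaves other repeated characters alone:
        System.out.println(collapseRepeatedDots("foo....txt"));   // foo.txt
    }
}
```

So for file names the dot-only version is the safer of the two, since the general one would also shorten doubled letters.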
Roman answered Nov 11 '22 21:11