
What is a good way to split strings here?

I have the following string:
A:B:1111;domain:80;a;b
The A is optional, so B:1111;domain:80;a;b is also valid input.
The :80 is optional as well, so B:1111;domain;a;b and :1111;domain;a;b are also valid input.
What I want is to end up with a String[] that has:

s[0] = "A";  
s[1] = "B";  
s[2] = "1111";  
s[3] = "domain:80";  
s[4] = "a";  
s[5] = "b";  

I did this as follows:

List<String> tokens = new ArrayList<String>();  
String[] values = s.split(";");  
String[] actions = values[0].split(":");   

for(String a:actions){  
    tokens.add(a);  
}  
//Start from 1 to skip A:B:1111
for(int i = 1; i < values.length; i++){  
    tokens.add(values[i]);  
}  
String[] finalResult = tokens.toArray(new String[0]);

Is there a better way to do this? How else could I do it more efficiently?

Jim asked May 16 '12 12:05



2 Answers

There are not many efficiency concerns here; everything I see is linear.

Anyway, you could either use a regular expression or a manual tokenizer.

You can avoid the list. You know the length of values and actions, so you can do

String[] values = s.split(";");  
String[] actions = values[0].split(":");
String[] result = new String[actions.length + values.length - 1];
System.arraycopy(actions, 0, result, 0, actions.length);
System.arraycopy(values, 1, result, actions.length, values.length - 1);
return result;

It should be reasonably efficient, unless you insist on implementing split yourself.
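
For anyone who wants to try the arraycopy approach above, here is a minimal self-contained sketch. The class name HeaderSplitter and the method name tokenize are made up for illustration. It assumes the input is not made up only of ';' characters (String.split would then return an empty array and values[0] would fail), and note that String.split drops trailing empty tokens.

```java
import java.util.Arrays;

class HeaderSplitter {
    // Splits the first ';'-separated field on ':' and keeps the remaining fields whole.
    // Assumption: s contains something other than ';' characters.
    static String[] tokenize(String s) {
        String[] values = s.split(";");
        String[] actions = values[0].split(":");
        String[] result = new String[actions.length + values.length - 1];
        System.arraycopy(actions, 0, result, 0, actions.length);
        System.arraycopy(values, 1, result, actions.length, values.length - 1);
        return result;
    }

    public static void main(String[] args) {
        // prints [A, B, 1111, domain:80, a, b]
        System.out.println(Arrays.toString(tokenize("A:B:1111;domain:80;a;b")));
        // prints [B, 1111, domain, a, b]
        System.out.println(Arrays.toString(tokenize("B:1111;domain;a;b")));
    }
}
```

With the input :1111;domain;a;b the first token comes back as an empty string, so callers probably need to check for that case.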

Untested low-level approach (make sure to unit test and benchmark before use):

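// Separator characters, as int code points (what String.indexOf expects), not strings.
final static int s1 = ':';
final static int s2 = ';';
// A ':' only counts as a separator before the first ';'.
int firstSemi = s.indexOf(s2);
int limit = firstSemi > -1 ? firstSemi : s.length();
// Compute required size:
int components = 1;
for(int p = s.indexOf(s1); p > -1 && p < limit; p = s.indexOf(s1, p+1)) {
    components++;
}
for(int p = firstSemi; p > -1; p = s.indexOf(s2, p+1)) {
    components++;
}
String[] result = new String[components];
// Build result
int in=0, i=0, out=s.indexOf(s1);
if(out < 0 || out > limit) out = firstSemi;
while(out > -1) {
  result[i] = s.substring(in, out);
  i++;
  in = out + 1;
  // Look for ':' only while still inside the first component, then fall back to ';'.
  out = in < limit ? s.indexOf(s1, in) : -1;
  if(out < 0 || out > limit) out = s.indexOf(s2, in);
}
assert(i == result.length - 1);
result[i] = s.substring(in, s.length());
return result;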

Note: this code is aggressively optimized in that it treats a : as a separator only in the first component. Handling the last component is a bit tricky, as out will have the value -1.

I would usually not use this last approach unless performance and memory are extremely crucial. Most likely there are still some bugs in it, and the code is fairly unreadable, in particular compared to the one above.
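
Since the low-level version needs testing before use, one way to cross-check a hand-written tokenizer is to compare it against the split-based version on a few inputs. The sketch below is not the code above: it is a simpler single-pass scan that uses a list, and the names ScanTokenizer, tokenize and reference are made up for illustration. Unlike String.split, the scan keeps trailing empty tokens, so the two can differ on inputs ending in a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanTokenizer {
    // Single pass: ';' always separates; ':' separates only before the first ';'.
    static String[] tokenize(String s) {
        List<String> out = new ArrayList<>();
        boolean inFirstField = true;
        int start = 0;
        for (int p = 0; p < s.length(); p++) {
            char c = s.charAt(p);
            if (c == ';' || (c == ':' && inFirstField)) {
                out.add(s.substring(start, p));
                start = p + 1;
                if (c == ';') {
                    inFirstField = false;
                }
            }
        }
        out.add(s.substring(start)); // last component has no trailing separator
        return out.toArray(new String[0]);
    }

    // Reference implementation based on String.split, used only for cross-checking.
    static String[] reference(String s) {
        String[] values = s.split(";");
        String[] actions = values[0].split(":");
        String[] result = new String[actions.length + values.length - 1];
        System.arraycopy(actions, 0, result, 0, actions.length);
        System.arraycopy(values, 1, result, actions.length, values.length - 1);
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A:B:1111;domain:80;a;b", "B:1111;domain;a;b", ":1111;domain;a;b"
        };
        for (String s : samples) {
            String[] a = tokenize(s);
            String[] b = reference(s);
            System.out.println(s + " -> " + Arrays.toString(a)
                    + (Arrays.equals(a, b) ? " OK" : " MISMATCH"));
        }
    }
}
```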

Has QUIT--Anony-Mousse answered Sep 28 '22 07:09


With some assumptions about acceptable characters, this regex provides validation as well as splitting into the groups you desire.

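import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Only the leading "A:" is optional here; the port (":80") is required by this pattern.
Pattern p = Pattern.compile("^((.+):)?(.+):(\\d+);(.+):(\\d+);(.+);(.+)$");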
Matcher m = p.matcher("A:B:1111;domain:80;a;b");
if(m.matches())
{
    for(int i = 0; i <= m.groupCount(); i++)
        System.out.println(m.group(i));
}
m = p.matcher("B:1111;domain:80;a;b");
if(m.matches())
{
    for(int i = 0; i <= m.groupCount(); i++)
        System.out.println(m.group(i));
}

Gives:

A:B:1111;domain:80;a;b // ignore this
A: // ignore this
A // This is the optional A, check for null
B
1111
domain
80
a
b

And

B:1111;domain:80;a;b // ignore this
null // ignore this
null // This is the optional A, check for null
B
1111
domain
80
a
b
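
The question also says the :80 part is optional. A possible variant of the pattern that allows this is sketched below. It assumes the fields before the port contain no ':' or ';' characters, and the class name, the describe helper and the group names (a, b, num, host, port, x, y) are all made up for illustration. Named groups need Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPortRegex {
    // "a:" and ":port" are both optional; unmatched optional groups come back as null.
    static final Pattern P = Pattern.compile(
            "^(?:(?<a>[^:;]*):)?(?<b>[^:;]*):(?<num>\\d+);"
            + "(?<host>[^:;]+)(?::(?<port>\\d+))?;(?<x>[^;]+);(?<y>[^;]+)$");

    // Returns a readable summary of the captured groups, or "no match".
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return "a=" + m.group("a") + " b=" + m.group("b") + " num=" + m.group("num")
                + " host=" + m.group("host") + " port=" + m.group("port")
                + " x=" + m.group("x") + " y=" + m.group("y");
    }

    public static void main(String[] args) {
        System.out.println(describe("A:B:1111;domain:80;a;b"));
        System.out.println(describe("B:1111;domain;a;b"));
    }
}
```

As with the original pattern, check the optional groups for null before using them.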

Ina answered Sep 28 '22 07:09