String Manipulation: Splitting Delimited Data

Tags: java, string

I need to split some information out of asterisk-delimited data.

Data Format:

NAME*ADDRESS LINE1*ADDRESS LINE2*

Rules:

1. The name must always be present.
2. Address lines 1 and 2 may be missing.
3. There are always exactly three asterisks.

Samples:

MR JONES A ORTEGA*ADDRESS 1*ADDRESS2*

Name: MR JONES A ORTEGA
Address Line1: ADDRESS 1
Address Line2: ADDRESS2

A PAUL*ADDR1**
Name: A PAUL
Address Line1: ADDR1
Address Line2: Not Given

My algorithm is:

1. Iterate through the characters in the line.
2. Store the characters in a temporary variable until the first * is found. Reject the data if there are no characters before the first asterisk. Otherwise, use them as the name.
3. Do the same as step 2 to find address lines 1 and 2, except that the data is not rejected if no characters are found.

My algorithm looks ugly, and the code looks uglier. Splitting on \\* doesn't work either, since the name would be replaced by address line 1 if the data were *Address 1*Address2. Any suggestions?

EDIT:

Try it with this data (without the quotes): "-MS DEBBIE GREEN*1036 PINEWOOD CRES**"

Asked Nov 15 '22 09:11 by Milli Szabo


1 Answer

You can use String's split(String regex, int limit) method, which returns a String[], as follows:

    String[] tests = {
        "NAME*ADRESS LINE1*ADDRESS LINE2*",
        "NAME*ADRESS LINE1**",
        "NAME**ADDRESS LINE2*",
        "NAME***",
        "*ADDRESS LINE1*ADDRESS LINE2*",
        "*ADDRESS LINE1**",
        "**ADDRESS LINE2*",
        "***",
        "-MS DEBBIE GREEN*1036 PINEWOOD CRES**",
    };
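    for (String test : tests) {
        // Drop the trailing asterisk so it cannot end up in the third part.
        // A separate variable keeps the original line intact for printing.
        String trimmed = test.substring(0, test.length() - 1);
        // A limit of 3 keeps trailing empty strings, so well-formed input
        // (exactly three asterisks) always yields three parts.
        String[] parts = trimmed.split("\\*", 3);
        System.out.printf(
            "%s%n  Name: %s%n  Address Line1: %s%n  Address Line2: %s%n%n",
            test, parts[0], parts[1], parts[2]
        );
    }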

This prints (as seen on ideone.com):

NAME*ADRESS LINE1*ADDRESS LINE2*
  Name: NAME
  Address Line1: ADRESS LINE1
  Address Line2: ADDRESS LINE2

NAME*ADRESS LINE1**
  Name: NAME
  Address Line1: ADRESS LINE1
  Address Line2: 

NAME**ADDRESS LINE2*
  Name: NAME
  Address Line1: 
  Address Line2: ADDRESS LINE2

NAME***
  Name: NAME
  Address Line1: 
  Address Line2: 

*ADDRESS LINE1*ADDRESS LINE2*
  Name: 
  Address Line1: ADDRESS LINE1
  Address Line2: ADDRESS LINE2

*ADDRESS LINE1**
  Name: 
  Address Line1: ADDRESS LINE1
  Address Line2: 

**ADDRESS LINE2*
  Name: 
  Address Line1: 
  Address Line2: ADDRESS LINE2

***
  Name: 
  Address Line1: 
  Address Line2: 

-MS DEBBIE GREEN*1036 PINEWOOD CRES**
  Name: -MS DEBBIE GREEN
  Address Line1: 1036 PINEWOOD CRES
  Address Line2: 

The pattern is "\\*" because split takes a regular expression and * is a regex metacharacter; to match a literal asterisk, it must be escaped with a \. Since \ is itself the escape character in Java string literals, you have to double it to get a single \ into the string.
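If the double backslash is hard to read, Pattern.quote is an alternative way to get a literal pattern. The sketch below compares the two and shows what happens with an unescaped asterisk; the class name EscapeDemo and the sample record are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class EscapeDemo {
    public static void main(String[] args) {
        // Hypothetical record with the trailing asterisk already removed.
        String record = "A PAUL*ADDR1*";

        // Escaped by hand: the Java literal "\\*" is the two-character regex \*
        System.out.println(Arrays.toString(record.split("\\*", 3)));

        // Pattern.quote builds a pattern that matches the given text literally.
        System.out.println(Arrays.toString(record.split(Pattern.quote("*"), 3)));

        // A bare "*" is not a valid regex: there is nothing for it to repeat.
        try {
            record.split("*", 3);
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }
    }
}
```

Both split calls produce the same three parts; the third call never returns because the pattern fails to compile.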

The limit of 3 is there because you want the array to have three parts, including trailing empty strings. A split without a limit (equivalent to a limit of 0) discards trailing empty strings.
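To see the effect of the limit in isolation, here is a small sketch (the class name LimitDemo is just a placeholder) that splits the same trimmed record three ways:

```java
import java.util.Arrays;

final class LimitDemo {
    public static void main(String[] args) {
        String record = "NAME**"; // "NAME***" with the last asterisk already removed

        // No limit (same as limit 0): trailing empty strings are dropped.
        System.out.println(Arrays.toString(record.split("\\*")));     // [NAME]

        // Limit 3: at most three parts, trailing empty strings are kept.
        System.out.println(Arrays.toString(record.split("\\*", 3)));  // [NAME, , ]

        // Negative limit: no cap on the number of parts, trailing empty strings are kept.
        System.out.println(Arrays.toString(record.split("\\*", -1))); // [NAME, , ]
    }
}
```

With the limit-less form, parts[1] and parts[2] would not exist for this record, so indexing them would throw an ArrayIndexOutOfBoundsException.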

The last * is removed manually before the split; otherwise, with a limit of 3, it would remain at the end of the third part.
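The loop above prints every line, including those with an empty name. If you also want to enforce the rules from the question, one possible approach is to split with a negative limit and check the shape of the result. This is only a sketch: the class AddressRecordParser and its parse method are hypothetical names, the "Not Given" placeholder is taken from the question's sample, and it assumes that the third asterisk is always the last character of a valid record.

```java
import java.util.Optional;

final class AddressRecordParser {
    private static final String NOT_GIVEN = "Not Given";

    // Returns {name, line1, line2}, or Optional.empty() when the record breaks a rule.
    static Optional<String[]> parse(String record) {
        // A negative limit keeps every trailing empty string, so a record with
        // exactly three asterisks, the last one at the very end, yields exactly
        // four parts with an empty fourth part.
        String[] parts = record.split("\\*", -1);
        if (parts.length != 4 || !parts[3].isEmpty()) {
            return Optional.empty(); // wrong number of asterisks, or text after the last one
        }
        if (parts[0].isEmpty()) {
            return Optional.empty(); // rule 1: the name is mandatory
        }
        return Optional.of(new String[] {
            parts[0],
            parts[1].isEmpty() ? NOT_GIVEN : parts[1],
            parts[2].isEmpty() ? NOT_GIVEN : parts[2],
        });
    }

    public static void main(String[] args) {
        String[] samples = {
            "A PAUL*ADDR1**",
            "*ADDRESS LINE1**",
            "-MS DEBBIE GREEN*1036 PINEWOOD CRES**",
        };
        for (String sample : samples) {
            String result = parse(sample)
                .map(p -> String.join(" | ", p))
                .orElse("rejected");
            System.out.println(sample + " -> " + result);
        }
    }
}
```

Here the trailing asterisk is not stripped first; it shows up as the empty fourth part, which doubles as a check that the line really ends with an asterisk.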

Answered Dec 31 '22 23:12 by polygenelubricants