
Java - Splitting text into array without obvious delimiter

I need to split each line of text into an array using a loop. The problem is that there's no obvious delimiter to use given the formatting of the text file (which I can't change):

Adam Rippon      New York, NY    77.58144.6163.6780.94
Brandon Mroz     Broadmoor, CO   70.57138.1266.8471.28
Stephen Carriere Boston, MA      64.42138.8368.2770.56
Grant Hochstein  New York, NY    64.62133.8867.4468.44
Keegan Messing   Alaska, AK      61.15136.3071.0266.28
Timothy Dolensky Atlanta, AL     61.76123.0861.3063.78
Max Aaron        Broadmoor, CO   86.95173.4979.4893.51
Jeremy Abbott    Detroit, MI     99.86174.4193.4280.99
Jason Brown      Skokie Value,IL 87.47182.6193.3489.27
Joshua Farris    Broadmoor, CO   78.37169.6987.1783.52
Richard Dornbush All Year, CA    92.04144.3465.8278.52
Douglas Razzano  Coyotes, AZ     75.18157.2580.6976.56
Ross Miner       Boston, MA      71.94152.8772.5380.34
Sean Rabbit      Glacier, CA     60.58122.7656.9066.86
Lukas Kaugars    Broadmoor, CO   64.57114.7550.4766.28
Philip Warren    All Year, CA    55.80113.2457.0258.22
Daniel Raad      Southwest FL    52.98108.0358.6151.42
Scott Dyer       Brooklyn, OH    55.78100.9744.3357.64
Robert PrzepioskiRochester, NY   47.00100.3449.2651.08

Ideally I would like each name to be in [0] (or first name in [0] and last name in [1]), each location to be in [2] (or also split across two indexes for city and state), and then each score in its own index. Each person has four separate numbers. For example, Adam Rippon's scores are 77.58, 144.61, 63.67, and 80.94.

I can't split by spaces because some of the cities have a space in their names (New York would be split into New and York in two different array elements, while Broadmoor would be in one element). I can't split cities by commas because Southwest FL has no comma. I also can't split the numbers by the decimal point because the resulting numbers would be wrong. So is there an easy way to go about doing this? Perhaps a way to split the numbers by the number of decimal places?

sam asked Apr 25 '26 18:04


2 Answers

It looks like there is a fixed size for each column. So in your case, column 1 is 17 characters long, the second column is 16 characters long and the last one is 21 characters long.

Now you can simply iterate through the lines and make use of the substring() method. Something like...

String firstColumn = line.substring(0, 17).trim();   // name
String secondColumn = line.substring(17, 33).trim(); // location
String thirdColumn = line.substring(33).trim();      // scores, up to the end of the line
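
As a rough, self-contained sketch of the slicing step above: the class name `FixedWidthSlicer`, the method `sliceColumns`, and the length check are illustrative assumptions, not part of the original answer. The offsets 17 and 33 come from the column widths counted in the sample data and would need adjusting if the file layout differs.

```java
class FixedWidthSlicer {
    // Column boundaries counted from the sample data (assumed identical on every line).
    static final int NAME_END = 17;
    static final int LOCATION_END = 33;

    /** Returns {name, location, scores} for one fixed-width line. */
    static String[] sliceColumns(String line) {
        if (line.length() < LOCATION_END) {
            // Hypothetical guard: a shorter line would make substring() throw anyway.
            throw new IllegalArgumentException("Line too short: " + line);
        }
        return new String[] {
            line.substring(0, NAME_END).trim(),
            line.substring(NAME_END, LOCATION_END).trim(),
            line.substring(LOCATION_END).trim()
        };
    }

    public static void main(String[] args) {
        // The one sample line with no space between name and city.
        String[] cols = sliceColumns("Robert PrzepioskiRochester, NY   47.00100.3449.2651.08");
        System.out.println(String.join(" | ", cols));
        // prints: Robert Przepioski | Rochester, NY | 47.00100.3449.2651.08
    }
}
```

Because the cut points are positions rather than characters, the missing space between "Przepioski" and "Rochester" does not matter.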

To extract the numbers, we could use a regular expression that searches for all numbers with two decimal places.

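import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One or more digits, a dot, then exactly two decimal digits
Pattern pattern = Pattern.compile("(\\d+\\.[0-9]{2})");

Matcher matcher = pattern.matcher(thirdColumn);

// Print each score found in the scores column
while (matcher.find())
{
    System.out.println(matcher.group());
}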

So in this case, 47.00100.3449.2651.08 will output:

47.00
100.34
49.26
51.08
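
Putting both steps together, one possible sketch of building the array the question asks for is shown here. The names `ScoreExtractor`, `extractScores`, and `parseLine` are hypothetical, and the code assumes every score has exactly two decimal places and every line is at least 33 characters long, as in the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScoreExtractor {
    // One or more digits, a dot, then exactly two digits.
    private static final Pattern SCORE = Pattern.compile("\\d+\\.\\d{2}");

    /** Splits a run such as "77.58144.6163.6780.94" into individual scores. */
    static List<String> extractScores(String run) {
        List<String> scores = new ArrayList<>();
        Matcher matcher = SCORE.matcher(run);
        while (matcher.find()) {
            scores.add(matcher.group());
        }
        return scores;
    }

    /** Builds {name, location, score1, score2, ...} from one fixed-width line. */
    static String[] parseLine(String line) {
        List<String> fields = new ArrayList<>();
        fields.add(line.substring(0, 17).trim());   // name column
        fields.add(line.substring(17, 33).trim());  // location column
        fields.addAll(extractScores(line.substring(33)));
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] fields = parseLine("Adam Rippon      New York, NY    77.58144.6163.6780.94");
        System.out.println(String.join(" | ", fields));
        // prints: Adam Rippon | New York, NY | 77.58 | 144.61 | 63.67 | 80.94
    }
}
```

The scores are kept as strings here; `Double.parseDouble` could convert them if numeric values are needed.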
kevcodez answered Apr 28 '26 08:04


It looks like each column has a fixed size (number of characters). As you already said, you cannot split by tabs or spaces; the last line, for instance, has no tab or space between name and city.

I propose reading one line at a time and then slicing the String with line.substring(startIndex, endIndex). For example, line.substring(0, 17) yields the name: the name column is 17 characters wide, and "Robert Przepioski" in the last line fills it exactly. You can then split this name into first and last name using the space as the delimiter.
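
A small sketch of that second split, applied to fields that have already been sliced and trimmed. The class and method names are made up for illustration. Splitting the location is an extra step not described above, and it assumes the location always ends in a two-letter state code, which holds for the sample data (including "Southwest FL" and "Skokie Value,IL") but may not hold elsewhere.

```java
class NameLocationSplitter {
    /** Splits "Adam Rippon" into {"Adam", "Rippon"} at the first run of whitespace. */
    static String[] splitName(String name) {
        // Limit 2 keeps any further words in the last-name part.
        return name.trim().split("\\s+", 2);
    }

    /** Splits "New York, NY" into {"New York", "NY"}; assumes a trailing two-letter state. */
    static String[] splitLocation(String location) {
        String loc = location.trim();
        if (loc.length() < 2) {
            throw new IllegalArgumentException("Location too short: " + location);
        }
        int cut = loc.length() - 2;
        String state = loc.substring(cut);
        // Drop the comma and/or spaces that separated the city from the state.
        String city = loc.substring(0, cut).replaceAll("[,\\s]+$", "");
        return new String[] {city, state};
    }

    public static void main(String[] args) {
        String[] name = splitName("Robert Przepioski");
        String[] location = splitLocation("Southwest FL");
        System.out.println(name[0] + " / " + name[1]);         // prints: Robert / Przepioski
        System.out.println(location[0] + " / " + location[1]); // prints: Southwest / FL
    }
}
```

Taking the state from the end of the string avoids relying on the comma, which is missing or unspaced in some rows.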

havogt answered Apr 28 '26 08:04


