
Java: String.substring() with long type parameters

I have a large string (an RSS article, to be more precise) and I want to get the word between a given startIndex and endIndex. String provides the substring method, but it only accepts ints as its parameters. My start and end indexes are of type long.

What is the best way to get the word from a String using start and end indexes of type long?

My first solution was to start trimming the String and get it down so I can use ints. Didn't like where it was going. Then I looked at Apache Commons Lang but didn't find anything. Any good solutions?

Thank you.


Update:

Just to provide a little more information.

I am using a tool called General Architecture for Text Engineering (GATE), which scans a String and returns a list of Annotations. An annotation holds the type of a word (Person, Location, etc.) and the start and end indexes of that word.

For the RSS, I use ROME, which reads an RSS feed and contains the body of the article in a String.

asked Sep 23 '10 11:09 by pek




1 Answer

There is no point doing this on a String because a String can hold at most 2^31 - 1 characters. Internally the string's characters are held in a char[], and all of the API methods use int as the type for lengths, positions and offsets.

  • The same restriction applies to StringBuffer and StringBuilder; i.e. an int length.
  • A StringReader is backed by a String, so that won't help.
  • Both CharBuffer and ByteBuffer have the same restriction; i.e. an int length.
  • A bare array of a primitive type is limited to an int length.

In short, you are going to have to implement your own "long string" type that internally holds its characters in (for example) an array of arrays of characters.

(I tried a Google search but I couldn't spot an existing implementation of long strings that looked credible. I guess there's not a lot of call for monstrously large strings in Java ...)
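To make the "array of arrays" idea concrete, here is a minimal sketch, not a tested library. The class name LongString, its methods, and the chunkSize parameter are all hypothetical, and it uses a list of char[] chunks rather than a literal char[][] to keep appending simple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "long string": characters live in fixed-size chunks,
// so the total length can be tracked (and indexed) with a long.
final class LongString {
    private final int chunkSize;                       // assumed tuning parameter
    private final List<char[]> chunks = new ArrayList<>();
    private long length = 0;

    LongString(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Copies the characters of s onto the end, allocating chunks as needed.
    LongString append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            int offset = (int) (length % chunkSize);
            if (offset == 0) {
                chunks.add(new char[chunkSize]);       // previous chunk is full
            }
            chunks.get(chunks.size() - 1)[offset] = s.charAt(i);
            length++;
        }
        return this;
    }

    long length() {
        return length;
    }

    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // chunk number and position inside that chunk both fit in an int
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }

    // Returns the characters in [start, end) as an ordinary String,
    // so the extracted piece itself must still fit in a String.
    String substring(long start, long end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(Math.toIntExact(end - start));
        for (long i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A tiny chunk size so the example crosses chunk boundaries.
        LongString text = new LongString(4).append("Barack Obama visited Paris.");
        System.out.println(text.substring(21L, 26L));  // Paris
    }
}
```

A real implementation would use a much larger chunk size and copy whole ranges instead of single characters; the point here is only that long indexes are split into a chunk number and an in-chunk offset.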

By the way, if you anticipate that the strings are never going to be this large, you should just convert your long offsets to int. A cast would work, but you might want to check the range and throw an exception if you ever get an offset >= 2^31.
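As a rough sketch of that range-checked conversion: on Java 8 or later, Math.toIntExact does the check and throws an ArithmeticException when the value does not fit in an int. The LongOffsets class and its substring helper are made-up names for illustration, and the sample text stands in for an article body.

```java
// Hypothetical helper: substring with long offsets on an ordinary String.
final class LongOffsets {
    // Returns text[start, end), failing loudly instead of silently truncating.
    static String substring(String text, long start, long end) {
        // Math.toIntExact throws ArithmeticException on int overflow.
        int from = Math.toIntExact(start);
        int to = Math.toIntExact(end);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String article = "Barack Obama visited Paris.";
        System.out.println(substring(article, 0L, 12L));  // Barack Obama
    }
}
```

A plain (int) cast would compile too, but an out-of-range long would wrap around to an unrelated int value, which is why the checked conversion is used here.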

answered Oct 08 '22 21:10 by Stephen C