I have a large string (an RSS article, to be more precise) and I want to get the word at a specific startIndex and endIndex. String provides the substring method, but it only takes ints as its parameters. My start and end indexes are of type long.
What is the best way to get the word from a String using start and end indexes of type long?
My first solution was to start trimming the String and get it down so I can use ints. Didn't like where it was going. Then I looked at Apache Commons Lang but didn't find anything. Any good solutions?
Thank you.
Update:
Just to provide a little more information.
I am using a tool called General Architecture for Text Engineering (GATE), which scans a String and returns a list of Annotations. An annotation holds the type of a word (Person, Location, etc.) and the start and end indexes of that word.
For the RSS, I use ROME, which reads an RSS feed and gives me the body of the article as a String.
Note that substr() (a JavaScript method; Java's String has no substr()) takes start and length, while substring() takes start and end.
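Because Java's String has no substr(), the sketch below uses a hypothetical helper, also named substr, purely to contrast the two conventions. It takes a length and converts it into the exclusive end index that Java's substring() expects. The sample text is an arbitrary choice.

```java
/** Contrasts (start, end) with (start, length) addressing in Java. */
final class SubstrDemo {
    // Hypothetical helper, not part of java.lang.String: emulates a
    // JavaScript-style substr(start, length) on top of substring(start, end).
    static String substr(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        String text = "General Architecture";
        System.out.println(text.substring(8, 20)); // start and end: prints Architecture
        System.out.println(substr(text, 8, 12));   // start and length: prints Architecture
    }
}
```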
public String substring(int startIndex, int endIndex): This method returns a new String containing the characters of the given string from startIndex up to, but not including, endIndex; that is, the substring runs from the character at startIndex to the character at endIndex - 1.
The substring() method returns a substring of the given string. The substring begins with the character at startIndex and extends to the character at index endIndex - 1. If endIndex is not passed (the one-argument overload substring(int startIndex)), the substring begins with the character at the specified index and extends to the end of the string.
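A quick illustration of both overloads; the sample string "hamburger" is an arbitrary choice:

```java
final class SubstringDemo {
    public static void main(String[] args) {
        String s = "hamburger";
        // Two-argument form: characters at indexes 4..7 (endIndex 8 is excluded).
        System.out.println(s.substring(4, 8)); // prints urge
        // One-argument form: from index 4 to the end of the string.
        System.out.println(s.substring(4));    // prints urger
    }
}
```

The two-argument call returns 8 - 4 = 4 characters.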
Note that passing an endIndex equal to the string's length does not go out of bounds, because substring() uses the end index in an "up to but not including" manner. Incidentally, the length of the resulting substring can always be computed as end - start.
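The small program below (sample string chosen arbitrarily) checks the end-index rule directly: an endIndex equal to length() is accepted, the result length equals end - start, and an endIndex one past the length is rejected with an IndexOutOfBoundsException.

```java
final class SubstringBoundsDemo {
    public static void main(String[] args) {
        String s = "RSS";
        // endIndex == s.length() is allowed: the character at index 3 is never read.
        System.out.println(s.substring(0, s.length()));   // prints RSS

        int start = 1, end = 3;
        String part = s.substring(start, end);            // "SS"
        // The length of the result is always end - start.
        System.out.println(part.length() == end - start); // prints true

        try {
            s.substring(0, s.length() + 1);                // one past the end
        } catch (IndexOutOfBoundsException e) {
            System.out.println("endIndex > length() is rejected");
        }
    }
}
```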
There is no point doing this on a String, because a String can hold at most 2^31 - 1 characters. Internally a string's characters are held in an array (a char[] before Java 9, a byte[] since Java 9's compact strings), and all of the API methods use int as the type for lengths, positions and offsets.
In short, you are going to have to implement your own "long string" type that internally holds its characters in (for example) an array of arrays of characters.
(I tried a Google search but I couldn't spot an existing implementation of long strings that looked credible. I guess there's not a lot of call for monstrously large strings in Java ...)
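As a rough idea of what such a type could look like, here is a minimal sketch of a hypothetical LongString class (not an existing library type) that stores characters in fixed-size char[] chunks and addresses them with long positions. It only supports append, length, charAt and substring, and substring still returns an ordinary String, so each extracted slice must itself fit in an int. The chunk size is a constructor parameter so the behaviour is easy to try with small values; a real implementation would use much larger chunks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical "long string" sketch: characters live in fixed-size char[]
 * chunks, so positions can be addressed with a long. Not a real library type.
 */
final class LongString {
    private final List<char[]> chunks = new ArrayList<>();
    private final int chunkSize;
    private long length;

    LongString(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Appends characters, allocating a new chunk whenever the last one is full. */
    LongString append(CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            int offset = (int) (length % chunkSize);
            if (offset == 0) {
                chunks.add(new char[chunkSize]);
            }
            chunks.get(chunks.size() - 1)[offset] = cs.charAt(i);
            length++;
        }
        return this;
    }

    long length() {
        return length;
    }

    /** Splits a long position into (chunk number, offset within that chunk). */
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }

    /** Copies [start, end) into an ordinary String; the slice must fit in an int. */
    String substring(long start, long end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(
                    "start " + start + ", end " + end + ", length " + length);
        }
        StringBuilder sb = new StringBuilder(Math.toIntExact(end - start));
        for (long i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LongString text = new LongString(4).append("hello ").append("world");
        System.out.println(text.length());          // prints 11
        System.out.println(text.substring(2L, 8L)); // prints llo wo (spans two chunks)
    }
}
```

Each long position is split into a chunk number (index / chunkSize) and an offset within that chunk (index % chunkSize), which is the "array of arrays of characters" layout described above.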
By the way, if you anticipate that the strings are never going to be this large, you should just convert your long offsets to int. A cast would work, but you might want to check the range and throw an exception if you ever get an offset >= 2^31.
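For the common case where offsets always fit, here is a minimal sketch of the checked conversion. It uses Math.toIntExact (available since Java 8), which throws ArithmeticException when a long is outside the int range, whereas a plain (int) cast silently wraps, e.g. (int) 2147483648L is -2147483648. The helper name wordAt and the sample sentence are made up for illustration; with GATE, start and end would be an annotation's start and end offsets.

```java
final class LongOffsets {
    /**
     * Hypothetical helper: returns the text in [start, end) for long offsets.
     * Math.toIntExact throws ArithmeticException if an offset does not fit in
     * an int; substring() then validates the bounds against the string.
     */
    static String wordAt(String text, long start, long end) {
        return text.substring(Math.toIntExact(start), Math.toIntExact(end));
    }

    public static void main(String[] args) {
        String article = "Alice visited Paris.";
        System.out.println(wordAt(article, 14L, 19L)); // prints Paris
    }
}
```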