I've never had close experiences with Java IO API before and I'm really frustrated now. I find it hard to believe how strange and complex it is and how hard it could be to do a simple task.
My task: I have 2 positions (starting byte, ending byte), pos1
and pos2
. I need to read lines between these two bytes (including the starting one, not including the ending one) and use them as UTF8 String objects.
For example, in most script languages it would be a very simple 1-2-3-liner like that (in Ruby, but it will be essentially the same for Python, Perl, etc):
f = File.open("file.txt").seek(pos1)
while f.pos < pos2 {
s = f.readline
# do something with "s" here
}
It quickly comes hell with Java IO APIs ;) In fact, I see two ways to read lines (ending with \n
) from regular local files:
getFilePointer()
and seek(long pos)
, but it's readLine() reads non-UTF8 strings (and even not byte arrays), but very strange strings with broken encoding, and it has no buffering (which probably means that every read*()
call would be translated into single undelying OS read()
=> fairly slow).readLine()
method, and it can even do some seeking with skip(long n)
, but it has no way to determine even number of bytes that has been already read, not mentioning the current position in a file.I've tried to use something like:
FileInputStream fis = new FileInputStream(fileName);
FileChannel fc = fis.getChannel();
BufferedReader br = new BufferedReader(
new InputStreamReader(
fis,
CHARSET_UTF8
)
);
... and then using fc.position()
to get current file reading position and fc.position(newPosition)
to set one, but it doesn't seem to work in my case: looks like it returns position of a buffer pre-filling done by BufferedReader, or something like that - these counters seem to be rounded up in 16K increments.
Do I really have to implement it all by myself, i.e. a file readering interface which would:
\n
")Is there a quicker way than implementing it all myself? Am I overseeing something?
While creating the instance of RandomAccessFile in java, we need to provide the mode to open the file. For example, to open the file for read only mode we have to use “r” and for read-write operations we have to use “rw”. Using file pointer, we can read or write data from random access file at any position.
readLine() method reads the next line of text from this file. This method successively reads bytes from the file, starting at the current file pointer, until it reaches a line terminator or the end of the file.
RandomAccessFile(File file, String mode) Creates a random access file stream to read from, and optionally to write to, the file specified by the File argument. RandomAccessFile(String name, String mode) Creates a random access file stream to read from, and optionally to write to, a file with the specified name.
File is an abstract representation of a file/directory which may or may not even exist. It doesn't consume any resources, so you can store them as much as you want. RandomAccessFile is for actual file access (reading, seeking, writing), so you don't need it here.
import org.apache.commons.io.input.BoundedInputStream
FileInputStream file = new FileInputStream(filename);
file.skip(pos1);
BufferedReader br = new BufferedReader(
new InputStreamReader(new BoundedInputStream(file,pos2-pos1))
);
If you didn't care about pos2
, then you woundn't need Apache Commons IO.
I wrote this code to read utf-8 using randomaccessfiles
//File: CyclicBuffer.java
public class CyclicBuffer {
private static final int size = 3;
private FileChannel channel;
private ByteBuffer buffer = ByteBuffer.allocate(size);
public CyclicBuffer(FileChannel channel) {
this.channel = channel;
}
private int read() throws IOException {
return channel.read(buffer);
}
/**
* Returns the byte read
*
* @return byte read -1 - end of file reached
* @throws IOException
*/
public byte get() throws IOException {
if (buffer.hasRemaining()) {
return buffer.get();
} else {
buffer.clear();
int eof = read();
if (eof == -1) {
return (byte) eof;
}
buffer.flip();
return buffer.get();
}
}
}
//File: UTFRandomFileLineReader.java
public class UTFRandomFileLineReader {
private final Charset charset = Charset.forName("utf-8");
private CyclicBuffer buffer;
private ByteBuffer temp = ByteBuffer.allocate(4096);
private boolean eof = false;
public UTFRandomFileLineReader(FileChannel channel) {
this.buffer = new CyclicBuffer(channel);
}
public String readLine() throws IOException {
if (eof) {
return null;
}
byte x = 0;
temp.clear();
while ((byte) -1 != (x = (buffer.get())) && x != '\n') {
if (temp.position() == temp.capacity()) {
temp = addCapacity(temp);
}
temp.put(x);
}
if (x == -1) {
eof = true;
}
temp.flip();
if (temp.hasRemaining()) {
return charset.decode(temp).toString();
} else {
return null;
}
}
private ByteBuffer addCapacity(ByteBuffer temp) {
ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
temp.flip();
t.put(temp);
return t;
}
public static void main(String[] args) throws IOException {
RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt",
"r");
UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file
.getChannel());
int i = 1;
while (true) {
String s = reader.readLine();
if (s == null)
break;
System.out.println("\n line " + i++);
s = s + "\n";
for (byte b : s.getBytes(Charset.forName("utf-8"))) {
System.out.printf("%x", b);
}
System.out.printf("\n");
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With