I have a 10GB CSV file which is essentially a huge square matrix. I am trying to write a function that can access a single cell of the matrix as efficiently as possible, i.e. matrix[12345,20000].
Given its size, it is obviously not possible to load the entire matrix into a 2D array, so I need to somehow read the values directly from the file.
I have Googled around and looked at random file access using FileStream.Seek. Unfortunately, due to variable rounding, the cells are not a fixed width, so I cannot seek to a specific byte and work out by arithmetic which cell I am looking at.
I considered scanning the file and creating a lookup table for the index of the first byte of each row. That way, if I wanted to access matrix[12345,20000] I would seek to the start of row 12345 and then scan across the line, counting the commas until I reach the correct cell.
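For illustration, the lookup-table idea might look roughly like this. It is a Python sketch rather than .NET code (f.seek plays the role of FileStream.Seek), the names build_row_index and read_cell are made up for the example, and it assumes plain numeric fields with no quoted commas or embedded line breaks.

```python
def build_row_index(path):
    """Scan the file once and return the byte offset of the start of each row."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:  # binary mode so offsets are exact byte counts
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_cell(path, offsets, row, col):
    """Seek to the start of `row`, then walk across the commas to field `col`."""
    with open(path, "rb") as f:
        f.seek(offsets[row])
        line = f.readline().rstrip(b"\r\n")
    # Splitting at most col + 1 times stops as soon as field `col` is isolated.
    # Assumption: no quoted fields, so every comma is a separator.
    field = line.split(b",", col + 1)[col]
    return float(field.decode("ascii"))
```

The index holds one integer per row, so it stays small compared with the file itself. The cost of a lookup is one seek plus a scan of a single line.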
I am about to try this, but has anyone else got any better ideas? I'm sure I wouldn't be the first person to try and deal with a file like this.
Cheers
Edit: I should note that the file contains a very sparse matrix. If parsing the CSV file ends up being too slow, I would consider converting the file to a more appropriate, and easier to process, file format. What is the best way to store a sparse matrix?
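One common option for a very sparse matrix is a coordinate ("dictionary of keys") layout, where only the non-zero cells are stored and everything else is implied to be zero. A minimal Python sketch, with hypothetical names load_sparse and get_cell, and assuming that the non-zero entries fit in memory and that blank cells mean zero:

```python
import csv


def load_sparse(path):
    """Read the CSV once and keep only the non-zero cells, keyed by (row, col)."""
    nonzero = {}
    with open(path, newline="") as f:
        for r, row in enumerate(csv.reader(f)):
            for c, text in enumerate(row):
                if not text.strip():
                    continue  # assumption: a blank cell means zero
                value = float(text)
                if value != 0.0:
                    nonzero[(r, c)] = value
    return nonzero


def get_cell(nonzero, row, col):
    """Any cell that was not stored is zero by construction."""
    return nonzero.get((row, col), 0.0)
```

Whether this beats an on-disk index depends on how many non-zero cells there are; if they do not fit in memory, the same (row, col, value) triples could be written to a file or database instead.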
I have used the Lumenworks CSV reader for quite large CSV files; it may be worth a quick look to see how quickly it can parse your file.
Lumenworks CSV
First of all, how do you want to refer to a particular row? Is it by the index of the row, so that you have another table or something that tells you which row you are interested in? Or is it by an ID or something?
These ideas come to mind
An index file would be the best you could do, I bet. With rows of unknown size, there is no way to skip directly to a line other than scanning the file or having an index.
The only question is how large your index is. If it is too large, you could make it smaller by indexing only every 5th line (for example) and scanning within a range of 5 lines.
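As a rough Python sketch of the "every 5th line" variant (the names build_sparse_index and read_row and the step parameter are made up for illustration): only the offsets of rows 0, 5, 10, ... are kept, and a lookup seeks to the nearest indexed row at or before the target and skips forward from there.

```python
def build_sparse_index(path, step=5):
    """Record the byte offset of every `step`-th row (rows 0, step, 2*step, ...)."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:  # binary mode so offsets are exact byte counts
        for i, line in enumerate(f):
            if i % step == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_row(path, offsets, row, step=5):
    """Seek to the closest indexed row before `row`, then skip the remainder."""
    with open(path, "rb") as f:
        f.seek(offsets[row // step])
        for _ in range(row % step):
            f.readline()  # discard the rows between the indexed row and the target
        return f.readline().rstrip(b"\r\n")
```

A larger step shrinks the index further, at the price of reading up to step - 1 extra lines per lookup. The same step value has to be used for building and for reading.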