
Efficiency of line by line file reading in Python

Right now I am writing some Python code to deal with massive Twitter files. These files are so big that they can't fit into memory. To work with them, I basically have two choices.

  1. I could split the files into smaller files that can fit into memory.

  2. I could process the big file line by line so I never need to fit the entire file into memory at once.

I would prefer the latter for ease of implementation.
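For reference, the second option could look roughly like the sketch below. The function name `count_matching_lines` and the keyword filter are hypothetical placeholders for whatever per-line work is needed. Iterating over a file object yields one line at a time, and Python reads from disk in larger buffered blocks behind the scenes, so this does not issue one disk read per line.

```python
def count_matching_lines(path, keyword):
    """Count lines containing `keyword`, keeping only one line in memory at a time.

    Hypothetical helper for illustration; replace the `if` with real processing.
    """
    count = 0
    # The file object is its own iterator: each loop step returns the next line.
    # Reads are served from an internal buffer, not one disk access per line.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if keyword in line:
                count += 1
    return count
```

Calling `count_matching_lines("tweets.txt", "python")` (with a made-up file name) would scan the whole file while memory use stays roughly constant regardless of file size.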

However, I am wondering if it is faster to read an entire file into memory and then manipulate it from there. It seems like it could be slow to constantly read a file line by line from disk. But then again, I do not fully understand how these processes work in Python. Does anyone know if line-by-line file reading will make my code slower than reading the entire file into memory and manipulating it from there?

andrew asked May 05 '12 09:05




1 Answer

For really fast file reading, have a look at the mmap module. This makes the entire file appear as one big chunk of virtual memory, even if it is much larger than your available RAM; the operating system pages parts of the file in and out as you touch them. If your file is bigger than 3 or 4 gigabytes, you will need a 64-bit OS (and a 64-bit build of Python), because a 32-bit process does not have enough address space to map a file that large.

I've done this for files over 30 GB in size with good results.
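As a minimal sketch of the idea (not the exact code used for those files), the example below maps a file read-only and walks it line by line. The function name `count_matching_lines_mmap` and the keyword filter are hypothetical. It assumes the file is non-empty, since mapping an empty file raises `ValueError`, and it works on bytes, so the keyword must be a `bytes` object.

```python
import mmap


def count_matching_lines_mmap(path, keyword):
    """Count lines containing `keyword` (bytes) using a read-only memory map.

    Hypothetical helper for illustration; assumes `path` is a non-empty file.
    """
    count = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mm.readline() returns b"" once the end of the mapping is reached.
            for line in iter(mm.readline, b""):
                if keyword in line:
                    count += 1
    return count
```

The mapped object also supports slicing and `find()`, so you can search or jump around the file without reading it sequentially.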

Greg Hewgill answered Nov 12 '22 13:11