I have a large file (A.txt) of 2 GB containing a list of strings: ['Question', 'Q1', 'Q2', 'Q3', 'Ans1', 'Format', 'links', ...].
Now I have another, larger file (1 TB) containing the above-mentioned strings in the second position, for example:
a, Question, b
The, quiz, is
This, Q1, Answer
Here, Ans1, is
King1, links, King2
programming,language,drupal,
.....
I want to retain the lines whose second position contains a string from the list stored in A.txt. That is, I want to keep (store in another file) the lines below:
a, Question, b
This, Q1, Answer
Here, Ans1, is
King1, links, King2
I know how to do this when the list in file A.txt is short (say 100 entries) using 'any', but I am not sure how to go about it when the list in A.txt is 2 GB.
Don't use a list; use a set instead.
Read the first file into a set:
with open('A.txt') as file_a:
    words = {line.strip() for line in file_a}
A couple of gigabytes of words is still feasible to store in a set on a machine with enough RAM, though expect the in-memory set to take several times the on-disk size.
Now you can test against words in O(1) constant time:
if second_word in words:
    # ....
Open the second file and process it line by line, perhaps using the csv module since the fields are comma-separated.
For a set of words too large to fit in memory, use a database instead; Python comes with the sqlite3 module:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE words (word UNIQUE)')
with open('A.txt') as file_a, conn:
    cursor = conn.cursor()
    for line in file_a:
        cursor.execute('INSERT OR IGNORE INTO words VALUES (?)', (line.strip(),))
then test against that:
cursor = conn.cursor()
for line in second_file:
    # Extract the second comma-separated field; adjust the parsing to your format.
    second_word = line.split(',')[1].strip()
    cursor.execute('SELECT 1 FROM words WHERE word = ?', (second_word,))
    if cursor.fetchone():
        # ....
Even though I use a :memory: database here, SQLite is smart enough to store data in temporary files when you start filling up memory. The :memory: connection is basically just a temporary, one-off database. You can also use a real file path if you want to re-use the words database.