I have three huge files, with just 2 columns, and I need both. I want to merge them into one file which I can then write to a SQLite database.
I used Python and got the job done, but it took more than 30 minutes and hung my system for 10 of them. I was wondering if there is a faster way using awk or any other Unix tool. A faster way within Python would be great too. Code written below:
'''We have tweets of three months in 3 different files.
Combine them into a single file.'''
import sys

data1 = open(sys.argv[1], 'r')
data2 = open(sys.argv[2], 'r')
data3 = open(sys.argv[3], 'r')
data4 = open(sys.argv[4], 'w')

for line in data1:
    data4.write(line)
data1.close()

for line in data2:
    data4.write(line)
data2.close()

for line in data3:
    data4.write(line)
data3.close()

data4.close()
The standard Unix way to concatenate files is cat. It may not be dramatically faster, but it will be faster.
cat file1 file2 file3 > bigfile
Rather than make a temporary file, you may be able to pipe the output of cat directly into sqlite3:

cat file1 file2 file3 | sqlite3 database
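Since the end goal is a SQLite database anyway, you could also skip the merged intermediate file and insert each file's rows directly with Python's built-in sqlite3 module. A minimal sketch, assuming the two columns are tab-separated and using a hypothetical tweets table (adjust the separator and schema to your data):

```python
import sqlite3

def load_into_sqlite(db_path, files, sep='\t'):
    """Insert the two-column rows of each input file into one table."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: one table with the two columns the files contain.
    conn.execute("CREATE TABLE IF NOT EXISTS tweets (col1 TEXT, col2 TEXT)")
    for name in files:
        with open(name) as f:
            # Split each line on the first separator only, so the second
            # column may itself contain the separator character.
            rows = (line.rstrip('\n').split(sep, 1) for line in f)
            conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
```

executemany accepts the generator lazily, so the files are streamed rather than loaded into memory, and a single commit at the end avoids per-row transaction overhead.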
In Python, you will probably get better performance if you copy the files in blocks rather than lines. Use file.read(65536) to read 64K of data at a time, rather than iterating through the files with a for loop.
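A sketch of that block-copy approach, using the standard library's shutil.copyfileobj to do the 64K read/write loop (the function name merge_files is just for illustration):

```python
import shutil

def merge_files(inputs, output, block_size=65536):
    """Concatenate the input files into the output file, copying
    fixed-size blocks instead of iterating line by line."""
    # Binary mode avoids any newline translation or decoding overhead.
    with open(output, 'wb') as out:
        for name in inputs:
            with open(name, 'rb') as src:
                shutil.copyfileobj(src, out, block_size)
```

Called as merge_files(sys.argv[1:4], sys.argv[4]), this replaces the three line-by-line loops in the original script with one block-sized copy per file.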