 

Grepping progressively through large file

Tags: grep, shell, unix

I have several large data files (~100MB-1GB of text) and a sorted list of tens of thousands of timestamps that index data points of interest. The timestamp file looks like:

12345
15467
67256
182387
199364
...

And the data file looks like:

Line of text
12345 0.234 0.123 2.321
More text
Some unimportant data
14509 0.987 0.543 3.600
More text
15467 0.678 0.345 4.431

The data in the second file is all in order of timestamp. I want to grep through the second file using the timestamps from the first, printing the timestamp and the fourth data item to an output file. I've been using this:

grep -wf time.stamps data.file | awk '{print $1 "\t" $4 }'  >> output.file

This is taking on the order of a day to complete for each data file. The problem is that this command searches through the entire data file for every line in time.stamps, but I only need the search to pick up from the last data point. Is there any way to speed up this process?

asked Jul 03 '13 by user2548142

1 Answer

You can do this entirely in awk:

awk 'NR==FNR{a[$1]++;next}($1 in a){print $1,$4}' timestampfile datafile
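
Here, NR==FNR is only true while awk reads the first file (the timestamp list), so those values are stored as keys of the array a; for the second file, any line whose first field is one of the stored timestamps has its first and fourth fields printed. Each file is read exactly once, instead of re-scanning the data file for every timestamp. A minimal sketch applying this to the filenames from the question (time.stamps, data.file, output.file), with a tab separator to match the original grep/awk pipeline:

# time.stamps : sorted list of timestamps, one per line
# data.file   : data file whose data lines start with a timestamp
awk 'NR==FNR {a[$1]; next} ($1 in a) {print $1 "\t" $4}' time.stamps data.file > output.file

With tens of thousands of timestamps the in-memory array stays small, so a single pass over each ~100MB-1GB data file should be far faster than repeatedly scanning it with grep.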
answered Sep 25 '22 by jaypal singh