
Text processing with two files

Tags: python, text, awk

I have two text files in the following format:

Every line of the first file looks like this:

Key1:Value1

Every line of the second file looks like this:

Key2:Value2

Is there a way to replace Value1 in file1 with the Value2 obtained by using Value1 as a key in file2?

For example:

file1:

foo:hello
bar:world

file2:

hello:adam
world:eve

I would like to get:

foo:adam
bar:eve

There isn't necessarily a match between the two files on every line. Can this be done neatly in awk or something, or should I do it naively in Python?

Asked by Ivy on May 05 '12 at 08:05




1 Answer

Create two dictionaries, one for each file. For example:

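file1 = {}
with open('file1') as f:  # the file is closed when the block ends
    for line in f:
        # split on the first colon only, in case a value contains ':'
        k, v = line.strip().split(':', 1)
        file1[k] = v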

Or if you prefer a one-liner:

file1 = dict(l.strip().split(':', 1) for l in open('file1'))

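Build file2 the same way, then look up each value from file1 as a key in file2:

result = {}
for key, value in file1.items():  # file1.iteritems() on Python 2
    if value in file2:  # keep only values that are keys in file2
        result[key] = file2[value]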
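Putting the steps above together, here is a minimal Python 3 sketch. The helper names load_pairs and remap_values are hypothetical, the sample lines are made up, and it assumes keys never contain a colon. Unlike the loop above, it keeps lines from file1 whose value has no match in file2 unchanged, which is only one possible reading of the question.

```python
def load_pairs(lines):
    """Parse 'key:value' lines into a dict, skipping lines without a colon."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if ':' not in line:
            continue  # blank or malformed line
        key, value = line.split(':', 1)  # split on the first colon only
        pairs[key] = value
    return pairs


def remap_values(first, second):
    """Replace each value in `first` with second[value] when that key exists.

    Values with no matching key in `second` are left as they are (an assumption).
    """
    return {key: second.get(value, value) for key, value in first.items()}


# In-memory stand-ins for the two files; an open file object works the same way,
# e.g. load_pairs(open('file1')).
file1 = load_pairs(["foo:hello", "bar:world", "baz:unmatched"])
file2 = load_pairs(["hello:adam", "world:eve"])

result = remap_values(file1, file2)
for key, value in result.items():
    print(f"{key}:{value}")
```

To drop unmatched lines instead, filter on `value in second` as in the loop above.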

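Another way is to build the file1 dictionary in reverse (value to key) and use sets. For example, if file1 contains the line foo:bar, the file1 dict is {'bar': 'foo'}. Note that this only works if the values in file1 are unique, since a repeated value would overwrite the earlier entry.

result = {}
for key in set(file1) & set(file2):
    result[file1[key]] = file2[key]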

The set intersection yields exactly the keys present in both dictionaries, so every key in the loop is guaranteed to exist in file2 and no per-item membership check is needed.
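The reversed-dictionary variant can be sketched as follows in Python 3. The sample lines are made up for illustration, and the sketch assumes the values in file1 are unique.

```python
# Made-up sample lines standing in for the contents of the two files.
lines1 = ["foo:hello", "bar:world", "baz:nomatch"]
lines2 = ["hello:adam", "world:eve"]

# Build file1 reversed: value -> key.
file1 = {}
for line in lines1:
    key, value = line.strip().split(':', 1)
    file1[value] = key

# Build file2 the usual way: key -> value.
file2 = dict(line.strip().split(':', 1) for line in lines2)

# Dict key views support set operations directly in Python 3.
result = {}
for key in file1.keys() & file2.keys():
    result[file1[key]] = file2[key]

# Set iteration order is arbitrary, so sort before printing.
for key in sorted(result):
    print(f"{key}:{result[key]}")
```

Here "baz" is dropped because "nomatch" is not a key in file2.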

Edit: As pointed out by @pepr, you can use collections.OrderedDict for the first method if order is important to you. (Since Python 3.7, plain dicts also preserve insertion order.)
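A small sketch of the order-preserving variant, assuming made-up sample data; the result keeps the line order of file1 regardless of how the keys sort.

```python
from collections import OrderedDict

# Made-up sample lines, deliberately not in alphabetical order.
lines1 = ["zeta:hello", "alpha:world"]
file2 = {"hello": "adam", "world": "eve"}

# OrderedDict remembers the order in which keys were first inserted.
file1 = OrderedDict(line.strip().split(':', 1) for line in lines1)

result = OrderedDict()
for key, value in file1.items():
    if value in file2:
        result[key] = file2[value]

for key, value in result.items():
    print(f"{key}:{value}")
```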

Answered by spinlok on Sep 27 '22 at 21:09