Easiest scripting method to merge two text files - Ruby, Python, JavaScript, Java?

Question

I have two text files, one containing HTML and the other containing URL slugs:

FILE 1 (HTML):

<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>
...

FILE 2 (URL SLUGS):

thomas-friedman-the-world-is-flat
michael-dagleish-scotland-in-wartime
dr-raymond-kinsella-progress-in-cancer-treatments
...

I need to merge them so that the slugs in FILE 2 are inserted into the HTML in FILE 1 like this:

OUTPUT:

<li><a href="/article/thomas-friedman-the-world-is-flat"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/michael-dagleish-scotland-in-wartime"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/dr-raymond-kinsella-progress-in-cancer-treatments"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>

What's the best approach and which language would be most appropriate to accomplish this task with a minimum of complexity?

Nakilon · Accepted Answer

You need a zip function, which is available in most languages. Its purpose is to process two or more arrays in parallel.
In Ruby it will be something like this:

f1 = File.readlines('file1.txt')
f2 = File.readlines('file2.txt')

File.open('file3.txt','w') do |output_file|

    f1.zip(f2) do |a,b|
        # chomp strips the trailing newline that readlines keeps on each slug
        output_file.puts a.sub('/article/','/article/'+b.chomp)
    end

end

To zip more than two arrays, you can do f1.zip(f2,f3,...) do |a,b,c,...|
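For comparison, here is a minimal Python sketch of the same zip idea using plain string replacement. The function name merge_lines and its marker parameter are made up for illustration; the sketch assumes every HTML line contains href="/article/" exactly as in the question and that the two inputs line up one-to-one.

```python
def merge_lines(html_lines, slug_lines, marker='href="/article/'):
    """Pair each HTML line with its slug and insert the slug right after the marker."""
    merged = []
    for html, slug in zip(html_lines, slug_lines):
        # strip() drops the trailing newline that file iteration leaves on each slug
        new_line = html.rstrip("\n").replace(marker, marker + slug.strip(), 1)
        merged.append(new_line)
    return merged


def merge_files(html_path, slug_path, out_path):
    """Read both files line by line and write the merged lines to out_path."""
    with open(html_path) as htmls, open(slug_path) as slugs, open(out_path, "w") as out:
        for line in merge_lines(htmls, slugs):
            out.write(line + "\n")
```

Calling merge_files('file1.txt', 'file2.txt', 'file3.txt') would mirror the Ruby version. Note that zip stops at the shorter input, so extra lines in either file are silently ignored.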

Katriel · Answer

This will be easy in any language. Here it is in pseudo-Python; I've omitted the lxml bits because I don't have access to them and I can't quite remember the syntax. They're not difficult, though.

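import lxml.etree

with open(...) as htmls, open(...) as slugs, open(...) as output:
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name
        name = root.findtext(".//span[@class='author']")

        # drop as many leading slug parts as there are words in the name
        slug = slug.strip().split("-")[len(name.split()):]
        # add in the extra child in lxml

        output.write(lxml.etree.tostring(root, encoding="unicode") + "\n")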

Interesting features:

This doesn't read in the entire file at once; it does it chunk by chunk (well, line-by-line but Python will buffer it). Useful if the files are huge, but probably irrelevant.
lxml may be overkill, depending on how rigid the format of the html strings is. If they're guaranteed to be the same and all well-formed, it might be easier for you to use simple string operations. On the other hand, lxml is pretty fast and offers a lot more flexibility.
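As a rough illustration of the parser-based route, the sketch below sets the href attribute directly instead of editing the string. It swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra packages; the function names insert_slug and merge_streams are hypothetical. It assumes each line is a single well-formed <li> element with an <a> child, as in the question.

```python
import xml.etree.ElementTree as ET


def insert_slug(html_line, slug):
    """Parse one <li> line, append the slug to its link's href, and re-serialize."""
    root = ET.fromstring(html_line)   # raises ParseError if the markup is not well-formed
    link = root.find("a")             # the <a> element directly inside <li>
    link.set("href", link.get("href") + slug.strip())
    return ET.tostring(root, encoding="unicode")


def merge_streams(htmls, slugs, output):
    """Walk two open files in step, one line at a time, writing merged lines to output."""
    for html, slug in zip(htmls, slugs):
        output.write(insert_slug(html, slug) + "\n")
```

Because merge_streams iterates over the file objects rather than reading them whole, it keeps the line-by-line behaviour described above. The serializer may normalize the markup slightly (for example, how the empty <button/> tag is written), which is one reason plain string replacement can be the simpler choice when the format is rigid.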
