Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Easiest scripting method to merge two text files - Ruby, Python, JavaScript, Java?

I have two text files, one containing HTML and the other containing URL slugs:

FILE 1 (HTML):

<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>
...

FILE 2 (URL SLUGS):

thomas-friedman-the-world-is-flat
michael-dagleish-scotland-in-wartime
dr-raymond-kinsella-progress-in-cancer-treatments
...

I need to merge them so that the slugs in FILE 2 are inserted into the HTML in FILE 1 like this:

OUTPUT:

<li><a href="/article/thomas-friedman-the-world-is-flat"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/michael-dagleish-scotland-in-wartime"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/dr-raymond-kinsella-progress-in-cancer-treatments"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>

What's the best approach and which language would be most appropriate to accomplish this task with a minimum of complexity?

like image 963
fidlrz Avatar asked Jun 27 '26 07:06

fidlrz


2 Answers

You need zip-function, which is available in most languages. It's purpose is parallel processing of two or more arrays.
In Ruby it will be something like this:

f1 = File.readlines('file1.txt')
f2 = File.readlines('file2.txt')

File.open('file3.txt','w') do |output_file|

    f1.zip(f2) do |a,b|
        output_file.puts a.sub('/article/','/article/'+b)
    end

end

For zipping more, than two arrays you can do f1.zip(f2,f3,...) do |a,b,c,...|

like image 196
Nakilon Avatar answered Jun 28 '26 21:06

Nakilon


This will be easy in any language. Here it is in pseudo-Python; I've omitted the lxml bits because I don't have access to them and I can't quite remember the syntax. They're not difficult, though.

with open(...) as htmls, open(...) as slugs, open(...) as output:
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name

        slug = slug.split("-")[(len(name.split()):]
        # add in the extra child in lxml

        output.write(root.tostring())

Interesting features:

  • This doesn't read in the entire file at once; it does it chunk by chunk (well, line-by-line but Python will buffer it). Useful if the files are huge, but probably irrelevant.

  • lxml may be overkill, depending on how rigid the format of the html strings is. If they're guaranteed to be the same and all well-formed, it might be easier for you to use simple string operations. On the other hand, lxml is pretty fast and offers a lot more flexibility.

like image 32
Katriel Avatar answered Jun 28 '26 21:06

Katriel



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!