I'm using a combination of rubyzip and nokogiri to edit a .docx file. I'm using rubyzip to unzip the .docx file and then using nokogiri to parse and change the body of the word/document.xml file but ever time I close rubyzip at the end it corrupts the file and I can't open it or repair it. I unzip the .docx file on desktop and check the word/document.xml file and the content is updated to what I changed it to but all the other files are messed up. Could someone help me with this issue? Here is my code:
require 'rubygems'
require 'zip/zip'
require 'nokogiri'
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close
Edit DOCX file online Just upload a DOCX file and start working with it like you would with a common Office suite. The user-friendly DOCX Editor opens files quickly and provides standard text formatting features that you may need in your work.
Click Edit Document > Edit in Word for the web to make changes to a document. When you open a document from OneDrive, Word for the web displays it in Reading view. To make changes to your document, switch to Editing view, where you can add and delete content and do other things, such as: Add tables and pictures.
Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (CRuby) and xerces (JRuby).
I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip/zip' # rubyzip gem
require 'nokogiri'
class WordXmlFile
def self.open(path, &block)
self.new(path, &block)
end
def initialize(path, &block)
@replace = {}
if block_given?
@zip = Zip::ZipFile.open(path)
yield(self)
@zip.close
else
@zip = Zip::ZipFile.open(path)
end
end
def merge(rec)
xml = @zip.read("word/document.xml")
doc = Nokogiri::XML(xml) {|x| x.noent}
(doc/"//w:fldSimple").each do |field|
if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
text_node = (field/".//w:t").first
if text_node
text_node.inner_html = rec[$1].to_s
else
puts "No text node for #{$1}"
end
end
end
@replace["word/document.xml"] = doc.serialize :save_with => 0
end
def save(path)
Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
@zip.each do |entry|
out.get_output_stream(entry.name) do |o|
if @replace[entry.name]
o.write(@replace[entry.name])
else
o.write(@zip.read(entry.name))
end
end
end
end
@zip.close
end
end
if __FILE__ == $0
file = ARGV[0]
out_file = ARGV[1] || file.sub(/\.docx/, ' Merged.docx')
w = WordXmlFile.open(file)
w.force_settings
w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
w.save(out_file)
end
I stumbled accross the post and know nothing about ruby or nokogiri but ...
It looks like you are reziping the new content incorrectly. I don't know about rubyzip, but you need a way to tell it to update the entry word/document.xml and then resave/rezip the file.
It looks like you are just overwriting the entry with new data wich of course is going to be a different size and totally screw up the rest of the zip file.
I give an example for excel in this post Parse text file and create an excel report
which may be of use even though i am using a different zip library and VB (Im still doing exactly what you are trying to do, my code is about half way down)
here is the part that applies
Using z As ZipFile = ZipFile.Read(xlStream.BaseStream)
'Grab Sheet 1 out of the file parts and read it into a string.
Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml")
Dim msSheet1 As New MemoryStream
myEntry.Extract(msSheet1)
msSheet1.Position = 0
Dim sr As New StreamReader(msSheet1)
Dim strXMLData As String = sr.ReadToEnd
'Grab the data in the empty sheet and swap out the data that I want
Dim str2 As XElement = CreateSheetData(tbl)
Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString)
z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace)
'This just rezips the file with the new data it doesnt save to disk
z.Save(fiRet.FullName)
End Using
According to the official Github documentation, you should Use write_buffer instead open
. There's also a code example at the link.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With