I am trying to create a Word document using HWPFDocument. I can create the doc with some formatting, but there are a few things I cannot get working. I want to convert this simple HTML into the generated Word document:
<div xmlns="http://www.w3.org/1999/xhtml" class="formatted_content">
<strong>cloudHQ.tester.4</strong> –
this is the bold text
<br/>
this is italic text
<br/>
<ul>
<li>bullets 1</li>
<li>bullets 2</li>
<li>bullets 3</li>
</ul>
<br/>
<ol>
<li>Number1</li>
<li>Number2</li>
<li>Number3</li>
</ol>
<br/>
<pre>this is simple quote</pre>
<br>
this is simple quote
</div>
I am able to convert the bold and italic text, but I cannot figure out how to create the following tags in the Word document:
1) <ul><li>....
2) <ol><li>...
3) break <br>
4) <pre>
Is there any example of how to do this? If so, please let me know. I would really appreciate the help. Thanks in advance.
Edit:
Included libraries:
include_class "org.apache.poi.poifs.filesystem.POIFSFileSystem"
include_class "org.apache.poi.hwpf.usermodel.ParagraphProperties"
include_class "org.apache.poi.hwpf.usermodel.CharacterRun"
include_class "org.apache.poi.hwpf.usermodel.CharacterProperties"
And this is the main code that converts the HTML to a doc:
def convert_from_html_to_doc(html_file_name, comment_files)
  puts("Script start.....")
  puts("Parsing document comments start.....NEW")
  default_file = "misc/poi_experiment/empty.doc"
  fs = JavaPoi::POIFSFileSystem.new(JavaPoi::FileInputStream.new(default_file))
  # HWPF ("Horrible Word Processor Format") is POI's API for binary .doc files
  hwpfDocument = JavaPoi::HWPFDocument.new(fs)
  # the range covers the whole document except the header and footer
  range = hwpfDocument.getRange()
  par1 = range.insertAfter(JavaPoi::ParagraphProperties.new(), 0)
  par1.setSpacingAfter(200)
  puts("Adding given html content to doc.")
  main_html = Nokogiri::HTML(File.read(html_file_name))
  characterRun = par1.insertAfter(main_html.text)
  # setting the font size (the value is in half-points, so this is 12pt)
  characterRun.setFontSize(2 * 12)
  puts("Start process on comment..... total : #{comment_files.size}")
  comment_files.each do |cf|
    file_path = "misc/poi_experiment/#{cf}"
    puts("The comment file path : #{file_path}")
    html = Nokogiri::HTML(File.read(file_path)).css('html')
    puts( html )
    par = characterRun.insertAfter(JavaPoi::ParagraphProperties.new(), 0)
    par.setSpacingAfter(200)
    #text = "<b><u>this is bold and underlined text</u></b>"
    text = html.to_s.scan(/\D\d*/)
    index = 0
    currentCharacterRun , currentCharacterStyleList = [], []
    # text is already the scanned array; calling to_s on it and scanning again
    # would tokenize the array's inspect string (brackets, quotes, commas)
    character_arr = text
    character_or_tag, index = get_next_character_or_tag(character_arr, index)
    while !character_or_tag.nil?
      if character_or_tag.is_char?
        currentCharacterRun << character_or_tag.get_char
      end
      if character_or_tag.is_start_tag?
        # flush the pending text with the styles that were active before this tag
        currentCharacterRunText = currentCharacterRun.join
        if currentCharacterRunText != ""
          characterproperties = JavaPoi::CharacterProperties.new
          characterproperties = emit_to_document_and_apply_style(characterproperties, currentCharacterStyleList)
          characterRun = par.insertAfter(currentCharacterRunText,characterproperties)
          currentCharacterRun = []
        end
        currentCharacterStyleList << character_or_tag.get_tag
      end
      if character_or_tag.is_end_tag?
        # flush the pending text, then drop the style that this tag closes
        currentCharacterRunText = currentCharacterRun.join
        if currentCharacterRunText != ""
          characterproperties = JavaPoi::CharacterProperties.new
          characterproperties = emit_to_document_and_apply_style(characterproperties, currentCharacterStyleList)
          characterRun = par.insertAfter(currentCharacterRunText,characterproperties)
          currentCharacterRun = []
        end
        currentCharacterStyleList.reject! { |x| x == character_or_tag.get_tag.gsub("/","") }
      end
      character_or_tag, index = get_next_character_or_tag(character_arr, index)
    end
  end
  hwpfDocument.write(JavaPoi::FileOutputStream.new("#{html_file_name}.doc", true))
end
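The loop above calls get_next_character_or_tag, which the post does not show. Purely as an illustration, a helper of that shape might look like the sketch below. The Token struct, the tag-name normalisation, and the assumption that the array holds one character per element (for example html.to_s.chars) are all hypothetical and not taken from the original code.

```ruby
# Hypothetical token type; its method names mirror the calls made in the loop.
Token = Struct.new(:kind, :value) do
  def is_char?;      kind == :char;      end
  def is_start_tag?; kind == :start_tag; end
  def is_end_tag?;   kind == :end_tag;   end
  def get_char; value; end
  def get_tag;  value; end
end

# Returns [token, next_index], or [nil, index] once the input is exhausted.
# Assumption: character_arr holds exactly one character per element.
def get_next_character_or_tag(character_arr, index)
  return [nil, index] if index >= character_arr.size

  ch = character_arr[index]
  return [Token.new(:char, ch), index + 1] unless ch == "<"

  # Find the ">" that closes this tag.
  close = index
  close += 1 while close < character_arr.size && character_arr[close] != ">"
  # An unterminated "<" is treated as a literal character.
  return [Token.new(:char, ch), index + 1] if close >= character_arr.size

  # Keep only the tag name: "<b class='x'>" -> "b", "</b>" -> "/b", "<br/>" -> "br".
  inner = character_arr[(index + 1)...close].join.strip
  name = inner.sub(%r{/\z}, "").split(/\s+/).first.to_s.downcase
  kind = name.start_with?("/") ? :end_tag : :start_tag
  [Token.new(kind, name), close + 1]
end

# Usage: walk a small fragment and print each token.
chars = "a<b>c</b>".chars
index = 0
loop do
  token, index = get_next_character_or_tag(chars, index)
  break if token.nil?
  puts "#{token.kind}: #{token.value}"
end
```

Note that this sketch reports a self-closing tag such as <br/> as a start tag, so a caller that keeps a style list would need to treat it separately instead of pushing it onto the list.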
I hope this helps you understand the problem.
After a lot of attempts, I am thinking of moving to JODConverter instead.
This section of the POI Javadocs will likely be useful to you. For example, to create a list, I think you want to use HWPFList:
http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/HWPFList.html
This class is used to create a list in a Word document. It is used in conjunction with registerList in HWPFDocument. In Word, lists are not ranged entities, meaning you can't actually add one to the document. Lists only act as properties for list entries. Once you register a list, you can add list entries to a document that are a part of the list. The only benefit of this that I see, is that you can add a list entry anywhere in the document and continue numbering from the previous list.
So in Java you would do this:
new HWPFList(boolean numbered, StyleSheet styleSheet)
I'm not a Ruby expert, so I'll leave the translation to JRuby to you.
That covers #1 and #2 in your list. I think #3 and #4 are just styled versions of Paragraph.
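To make that mapping concrete, here is a rough pure-Ruby sketch of a planning step that could run before any POI calls. It turns the block-level tags into paragraph descriptors: list items become list entries that carry a list id, <br> becomes a paragraph break, and <pre> becomes a paragraph to be styled separately. The function name plan_paragraphs, the descriptor hashes, and the counter standing in for the id that registerList would return are all assumptions made for illustration. It also assumes tags without attributes and no nested lists, and it ignores inline tags such as <strong>.

```ruby
# Hypothetical planning step: block-level HTML -> paragraph descriptors.
# No POI calls are made; list_id is a stand-in for a registered list's id.
def plan_paragraphs(html)
  plan = []
  next_list_id = 0
  # Alternatives, in order: a whole list, a <pre> block, a line break,
  # any other tag (skipped), or a run of plain text.
  pattern = %r{<(ul|ol)>(.*?)</\1>|<pre>(.*?)</pre>|(<br\s*/?>)|<[^>]+>|([^<]+)}m
  html.scan(pattern) do |list_tag, list_body, pre, br, text|
    if list_tag
      # Each <ul>/<ol> gets its own list; its items all share that id.
      next_list_id += 1
      list_body.scan(%r{<li>(.*?)</li>}m).flatten.each do |item|
        plan << { kind: :list_entry, numbered: list_tag == "ol",
                  list_id: next_list_id, text: item.strip }
      end
    elsif pre
      # Whitespace inside <pre> is kept as-is.
      plan << { kind: :preformatted, text: pre }
    elsif br
      plan << { kind: :break }
    elsif text && !text.strip.empty?
      plan << { kind: :text, text: text.strip }
    end
  end
  plan
end

# Usage: print the plan for a fragment shaped like the one in the question.
sample = "intro<br/><ul><li>bullets 1</li></ul><ol><li>Number1</li></ol><pre>quote</pre>"
plan_paragraphs(sample).each { |entry| puts entry.inspect }
```

Each descriptor could then be handed to whatever code creates the actual paragraph, with the numbered flag deciding which kind of list the entry belongs to.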