
Is there any way to create line breaks and ordered lists using HWPFDocument?

I am trying to create a Word document using HWPFDocument. I can create the document with some features, but a few things are still missing. I want to convert this simple HTML into the generated Word document:

<div xmlns="http://www.w3.org/1999/xhtml" class="formatted_content">
        <strong>cloudHQ.tester.4</strong> –
      this is the bold text
      <br/>
      this is italic text
      <br/>
      <ul>
      <li>bullets 1</li>
      <li>bullets 2</li>
      <li>bullets 3</li>
      </ul>
      <br/>
      <ol>
                <li>Number1</li>
                <li>Number2</li>
                <li>Number3</li>
      </ol>
      <br/>
      <pre>this is simple quote</pre>
      <br>
      this is simple quote
</div>

I am able to convert the bold and italic text, but I cannot figure out how to convert the

1) <ul><li>....
2) <ol><li>...
3) break <br>
4) <pre> 

tags into the Word document.

If there is an example of how to do this, please let me know. I would really appreciate the help. Thanks in advance.

Edit:

Included libraries:

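  include_class "java.io.FileInputStream"
  include_class "java.io.FileOutputStream"
  include_class "org.apache.poi.poifs.filesystem.POIFSFileSystem"
  include_class "org.apache.poi.hwpf.HWPFDocument"
  include_class "org.apache.poi.hwpf.usermodel.ParagraphProperties"
  include_class "org.apache.poi.hwpf.usermodel.CharacterRun"
  include_class "org.apache.poi.hwpf.usermodel.CharacterProperties"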

And this is the main code that converts the HTML to a .doc file:

  def convert_from_html_to_doc(html_file_name, comment_files)

    puts("Script start.....")
    puts("Parsing document comments start.....NEW")

    default_file = "misc/poi_experiment/empty.doc"
    fs = JavaPoi::POIFSFileSystem.new(JavaPoi::FileInputStream.new(default_file))

    # HWPF stands for Horrible Word Processor Format
    hwpfDocument = JavaPoi::HWPFDocument.new(fs)

    # getRange returns the range of the main document, excluding the header and footer
    range = hwpfDocument.getRange()

    par1 = range.insertAfter(JavaPoi::ParagraphProperties.new(), 0)
    par1.setSpacingAfter(200);

    puts("Adding given html content to doc.")
    main_html = Nokogiri::HTML(File.read(html_file_name))
    characterRun = par1.insertAfter(main_html.text)
    # setting the font size
    characterRun.setFontSize(2 * 12)

    puts("Start processing comments..... total : #{comment_files.size}")
    comment_files.each do |cf|

      file_path = "misc/poi_experiment/#{cf}"
      puts("The comment file path : #{file_path}")

      html = Nokogiri::HTML(File.read(file_path)).css('html')
      puts( html )

      par = characterRun.insertAfter(JavaPoi::ParagraphProperties.new(), 0)
      par.setSpacingAfter(200);

      #text = "<b><u>this is bold and underlined text</u></b>"
      text = html.to_s.scan(/\D\d*/)
      index = 0
      currentCharacterRun , currentCharacterStyleList = [], []
      character_arr = text.to_s.scan(/\D\d*/)

      character_or_tag, index = get_next_character_or_tag(character_arr, index)

      while !character_or_tag.nil?
        if character_or_tag.is_char?
          currentCharacterRun << character_or_tag.get_char
        end
        if character_or_tag.is_start_tag?
          currentCharacterRunText = currentCharacterRun.join
          if currentCharacterRunText != ""
            characterproperties = JavaPoi::CharacterProperties.new
            characterproperties = emit_to_document_and_apply_style(characterproperties, currentCharacterStyleList)
            characterRun = par.insertAfter(currentCharacterRunText, characterproperties)
            currentCharacterRun = []
          end
          currentCharacterStyleList << character_or_tag.get_tag
        end
        if character_or_tag.is_end_tag?
          currentCharacterRunText = currentCharacterRun.join
          if currentCharacterRunText != ""
            characterproperties = JavaPoi::CharacterProperties.new
            characterproperties = emit_to_document_and_apply_style(characterproperties, currentCharacterStyleList)
            characterRun = par.insertAfter(currentCharacterRunText, characterproperties)
            currentCharacterRun = []
          end
          currentCharacterStyleList.reject! { |x| x == character_or_tag.get_tag.gsub("/","") }
        end

        character_or_tag, index = get_next_character_or_tag(character_arr, index)
      end
    end

    hwpfDocument.write(JavaPoi::FileOutputStream.new("#{html_file_name}.doc", true))
  end

I hope this helps you understand the problem.

Vik asked May 28 '12 04:05



2 Answers

After many attempts, I decided to switch to JODConverter.

Vik answered Nov 02 '22 07:11



The following section of the POI Javadocs will likely be useful to you. For example, to create a list, I think you want to use HWPFList:

http://poi.apache.org/apidocs/org/apache/poi/hwpf/usermodel/HWPFList.html

This class is used to create a list in a Word document. It is used in conjunction with registerList in HWPFDocument. In Word, lists are not ranged entities, meaning you can't actually add one to the document. Lists only act as properties for list entries. Once you register a list, you can add list entries to a document that are a part of the list. The only benefit of this that I see, is that you can add a list entry anywhere in the document and continue numbering from the previous list.

So in Java you would construct it like this, where numbered is a boolean and styleSheet is a StyleSheet:

new HWPFList(numbered, styleSheet)

I'm not a Ruby expert, so I'll leave the translation to JRuby to you.

That covers items 1 and 2 in your list; items 3 and 4 are, I think, just styled versions of Paragraph.
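
Before touching any POI API, it may help to first flatten the HTML into a list of paragraphs, each tagged as plain text, a bullet entry, a numbered entry, or preformatted text. The sketch below is plain Ruby with no gems and is only an illustration: `Para` and `html_to_paragraph_plan` are hypothetical names, not part of POI or of the code in the question. The regex tokenizer only copes with simple markup like the sample above, and treating each `<br>` as a paragraph boundary is an assumption rather than the only option.

```ruby
# Hypothetical helper, not part of POI: one entry per paragraph to be written.
# kind is :plain, :bullet, :numbered or :pre.
Para = Struct.new(:text, :kind)

# Flattens simple HTML into an ordered list of Para entries.
# Assumes well-behaved markup; inline tags such as <strong> are ignored here.
def html_to_paragraph_plan(html)
  plan = []
  list_stack = []   # kinds of the enclosing <ul>/<ol> elements
  buffer = +""      # text collected for the current paragraph
  in_pre = false

  # Ends the current paragraph, if it has any text, and records its kind.
  flush = lambda do |kind|
    text = in_pre ? buffer.dup : buffer.gsub(/\s+/, " ").strip
    plan << Para.new(text, kind) unless text.empty?
    buffer.clear
  end

  # Split the input into tags and text runs.
  html.scan(/<[^>]+>|[^<]+/) do |token|
    unless token.start_with?("<")
      buffer << token
      next
    end

    name = token[/\A<\/?\s*(\w+)/, 1].to_s.downcase
    closing = token.start_with?("</")

    case name
    when "ul", "ol"
      flush.call(:plain)
      if closing
        list_stack.pop
      else
        list_stack.push(name == "ul" ? :bullet : :numbered)
      end
    when "li"
      # Text inside <li> takes the kind of the innermost list.
      flush.call(closing ? (list_stack.last || :plain) : :plain)
    when "br"
      flush.call(:plain)   # assumption: each <br> starts a new paragraph
    when "pre"
      flush.call(closing ? :pre : :plain)
      in_pre = !closing
    end
  end

  flush.call(:plain)
  plan
end

sample = "<div>intro<br/><ul><li>bullets 1</li></ul><ol><li>Number1</li></ol><pre>quoted  text</pre></div>"
html_to_paragraph_plan(sample).each do |para|
  puts "#{para.kind}: #{para.text.inspect}"
end
```

With a plan like this, the `:bullet` and `:numbered` entries could be written as list entries against a registered HWPFList, and the `:plain` and `:pre` entries as ordinary paragraphs with different paragraph or character properties. That last step depends on the POI version in use and is not shown here.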

Gus answered Nov 02 '22 07:11
