
Generating an index in an MS Word document file

I use phpword to generate an MS Word document. Is there any way to generate an index at the end of the generated file?

One way I think may work is to read the generated MS Word file, locate where each word is (e.g. its page number in the MS Word file), and then regenerate the index in a separate MS Word file.

Is there any better method?
Example of the required file:

A 
Animal 51,98 
Apple 11,54,99 

B
Basket 55  
...
..
etc
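
For illustration only, here is a minimal Python sketch of the "regenerate the index" step, assuming the page numbers for each term are already known. Obtaining those page numbers is the hard part and is not shown. The function name `format_index` and the input mapping are hypothetical.

```python
from itertools import groupby


def format_index(pages_by_term):
    """Lay out an index grouped by first letter, as in the example above.

    `pages_by_term` maps each non-empty term to an iterable of page numbers.
    """
    lines = []
    terms = sorted(pages_by_term, key=str.casefold)
    # groupby needs sorted input; terms sharing a first letter are adjacent.
    for letter, group in groupby(terms, key=lambda t: t[0].upper()):
        lines.append(letter)
        for term in group:
            pages = ",".join(str(p) for p in sorted(set(pages_by_term[term])))
            lines.append(f"{term} {pages}")
        lines.append("")  # blank line between letter groups
    return "\n".join(lines).rstrip()


print(format_index({"Apple": [11, 54, 99], "Animal": [51, 98], "Basket": [55]}))
```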
shox asked Sep 29 '12 07:09



2 Answers

Honestly, shox, I don't think you have a lot of good options here. I looked into this a bit, as it interests me as well, but I could find nothing in the phpword docs or forums, besides your own posts there, on how to make this possible. On the other hand, you can try to uncompress the docx bundle (it's a zip file) and manipulate the XML files within directly. I have no idea how this will go, but hypothetically, done right, the result is no different than if it had been done in Word manually. I experimented by using some filler text and manually doing a "Mark All" for the index on one term in the document. Here is what changed in the bundle:

Word adds the following as the first child in ~/[Content_Types].xml:

<Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml" />

Word creates the folder ~/customXml.

In the ~/customXml folder, it creates item1.xml:

<b:Sources SelectedStyle="\APA.XSL" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"></b:Sources>

In the ~/customXml folder, it also creates itemProps1.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{3DC430FE-7F6E-49D7-9EFC-E4F37E42ABA0}" xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml">
  <ds:schemaRefs>
     <ds:schemaRef ds:uri="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/>
  </ds:schemaRefs>
</ds:datastoreItem>

Word creates the folder ~/customXml/_rels and, inside it, item1.xml.rels:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXmlProps" Target="itemProps1.xml"/>
</Relationships>

The ~/docProps folder changed, but that's irrelevant, as Word regenerates these values on every open/close, and they have no bearing on the content of the file.

Basically, this leaves the ~/word folder. ~/word/styles.xml gains a style for the index at the end, as follows:

<w:style w:type="paragraph" w:styleId="Index1">
  <w:name w:val="index 1"/>
  <w:basedOn w:val="Normal"/>
  <w:next w:val="Normal"/>
  <w:autoRedefine/>
  <w:uiPriority w:val="99"/>
  <w:semiHidden/>
  <w:unhideWhenUsed/>
  <w:rsid w:val="00C52B35"/>
  <w:pPr>
    <w:spacing w:after="0" w:line="240" w:lineRule="auto"/>
    <w:ind w:left="220" w:hanging="220"/>
  </w:pPr>
</w:style>

The w:rsid elements in ~/word/settings.xml all changed. Honestly, these are scattered throughout the files, and I'm not sure how they work, how they are calculated, or whether they matter much.

In ~/word/_rels/document.xml.rels, the following was added as a child of the Relationships node:

<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml" Target="../customXml/item1.xml"/>

Last and perhaps most importantly, in ~/word/document.xml, each occurrence of my term for indexing ("sit") is followed by the following elements:

<w:r w:rsidR="00C52B35">
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r w:rsidR="00C52B35">
  <w:instrText xml:space="preserve">XE "</w:instrText>
</w:r>
<w:r w:rsidR="00C52B35" w:rsidRPr="00C90937">
  <w:instrText>sit</w:instrText>
</w:r>
<w:r w:rsidR="00C52B35">
  <w:instrText xml:space="preserve">"</w:instrText>
</w:r>
<w:r w:rsidR="00C52B35">
  <w:fldChar w:fldCharType="end"/>
</w:r>
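
As a rough sketch of how these runs could be generated in a script, the Python function below builds the XML for one XE field. It is a simplification under stated assumptions: the whole instruction goes into a single w:instrText run rather than the three runs Word wrote above, the w:rsid attributes are left out, and terms containing a double quote are rejected rather than escaped. The function name `xe_field_runs` is hypothetical.

```python
from xml.sax.saxutils import escape


def xe_field_runs(term: str) -> str:
    """Return WordprocessingML runs for an XE (index entry) field marking `term`."""
    if '"' in term:
        # Quoting rules inside field codes are not handled in this sketch.
        raise ValueError("terms containing double quotes are not supported")
    # escape() handles &, < and > so the term is safe as XML text.
    instr = escape(f' XE "{term}" ')
    return (
        '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
        f'<w:r><w:instrText xml:space="preserve">{instr}</w:instrText></w:r>'
        '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    )


print(xe_field_runs("sit"))
```

The returned string is meant to be placed right after the run that contains the term, inside the same w:p element.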

There is also a little bit at the end where I inserted the index:

<w:p w:rsidR="00C52B35" w:rsidRDefault="00C52B35" w:rsidP="00DE5AB4">
  <w:pPr>
    <w:rPr>
      <w:b/>
      <w:noProof/>
    </w:rPr>
    <w:sectPr w:rsidR="00C52B35" w:rsidSect="00C52B35">
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
      <w:cols w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:instrText xml:space="preserve">INDEX \c "2" \z "1033"</w:instrText>
  </w:r>
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:fldChar w:fldCharType="separate"/>
  </w:r>
</w:p>
<w:p w:rsidR="00C52B35" w:rsidRDefault="00C52B35">
  <w:pPr>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:lastRenderedPageBreak/>
    <w:br w:type="page"/>
  </w:r>
</w:p>
<w:p w:rsidR="00C52B35" w:rsidRDefault="00C52B35">
  <w:pPr>
    <w:pStyle w:val="Index1"/>
    <w:tabs>
      <w:tab w:val="right" w:leader="dot" w:pos="4310"/>
    </w:tabs>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:lastRenderedPageBreak/>
    <w:t>sit, 1, 2</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00C52B35" w:rsidRDefault="00C52B35" w:rsidP="00DE5AB4">
  <w:pPr>
    <w:rPr>
      <w:b/>
      <w:noProof/>
    </w:rPr>
    <w:sectPr w:rsidR="00C52B35" w:rsidSect="00C52B35">
      <w:type w:val="continuous"/>
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
      <w:cols w:num="2" w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:pPr>
</w:p>
<w:p w:rsidR="00371DB1" w:rsidRPr="00371DB1" w:rsidRDefault="00C52B35" w:rsidP="00DE5AB4">
  <w:pPr>
    <w:rPr>
      <w:b/>
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:b/>
    </w:rPr>
    <w:lastRenderedPageBreak/>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
</w:p>
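
The INDEX field itself could be emitted in a much smaller form than what Word saved. The Python sketch below is an assumption-laden minimum: it keeps everything in one paragraph, omits the section properties and styling shown above, and sets w:dirty="true" on the begin marker so that Word should offer to refresh the field when the file is opened. Whether Word then recreates the two-column section layout on its own is something to verify. The function name `index_field_paragraph` is hypothetical.

```python
def index_field_paragraph(columns: int = 2, lcid: int = 1033) -> str:
    """Return one w:p element holding an INDEX field with placeholder result text."""
    # Same switches as in the XML above: \c = column count, \z = language ID.
    instr = f' INDEX \\c "{columns}" \\z "{lcid}" '
    return (
        "<w:p>"
        '<w:r><w:fldChar w:fldCharType="begin" w:dirty="true"/></w:r>'
        f'<w:r><w:instrText xml:space="preserve">{instr}</w:instrText></w:r>'
        '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
        "<w:r><w:t>Update this field to build the index.</w:t></w:r>"
        '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
        "</w:p>"
    )


print(index_field_paragraph())
```

The paragraph would go near the end of w:body in ~/word/document.xml, before the final w:sectPr element.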

Hopefully this helps. I know it's a complex solution, but it's the only thing I could find that will actually accomplish this in an automated fashion. It beats the official, manual way of doing this by a little bit, and should keep the features of an index created that way. If you're serious about doing this, I'd recommend using WinMerge to view the differences between files (open the two folders, then right-click and use Compare Special > XML), and I wish you the best of luck. If it's a one-off, though, I'd just bite the bullet and do it manually. It's probably faster and has fewer headaches.
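
The unzip-and-edit approach described in this answer can be sketched in Python with the standard zipfile module. This is only an outline under assumptions: the helper name `rewrite_document_xml` and the file names are hypothetical, and the `transform` callback (where the XE and INDEX fields would actually be inserted) is left to the caller.

```python
import zipfile


def rewrite_document_xml(src_path, dst_path, transform):
    """Copy a .docx bundle, passing word/document.xml through `transform`.

    `transform` takes the document XML as a str and returns the new XML as a str.
    All other parts of the bundle are copied unchanged.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = transform(data.decode("utf-8")).encode("utf-8")
            dst.writestr(item, data)
```

A call might look like `rewrite_document_xml("in.docx", "out.docx", add_index_fields)`, where `add_index_fields` is whatever function you write to edit the XML. Editing the XML as a plain string is fragile, because Word may split one word across several runs, so a real implementation would more likely parse the XML first.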

jimcavoli answered Oct 09 '22 23:10



I suppose you could read the headings of the document. Read this post; it is not exactly what you want, but with some modifications it could be the right way:

Automatically generate nested table of contents based on heading tags

user1723670 answered Oct 09 '22 22:10
