Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract Data from Word Document to an Excel SpreadSheet

Tags:

ms-word

excel

vba

I have a requirement to extract a value from a word document on a daily basis and write it to an excel workbook. I currently do this manually and it is border line regarding the most efficient method for me

  1. Using Excel file create a vba script and add any word document references.

2 Using the word navigate to the table “9. STOCKS...” (extracted example below – Appendix A) and read the Diesel (ltrs) daily usage highlighted in red.

3.Write this value to a spreadsheet cell.

  1. The date for this value is also key but it held in another part of the word document (Appendix B). Dates are also in the file name but we trust the internal value more than the word document name. With knowledge from points 3 and 4 extract the associated date to an adjacent spreadsheet cell.

The table is displayed below, because of the formatting I'm not able to send you the exact table but I will be able to send the values of it.

9.STOCKS (As of 00:01 hrs on Day of report issue). Stock Held Daily Usage Minimum Stock

Diesel (ltrs)
390436 15012 25000

Nitrogen (mm)
35 1 19 Champion 1033 (totes)
15 1 4 Nexguard (Boilers) 4
0.25 4 x 200 ltrs

Appendix B: Beatrice Period of Report: 00:01 – 24:00 10th August 2010

If you have any doubts regarding my question please get back to me, I appreciate your efforts and wanted to thanks in advance

like image 412
Srinivasa Avatar asked Aug 25 '10 15:08

Srinivasa


1 Answers

here's some code making use of late binding (declare objects rather than word.application etc). From Excel 2003, it

  1. opens a WORD document
  2. searches for string "minimum stock"
  3. moves the cursor some lines/words further
  4. expands/selects the WORD cursor
  5. pastes this WORD selection into EXCEL

steps 2-5 are repeated for "Period of report:" (note that the ":" is a word boundary, so we need to jump 8 words to the right to arrive at the date)

For WORD I copied the text from your Q just as is (no table, just plain text). If you use tables instead, you may need to play with the units of the various Move statements (e.g. for cells unit:=12); the strategy remains the same: find a constant text, move cursor to final destination, expand selection, create a word range and transfer.

Both items are placed into the current cell in Excel and its right neighbor.

Sub GrabUsage()
Dim FName As String, FD As FileDialog
Dim WApp As Object, WDoc As Object, WDR As Object
Dim ExR As Range

    Set ExR = Selection ' current location in Excel Sheet

    'let's select the WORD doc
    Set FD = Application.FileDialog(msoFileDialogOpen)
    FD.Show
    If FD.SelectedItems.Count <> 0 Then
        FName = FD.SelectedItems(1)
    Else
        Exit Sub
    End If

    ' open Word application and load doc
    Set WApp = CreateObject("Word.Application")
    ' WApp.Visible = True
    Set WDoc = WApp.Documents.Open(FName)

    ' go home and search
    WApp.Selection.HomeKey Unit:=6
    WApp.Selection.Find.ClearFormatting
    WApp.Selection.Find.Execute "Minimum Stock"

    ' move cursor from find to final data item
    WApp.Selection.MoveDown Unit:=5, Count:=1
    WApp.Selection.MoveRight Unit:=2, Count:=2

    ' the miracle happens here
    WApp.Selection.MoveRight Unit:=2, Count:=1, Extend:=1

    ' grab and put into excel        
    Set WDR = WApp.Selection
    ExR(1, 1) = WDR ' place at Excel cursor

    'repeat
    WApp.Selection.HomeKey Unit:=6
    WApp.Selection.Find.ClearFormatting
    WApp.Selection.Find.Execute "Period of Report:"
    WApp.Selection.MoveRight Unit:=2, Count:=8
    WApp.Selection.MoveRight Unit:=2, Count:=3, Extend:=1

    Set WDR = WApp.Selection
    ExR(1, 2) = WDR ' place in cell right of Excel cursor

    WDoc.Close
    WApp.Quit

End Sub

You can create a button and call that sub from there, or link GrabUsage() to a function key.

I commented out the WApp.Visible = True because in production you don't want WORD even to show up, but you will need it for debugging and playing with the cursor movements.

The disadvantage of late binding (and not using references to the Word library) is the hardcoding of units (6=story, 5=line, 2=word) instead of using Word enumerations, but I sometimes get OS crashes with early binding .... not very sexy but it seems to work.

The FileDialog object needs a reference to the MS Office Office Library. AFAIK this is standard in Excel 2003, but better to check than to crash.

And I didn't include code to check if the items are really found; I leave this to your creativity.

Hope that helps.

like image 166
MikeD Avatar answered Nov 07 '22 14:11

MikeD