
Data extraction with Excel

Tags: excel, vba, etl

Every month I receive 100+ Excel spreadsheets. From each one I take a fixed range and paste it into other spreadsheets to build a report.

I'm trying to write a VBA script that iterates over my Excel files and copies that range into a single spreadsheet, but I haven't been able to get it working.

Is there an easy way to do this?

Rodrigo asked Jul 24 '09 22:07



3 Answers

Here's some VBA code that demonstrates iterating over a bunch of Excel files in a directory and opening each one:

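Dim sourcePath As String
Dim curFile As String
Dim curWB As Excel.Workbook
Dim destWB As Excel.Workbook

' The workbook that will receive the data
Set destWB = ActiveWorkbook
sourcePath = "C:\files"

' Dir returns the first matching file name, or "" if there is none
curFile = Dir(sourcePath & "\*.xls")
While curFile <> ""
    Set curWB = Workbooks.Open(sourcePath & "\" & curFile)

    ' ... copy what you need from curWB into destWB here ...

    ' Close the source file without saving it
    curWB.Close SaveChanges:=False
    ' Calling Dir with no arguments returns the next matching file
    curFile = Dir()
Wend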

Hopefully that'll be a good enough starting point for you to work your existing macro code into.
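To make the loop above concrete, here is a minimal sketch of what the body could look like when the goal is to stack the same fixed range from every file into one sheet. The range address "B2:F10", the use of the first worksheet in each file, and the Sub name are all assumptions for illustration; the question doesn't say which range or sheet is involved. It also assumes the destination workbook is not stored in the source folder.

```vba
Sub CollectFixedRange()
    Const SOURCE_PATH As String = "C:\files"   ' folder used in the answer above
    Const SOURCE_RANGE As String = "B2:F10"    ' hypothetical fixed range
    Dim curFile As String
    Dim curWB As Excel.Workbook
    Dim destWS As Excel.Worksheet
    Dim src As Excel.Range
    Dim nextRow As Long

    ' Assumption: the report goes on the first sheet of the active workbook
    Set destWS = ActiveWorkbook.Worksheets(1)
    nextRow = 1

    curFile = Dir(SOURCE_PATH & "\*.xls")
    While curFile <> ""
        Set curWB = Workbooks.Open(SOURCE_PATH & "\" & curFile, ReadOnly:=True)
        Set src = curWB.Worksheets(1).Range(SOURCE_RANGE)

        ' Transfer the values directly instead of using copy/paste
        destWS.Cells(nextRow, 1).Resize(src.Rows.Count, src.Columns.Count).Value = src.Value
        nextRow = nextRow + src.Rows.Count

        curWB.Close SaveChanges:=False
        curFile = Dir()
    Wend
End Sub
```

Assigning .Value from one range to another avoids the clipboard entirely, so there is nothing to clear before closing each source file.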

Mark Biek answered Sep 30 '22 15:09


I wrote this years ago, but maybe it will help you out. I added the extension for the latest version of Excel (xlsx). Seems to work.

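Sub MergeExcelDocs()
    Dim lastRow As Long      ' Long: a sheet can have more rows than an Integer can hold
    Dim docPath As Variant   ' Variant: GetOpenFilename returns False if the dialog is cancelled
    Dim baseCell As Excel.Range
    Dim sysObj As Variant, folderObj As Variant, fileObj As Variant
    Application.ScreenUpdating = False
    docPath = Application.GetOpenFilename(FileFilter:="Text Files (*.txt),*.txt,Excel Files (*.xls),*.xls,Excel 2007 Files (*.xlsx),*.xlsx", FilterIndex:=2, Title:="Choose any file")
    If VarType(docPath) = vbBoolean Then Exit Sub   ' user cancelled
    ' New workbook that collects the data; baseCell anchors every paste
    Workbooks.Add
    Set baseCell = Range("A1")
    ' Use the chosen file only to find its parent folder
    Set sysObj = CreateObject("scripting.filesystemobject")
    Set fileObj = sysObj.getFile(docPath)
    Set folderObj = fileObj.ParentFolder
    For Each fileObj In folderObj.Files
        Workbooks.Open Filename:=fileObj.Path
        ' Copy everything from A1 to the last used cell of the opened sheet
        Range(Range("A1"), ActiveCell.SpecialCells(xlLastCell)).Copy
        ' Paste values below the last used row of the new workbook
        lastRow = baseCell.SpecialCells(xlLastCell).Row
        baseCell.Offset(lastRow, 0).PasteSpecial (xlPasteValues)
        ' Shrink the clipboard to one cell so closing doesn't prompt about it
        baseCell.Copy
        ActiveWindow.Close SaveChanges:=False
    Next
    Application.ScreenUpdating = True
End Sub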

EDIT:

I should mention how it works. When you start the macro, it brings up an Open File dialog. Double-click the first file in the list (or any file for that matter). It will create a new workbook then loop through all the files in the folder. For each file, it copies all the content from the first worksheet and pastes it at the end of the new workbook. That's pretty much all there is to it.
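Because the loop visits every file in the folder, anything that isn't a workbook gets opened too. One way to guard against that, sketched here as an assumption rather than part of the original macro, is a small helper that checks the extension before opening. The name IsExcelFile and the list of extensions are hypothetical.

```vba
' Returns True when the path ends in a workbook extension we want to merge.
Function IsExcelFile(ByVal filePath As String) As Boolean
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")

    ' GetExtensionName returns the extension without the leading dot
    Select Case LCase$(fso.GetExtensionName(filePath))
        Case "xls", "xlsx", "xlsm"
            IsExcelFile = True
        Case Else
            IsExcelFile = False
    End Select
End Function
```

Inside the For Each loop, the open/copy/paste/close steps would then be wrapped in If IsExcelFile(fileObj.Path) Then ... End If.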

devuxer answered Sep 30 '22 15:09


Another solution is to have your roll-up spreadsheet access the other spreadsheets by filename and grab the data itself.

To do that, you'll need to have all of the spreadsheets open at the same time so it can update the links, but that's still probably faster than opening and copying/pasting one at a time, even with a macro. Every spreadsheet will need to have a unique filename.

If the names of the spreadsheets aren't known until you receive them, or they change regularly, create a column in your roll-up table to store the filename of the sheets, then build the address you need using string manipulation and get the data using INDIRECT().

Example to grab one cell of data from one particular file:

=INDIRECT("'[workbook.xls]MyWorksheet'!$A$2")

Rinse and repeat the above for each cell of each spreadsheet you want to get.

You should be clever about how to get the string to pass to INDIRECT(). Build it as a formula so you can use literally the same formula for every cell you need to retrieve.

Example:

=INDIRECT("'[" & $A2 & "]MyWorksheet'!" & ADDRESS(3, COLUMN()))

The formula above will go to the spreadsheet whose filename is in $A2 (note the lack of $ before "2" so you can paste the same formula to other rows for other files), and get the value of the cell on the MyWorksheet sheet on row three and the current column (so, if this is in B2 on your roll-up, it gets B3 from the other file). ADDRESS already returns an absolute reference such as $B$3, so no extra $ is needed after the "!".

Adjust the ADDRESS function to add offsets to the row and column needed.

The advantage of the solution above is that the same formula can be copied and pasted across the rows and columns you need to populate, and Excel will adjust the $A2 and COLUMN() as needed. Very maintainable.
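If typing the filenames into column A by hand is a chore, a short macro can fill them in. This is only a sketch: the folder "C:\path" is taken from the example path above, and starting at row 2 is an assumption chosen to match the $A2 reference in the formula.

```vba
Sub ListWorkbookNames()
    Const SOURCE_PATH As String = "C:\path"   ' hypothetical folder holding the files
    Dim curFile As String
    Dim nextRow As Long

    nextRow = 2   ' first data row, matching $A2 in the formula
    curFile = Dir(SOURCE_PATH & "\*.xls")
    While curFile <> ""
        ' Write the bare file name (no folder) into column A of the roll-up sheet
        ActiveSheet.Cells(nextRow, 1).Value = curFile
        nextRow = nextRow + 1
        curFile = Dir()
    Wend
End Sub
```

The INDIRECT formula can then be filled across the rows that received a filename.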

Edit: I once had a similar situation where I couldn't load all of the spreadsheets at once (there were more than 200). I think I ended up writing the VBA so that it did not actually open and read the Excel files. Instead, it looped through the filenames, opened an ODBC connection to each, and used ADO to read the values I needed from a prescribed named range. A named range appears as a "table" in ODBC; worksheets also appear as "tables", but there are rules about allowed names. This was much faster than opening and closing Excel files, and had the added advantage of not crashing Excel.
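The original code for that approach isn't shown, so the following is only a rough sketch of how reading a named range through ADO over ODBC might look for a single file. It assumes the Microsoft Excel ODBC driver named in the connection string is installed, and the range name "ReportData" and the Sub name are hypothetical. By default the driver usually treats the first row of the range as column headings, so those would not be copied as data.

```vba
Sub ReadNamedRangeViaAdo()
    Const SOURCE_FILE As String = "C:\path\workbook.xls"   ' example path used earlier
    Const RANGE_NAME As String = "ReportData"              ' hypothetical named range
    Dim conn As Object
    Dim rs As Object

    ' Late binding, so no reference to the ADO library is required
    Set conn = CreateObject("ADODB.Connection")
    conn.Open "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" & _
              "DBQ=" & SOURCE_FILE & ";ReadOnly=1;"

    ' The named range is queried as if it were a table
    Set rs = conn.Execute("SELECT * FROM [" & RANGE_NAME & "]")

    ' Dump the returned rows into the active sheet starting at A1
    ActiveSheet.Range("A1").CopyFromRecordset rs

    rs.Close
    conn.Close
End Sub
```

To cover many files, the same body could be run inside a Dir loop over the folder, changing the destination cell for each file.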

richardtallent answered Sep 30 '22 13:09