
MS Access Batch Update via ADO.Net and COM Interoperability

This is kind of a follow-up to this thread. This is all with .Net 2.0; for me, at least.

Essentially, Marc (OP from above) tried several different approaches to update an MS Access table with 100,000 records and found that using a DAO connection was roughly 10 - 30x faster than using ADO.Net. I went down virtually the same path (examples below) and came to the same conclusion.

I guess I'm just trying to understand why OleDB and ODBC are so much slower, and I'd love to hear if anyone has found a better answer than DAO since that post in 2011. I would really prefer to avoid DAO and/or Automation, since they require the client machine to have either Access or the Access Database Engine redistributable installed (or I'm stuck with DAO 3.6, which doesn't support .ACCDB).

Original attempt; ~100 seconds for 100,000 records/10 columns:

Dim accessDB As New OleDb.OleDbConnection( _
                      "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & _
                                accessPath & ";Persist Security Info=True;")
accessDB.Open()

Dim accessCommand As OleDb.OleDbCommand = accessDB.CreateCommand()
Dim accessDataAdapter As New OleDb.OleDbDataAdapter( _
                                   "SELECT * FROM " & tableName, accessDB)
Dim accessCommandBuilder As New OleDb.OleDbCommandBuilder(accessDataAdapter)

'_Reader is a data reader already open against the source database
Dim accessDataTable As New DataTable
accessDataTable.Load(_Reader, System.Data.LoadOption.Upsert)

'This call is what takes 99% of the runtime; it loops through each row and runs
'the update command that is built by the command builder. The problem seems to
'be that you can't change the UpdateBatchSize property with MS Access.
accessDataAdapter.Update(accessDataTable)

Anyway, I thought this was really odd so I tried several flavors of the same thing:

  • Switching out OleDB for ODBC
  • Looping through the data table and running an INSERT statement for each row (see the sketch after this list)
    • This is what .Update does anyway
  • Using the ACE provider instead of Jet (ODBC and OleDB)
  • Running the Data Adapter Update from within the DataReader.Read loop
    • Out of frustration; it was hilarious.
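For reference, the per-row INSERT attempt looked roughly like the sketch below (not the exact code; it assumes the same accessDB connection, tableName and _Reader as the first example, and treats every column as text):

'Build one ? placeholder per column in the source reader
Dim placeholders As String = "?"
For i As Integer = 2 To _Reader.FieldCount
    placeholders &= ", ?"
Next

Dim insertCommand As OleDb.OleDbCommand = accessDB.CreateCommand()
insertCommand.CommandText = "INSERT INTO " & tableName & " VALUES (" & placeholders & ")"
For i As Integer = 0 To _Reader.FieldCount - 1
    insertCommand.Parameters.Add("@p" & i.ToString(), OleDb.OleDbType.VarWChar)
Next

While _Reader.Read()
    For i As Integer = 0 To _Reader.FieldCount - 1
        insertCommand.Parameters(i).Value = _Reader.Item(i).ToString()
    Next
    insertCommand.ExecuteNonQuery() 'one statement parsed and executed per row
End While

It performed essentially the same as .Update, which makes sense given that this is what .Update does internally anyway.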

Finally, I tried using DAO. The code should basically be doing the same thing, except it clearly isn't, because this runs in ~10 seconds:

Dim dbEngine As New DAO.DBEngine
Dim accessDB As DAO.Database = dbEngine.OpenDatabase(accessPath)
Dim accessTable As DAO.Recordset = accessDB.OpenRecordset(tableName)

While _Reader.Read()
    accessTable.AddNew()
    For i = 0 To _Reader.FieldCount - 1
        accessTable.Fields(i).Value = _Reader.Item(i).ToString()
    Next
    accessTable.Update()
End While

A few other notes:

  • Everything is converted to Strings in all examples to try to keep things as simple and consistent as possible
    • Exception: in my first example, which uses DataTable.Load, I can't do the string conversion first; but I did the same thing when I looped through the reader and built INSERT commands (which is what .Update does anyway), and it didn't help.
  • For Each Field...Next vs. Field(i) vs. Field(name) made no difference for me
  • Every test I ran started with an empty, pre-built data table in a freshly compacted Access database
  • Loading the Data Reader to a Data Table in memory takes ~3 seconds
  • I don't think it's an issue with marshaling the data, because Marc's post indicated that loading a text file via Automation is as fast as DAO; if anything, the data shouldn't need to be marshaled when using ODBC/OleDB, but it should be when using Automation
  • All of this bothers me way more than it should, because it doesn't make sense

Hopefully someone will be able to shed some light on this... it's just strange. Thanks in advance!

asked Apr 03 '13 by Karter

2 Answers

The reason here is that the DAO driver sits much closer to the MS Access Database engine than the ODBC driver.

The DAO methods AddNew and Update delegate directly to their MS Access equivalents; at no point is any SQL generated, so there is no SQL for MS Access to parse.

On the other hand, the DataAdapter code generates an UPDATE statement for each row. That statement gets passed to ODBC, which then passes it to the MS Access driver, which either

  1. independently parses the SQL and issues AddNew and Update commands to the Access database, or
  2. passes the SQL to MS Access, which isn't optimised for parsing SQL and which, once the SQL is parsed, ends up translating it into AddNew and Update commands.

Either way, your time is spent generating SQL and then having something interpret that SQL, whereas the DAO approach bypasses the SQL generation/interpretation and goes straight to the metal.
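If you're curious what that per-row SQL looks like, a quick way to see it (a sketch, reusing the accessDataAdapter from the question's first example) is to ask the command builder for its generated update command:

'Illustrative: dump the per-row UPDATE statement that the command builder
'generates and that the Access engine then has to parse for every row
Dim builder As New OleDb.OleDbCommandBuilder(accessDataAdapter)
Console.WriteLine(builder.GetUpdateCommand().CommandText)
'Typically something along the lines of:
'  UPDATE [Table1] SET [Col1] = ?, [Col2] = ? ... WHERE ((Col1 = ?) AND ...)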

One way around this is to create your own "database service" running on the machine that hosts the Access database. It marshals your selects and updates and could communicate with the client over remoting, WCF (HTTP), or whatever. This is a lot of work and changes your application logic considerably.
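A minimal sketch of what such a service contract might look like, assuming WCF is the transport (the interface and member names here are hypothetical, not part of the original answer):

Imports System.Data
Imports System.ServiceModel

'Hypothetical contract: the service would run on the machine hosting the
'.mdb/.accdb and perform the work locally (e.g. via DAO), so only the data
'crosses the process/network boundary
<ServiceContract()>
Public Interface IAccessBatchService

    <OperationContract()>
    Sub BulkInsert(ByVal tableName As String, ByVal rows As DataTable)

    <OperationContract()>
    Function RunSelect(ByVal sql As String) As DataTable

End Interface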

Figuring out the correct name for the database driver (e.g. Jet or whatever) is left as an exercise for the reader.

answered by Binary Worrier

I know this question is old, but the answer may help someone still struggling with this.

There is another method to consider. Since both the source and target connection strings are known, the source tables can be linked into the target Access database via DAO or ADOX (I know, ADOX is off-topic here), possibly with some connection-string parsing needed.
The data in the linked tables can then be transferred fairly quickly by issuing statements like this on a DAO or OleDb connection to the target Access database:

SELECT * INTO Table1 FROM _LINKED_Table1
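For illustration, a rough sketch of the linking step using DAO (sourcePath, targetPath and Table1 are placeholders, not from the original answer):

Dim dbEngine As New DAO.DBEngine
Dim targetDB As DAO.Database = dbEngine.OpenDatabase(targetPath)

'Create a linked table in the target database that points at the source table
Dim linkedTable As DAO.TableDef = targetDB.CreateTableDef("_LINKED_Table1")
linkedTable.Connect = ";DATABASE=" & sourcePath
linkedTable.SourceTableName = "Table1"
targetDB.TableDefs.Append(linkedTable)

'Let the Access engine copy the data in a single statement
targetDB.Execute("SELECT * INTO Table1 FROM _LINKED_Table1")

'Drop the link afterwards if it is no longer needed
targetDB.TableDefs.Delete("_LINKED_Table1")
targetDB.Close()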

Some drawbacks (please point out anything I missed):

  • source table must contain a primary key
  • primary keys and indexes have to be re-created by examining the source Indexes schema
  • not easily getting transfer progress status while the query is running

Some advantages (please point out anything I missed):

  • only having to examine the source Tables schema, if all user tables are to be copied
  • not having to examine the source Columns schema to generate column definitions for CREATE TABLE statements
    (for instance, try getting the AUTONUMBER / IDENTITY info reliably out of an OleDb schema, i.e. without assumptions about combinations of column values and flag bits based on examining other schemas)
  • not having to generate vast amounts of INSERT INTO ... VALUES ... statements, accounting for AUTONUMBER / IDENTITY columns in your code, or otherwise have a database operation run for each row in your code
  • being able to specify criteria to filter transferred records
  • not having to worry about text, date or time columns or how to delimit, escape or format their values in queries except when used in query criteria

This method was employed in a production project and turned out to be the quickest, for me at least. :o)

answered by Ruutsa