How to use RODBC to save dataframe to table with primary key generated at database

I would like to enter a data frame into an existing table in a database using an R script, and I want the table in the database to have a sequential primary key. My problem is that RODBC doesn't seem to allow the primary key constraint.

Here's the SQL for creating the table I want:

CREATE TABLE [dbo].[results] (
    [ID]         INT            IDENTITY (1, 1) NOT NULL,
    [FirstName]  VARCHAR (255) NULL,
    [LastName]   VARCHAR (255) NULL,
    [Birthday]   DATETIME      NULL,
    [CreateDate] DATETIME      NULL,
    CONSTRAINT [PK_dbo.results] PRIMARY KEY CLUSTERED ([ID] ASC)
);

And a test with some R code:

library(RODBC)
# Source (myserver) and destination (notmyserver) connection strings
ConnectionString1="Driver=ODBC Driver 11 for SQL Server;Server=myserver; Database=TestDb; trusted_connection=yes"
ConnectionString2="Driver=ODBC Driver 11 for SQL Server;Server=notmyserver; Database=TestDb; trusted_connection=yes"
# Read the test data from the source database
db1=odbcDriverConnect(ConnectionString1)
query="SELECT a.[firstname] as FirstName
  , a.[lastname] as LastName
  , Cast(a.[dob] as datetime) as Birthday
  , cast(a.createDate as datetime) as CreateDate
FROM [dbo].[People] a"
results=sqlQuery(db1,query,stringsAsFactors=FALSE)
close(db1)

# Append the data frame to the existing results table in the destination database
db2=odbcDriverConnect(ConnectionString2)
sqlSave(db2, 
    results, 
    append = TRUE, 
    varTypes=c(Birthday="datetime", CreateDate="datetime"),
    colnames = FALSE,  
    rownames = FALSE,
    fast = FALSE)
close(db2)

The first part of the R code just gets some test data into a data frame. It works fine and isn't part of my question (I'm including it only so you can see the format of the test data). When I run sqlSave, I get this error message:

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

However, if I remove the primary key from the database, everything works fine with this table:

CREATE TABLE [dbo].[results] (
    [FirstName]  VARCHAR (255) NULL,
    [LastName]   VARCHAR (255) NULL,
    [Birthday]   DATETIME      NULL,
    [CreateDate] DATETIME      NULL
);

Clearly the primary key is the issue. As I understand it, with Entity Framework or similar tools the primary key is normally generated by the database when you insert data.

I'd like a way to append data to a table with a primary key using only an R script. Is that possible? There could already be data in the table I'm adding to, so I don't really see a way to create keys in R before trying to append to the table.

Matthew asked May 12 '18 15:05



1 Answer

The problem is line 361 in http://github.com/cran/RODBC/blob/master/R/sql.R: sqlSave (via its internal sqlwrite function) requires the data.frame and the DB table to have exactly the same number of columns; otherwise you get this error with this stack trace:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent 
3. `colnames<-`(`*tmp*`, value = c("ID", "FirstName", "LastName", 
   "Birthday", "CreateDate")) at sql.R#361
2. sqlwrite(channel, tablename, dat, verbose = verbose, fast = fast, 
   test = test, nastring = nastring) at sql.R#211
1. sqlSave(db2, results, append = TRUE, varTypes = c(Birthday = "datetime", 
    CreateDate = "datetime"), colnames = FALSE, rownames = FALSE, 
    fast = FALSE, verbose = TRUE) 

Adding an ID column to your data.frame is no solution (or workaround) either, because then you can no longer use the auto-increment ID column.
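Before calling sqlSave, it can help to check whether the data frame and the target table actually line up. The sketch below uses a hypothetical helper, compare_columns (not part of RODBC), written in base R only; the sample data frame is made up and simply mimics the column layout of the question's results.

```r
# Hypothetical helper (not part of RODBC): compare a data frame's columns with
# the column names of the target table before calling sqlSave().
compare_columns <- function(df, table_cols) {
  list(
    same_count      = ncol(df) == length(table_cols),
    missing_from_df = setdiff(table_cols, names(df)),
    extra_in_df     = setdiff(names(df), table_cols)
  )
}

# Made-up sample row with the same columns as the question's data frame
results <- data.frame(
  FirstName  = "Jane",
  LastName   = "Doe",
  Birthday   = as.POSIXct("1990-01-01", tz = "UTC"),
  CreateDate = as.POSIXct("2018-05-12", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Column list of the table created with the IDENTITY primary key
table_cols <- c("ID", "FirstName", "LastName", "Birthday", "CreateDate")

str(compare_columns(results, table_cols))
```

For the question's layout this reports a column-count mismatch with ID missing from the data frame, which is exactly the condition that trips sqlSave. Against a live connection, the table's column names could be read with `RODBC::sqlColumns(db2, "results")$COLUMN_NAME` instead of being typed in by hand.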

A "simple" workaround to the "same columns" limitation of RODBC::sqlSave is:

  1. Use sqlSave to save the new rows into another table name
  2. Send an INSERT INTO ... SELECT ... FROM ... statement via RODBC::sqlQuery to append the new rows to your original table, which includes the auto-increment ID column
  3. Delete the table with the new rows again (drop table...)
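The three steps above might look roughly like this in R. This is a sketch, not a tested implementation: build_insert_select, append_via_staging and the default staging table name results_staging are hypothetical, and only sqlSave(), sqlQuery() and sqlDrop() come from RODBC. It assumes the staging table does not exist yet, since sqlSave() creates it.

```r
# Build the INSERT ... SELECT that copies staged rows into the target table.
# Only the data frame's columns are listed (no ID), so SQL Server fills the
# IDENTITY column itself.
build_insert_select <- function(target, staging, cols) {
  col_list <- paste0("[", cols, "]", collapse = ", ")
  sprintf("INSERT INTO %s (%s) SELECT %s FROM %s",
          target, col_list, col_list, staging)
}

# Hypothetical wrapper around the staging-table workaround.
# Extra arguments (e.g. varTypes) are passed on to RODBC::sqlSave().
append_via_staging <- function(channel, df, target,
                               staging = "results_staging", ...) {
  # 1. Save the new rows into a separate staging table (created by sqlSave)
  RODBC::sqlSave(channel, df, tablename = staging, rownames = FALSE, ...)
  # 2. Append the staged rows to the target table; the database generates ID
  res <- RODBC::sqlQuery(channel,
                         build_insert_select(target, staging, names(df)))
  # 3. Drop the staging table again
  RODBC::sqlDrop(channel, staging)
  # Return sqlQuery()'s result so any error messages can be inspected
  invisible(res)
}
```

With the question's objects, a call could look like `append_via_staging(db2, results, target = "[dbo].[results]", varTypes = c(Birthday = "datetime", CreateDate = "datetime"))`.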

A better option would be to use the newer odbc package, which also offers better performance through bulk-like inserts instead of sending single INSERT statements as RODBC does:

https://github.com/r-dbi/odbc

Look for the function dbWriteTable (which is an implementation of the interface DBI::dbWriteTable).
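With the odbc package, the append could look roughly like the sketch below. append_with_odbc is a hypothetical wrapper; DBI::dbConnect(), DBI::dbWriteTable() and DBI::dbDisconnect() are the real DBI generics, with odbc::odbc() as the driver. It assumes, as I understand odbc's behaviour, that appending issues INSERT statements naming only the data frame's columns, so the IDENTITY column is left for SQL Server to fill; this is worth verifying against your odbc version.

```r
# Hypothetical wrapper around DBI/odbc; requires the DBI and odbc packages.
append_with_odbc <- function(conn_string, df, table = "results") {
  # Open a connection using a full ODBC connection string
  con <- DBI::dbConnect(odbc::odbc(), .connection_string = conn_string)
  # Always close the connection, even if the write fails
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  # append = TRUE adds rows to the existing table instead of creating it
  DBI::dbWriteTable(con, table, df, append = TRUE)
}
```

With the question's objects this would be `append_with_odbc(ConnectionString2, results)`. To target the schema explicitly, `table = DBI::Id(schema = "dbo", table = "results")` could be passed instead, and `DBI::dbAppendTable()` is a more narrowly focused alternative when only appending is needed.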

R Yoda answered Sep 24 '22 19:09