Related to https://stackoverflow.com/a/33284035/3358272, I'm finding inconsistent behavior with pulling data from SQL Server (2014).
library(RODBC)
sqlQuery(.conn, "CREATE TABLE r2test ( [mychar] [NVARCHAR](16), [mynum] [FLOAT])")
# character(0)
sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('1',3.141593),('2',6.283185)")
character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 2 obs. of 2 variables:
# $ mychar: int 1 2
# $ mynum : num 3.14 6.28
In that example we see the undesired behavior: the characters of mychar
are being internally converted to integers. Per the previously-mentioned SO answer, the as.is
option defeats this, but has the unfortunate side-effect of also forcing decimal-represented floats to strings:
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE, as.is = TRUE))
# 'data.frame': 2 obs. of 2 variables:
# $ mychar: chr "1" "2"
# $ mynum : chr "3.1415929999999999" "6.2831849999999996"
If at least one of mychar
is actually not integer-izable, things are fine:
sqlQuery(.conn, "INSERT INTO r2test (mychar,mynum) VALUES ('a',9.424778)")
# character(0)
str(sqlQuery(.conn, "SELECT * FROM r2test", stringsAsFactors = FALSE))
# 'data.frame': 3 obs. of 2 variables:
# $ mychar: chr "1" "2" "a"
# $ mynum : num 3.14 6.28 9.42
Unfortunately, the data model does not support arbitrarily adding something to encourage this behavior (or I just haven't thought of a good-enough way of doing it). The data model is such that values of mychar
include 01
and 1
which are character-wise distinct. The only workaround I've found is to use as.is = TRUE
, which will require me to as.numeric
all related columns, something that is both tedious and (theoretically) unnecessary work.
Since the docs suggest needing to set DBMSencoding
, I checked the current encoding (helped by https://stackoverflow.com/a/5182469/3358272):
sqlQuery(.conn, "SELECT SERVERPROPERTY('Collation')")
# 1 SQL_Latin1_General_CP1_CI_AS
I've tried using (for kicks): DBMSencoding="latin1"
, DBMSencoding="UTF-8"
, and explicitly though the default DBMSencoding=""
with no change in behavior.
How can I encourage the behavior of not over-coercing the data types?
Currently using R-3.2.5 and RODBC-1.3.13 on ubuntu.
If I understand correctly, I think this if what you are looking for,
str(sqlQuery(
.conn,
"SELECT * FROM r2test",
stringsAsFactors = FALSE,
as.is = c(TRUE, FALSE)
))
#'data.frame': 2 obs. of 2 variables:
# $ mychar: chr "1" "2"
# $ mynum : num 3.14 6.28
where as.is
is specified as a logical vector (must have the same length as the number of columns in the result set). To be fair, this isn't really spelled out very well. The man page for sqlQuery
just refers you to the as.is
argument in read.table
, which states:
Note that as.is is specified per column (not per variable) and so includes the column of row names (if any) and any columns to be skipped.
The downside of this approach is that you need to know in advance which columns you want to convert and which you don't. Personally I don't see why the default behavior isn't to just map SQL character types to R character types, SQL numeric types to R numeric types, etc, but perhaps there is a good reason for this on the backend. Automatically converting '1'
, '2'
, ... to integers does not seem like much of a "feature" to me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With