Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

RODBC does not save greater than 8k varchar/text from a data frame

Tags:

r

rodbc

I am scraping web data and some results are more than 8K of text. I have a field in my MS SQL Server 2008 R2 that is varchar(max) but RODBC only saves 8K and truncates the rest. I have DSN connections set up using both SQL Native Client 10 from SS R2 and ODBC 11.

sqlType returns -1 for the field title, indicating varchar(max)/text.

Both connections truncate the text at 8K. Is there a fix/workaround for this or am I limited to 8K in SQL Server? I can switch to MySQL if the R mySQL driver will pass the larger amounts of text through. I'm using R 3.0.1 and RODBC 1.3-6 on Win 7 x64.

Here is some sample code to replicate the problem.

CREATE TABLE
[test](
[title] [varchar](max) NULL
) ON [PRIMARY]


library(RODBC)
ch <- odbcConnect("text", uid = "X", pwd = "X")
sqlTypeInfo(ch,"test")

testtext = data.frame("salasjflsjsf")
colnames(testtext) = "title"

## This works
sqlSave(ch,testtext,"test",append=TRUE, rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)


## text > 8K
testtext = data.frame(paste(rep("salasjflsjsf",5000),collapse=""))
colnames(testtext) = "title"

## use ODBC default type for title
sqlSave(ch,testtext,"test",append=TRUE, rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

## this message from ODBC
## Query: INSERT INTO "test" ( "title" ) VALUES ( ? )
## Binding: 'title' DataType 12, ColSize 8000
## Parameters:
## no: 1: title salasjfl
## [*text removed*]
## lsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsfsalasjflsjsf/***/
## Warning message:
## In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
##   character data 'salasjflsjsfsalasjflsjsfsalasjflsjsfsalasjfl




## try with varType set to text
sqlSave(ch,testtext,"test",append=TRUE, varType = "text",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

## try with varType set to varchar(MAX)
sqlSave(ch,testtext,"test",append=TRUE, varType = "varchar(max)",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)

close(ch)

I changed the column type to text and refreshed the DSN connection. Text type is being picked up but now there is an error message:

 chtest <-odbcConnect("test2", uid = "X", pwd = "X")
    > sqlColumns(chtest,"test")
      TABLE_CAT TABLE_SCHEM TABLE_NAME COLUMN_NAME DATA_TYPE TYPE_NAME COLUMN_SIZE BUFFER_LENGTH DECIMAL_DIGITS NUM_PREC_RADIX NULLABLE REMARKS
    1   RSS            dbo       test       title        -1      text  2147483647    2147483647             NA             NA        1    <NA>
      COLUMN_DEF SQL_DATA_TYPE SQL_DATETIME_SUB CHAR_OCTET_LENGTH ORDINAL_POSITION IS_NULLABLE SS_IS_SPARSE SS_IS_COLUMN_SET SS_IS_COMPUTED
    1       <NA>            -1               NA        2147483647                1         YES            0                0              0
      SS_IS_IDENTITY SS_UDT_CATALOG_NAME SS_UDT_SCHEMA_NAME SS_UDT_ASSEMBLY_TYPE_NAME SS_XML_SCHEMACOLLECTION_CATALOG_NAME
    1              0                <NA>               <NA>                      <NA>                                 <NA>
      SS_XML_SCHEMACOLLECTION_SCHEMA_NAME SS_XML_SCHEMACOLLECTION_NAME SS_DATA_TYPE
    1                                <NA>                         <NA>           35
    > sqlSave(chtest,testtext,"test",append=TRUE, varType = "text",rownames=FALSE, colnames = FALSE, safer=FALSE,  verbose = TRUE)
    Query: INSERT INTO "test" ( "title" ) VALUES ( ? )
    Binding: 'title' DataType -1, ColSize 2147483647
    Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  : 
      'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)
    In addition: Warning messages:
    1: In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
      Reached total allocation of 6013Mb: see help(memory.size)
    2: In odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
      Reached total allocation of 6013Mb: see help(memory.size)
like image 864
user2720047 Avatar asked Nov 11 '22 22:11

user2720047


1 Answers

You defined the text column to be of type varchar(max), so it will be stored that way, regardless of specifying varType when saving the data.

Try creating a table with a TEXT column instead, then using sqlSave. This way the data will be stored out of row context and you'll shouldn't have the truncation issue.

like image 118
Aert Avatar answered Nov 15 '22 05:11

Aert