
Handling field types in database interaction with R

Tags: types, r, rmysql

I use RMySQL and a MySQL database to store my datasets. Sometimes data gets revised or I store results back to the database as well. Long story short, there is quite some interaction between R and the database in my use case.

Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore both R data types and MySQL field types. I would expect MySQL date fields to end up in a Date or POSIX class. The other way around, I'd expect these R classes to be stored as a corresponding MySQL field type. That means a date should not be character – I do not expect to distinguish between float and doubles here...

I also tried dbGetQuery – same result there. Is there something I have completely missed when reading the manual, or is it simply not possible (yet) in these packages? What would be a nice workaround?

EDIT: @mdsummer I tried to find something more in the documentation, but found only these disappointing lines:

"MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly while exporting data.frames, factors are exported as character vectors.

"Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).

"Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation."
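The last point of the quoted documentation means the conversion has to happen on the R side. A minimal sketch of that step, assuming the date column uses MySQL's default YYYY-MM-DD format and the datetime column uses YYYY-MM-DD HH:MM:SS. The data frame `df` and its column names are made up for illustration and stand in for what dbReadTable would return:

```r
# Stand-in for a data frame returned by dbReadTable(): dates arrive as character
df <- data.frame(
  obs_date = c("2011-02-23", "2011-02-24"),
  obs_time = c("2011-02-23 11:02:00", "2011-02-24 08:30:00"),
  stringsAsFactors = FALSE
)

# MySQL DATE -> R Date
df$obs_date <- as.Date(df$obs_date, format = "%Y-%m-%d")

# MySQL DATETIME -> R POSIXct; the time zone is an assumption, adjust as needed
df$obs_time <- as.POSIXct(df$obs_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(df)
```

Doing this by hand for every date column gets tedious, which is why a lookup from field type to conversion function is attractive.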

Matt Bannert asked Feb 23 '11 11:02



2 Answers

Ok, I got a working solution now. Here's a function that maps MySQL field types to R classes. This helps in particular handling the MySQL field type date...

library(RMySQL)

# Read a table and convert its columns to R classes based on the MySQL field types
dbReadMap <- function(con, table) {
    # DESCRIBE returns one row per column; keep only the Field and Type columns
    desc <- dbGetQuery(con, paste0("DESCRIBE ", table))[, 1:2]

    # strip row_names if it exists: dbReadTable turns it into row names,
    # so it is not a real column of the result
    desc <- desc[desc$Field != "row_names", ]

    # keep only the base type name, e.g. "int(11) unsigned" becomes "int"
    desc$Type <- sub("^([a-z]+).*$", "\\1", tolower(desc$Type))

    # building a dictionary (DESCRIBE reports CHAR columns as "char")
    fieldtypes <- c("int","tinyint","bigint","float","double","date","char","varchar","text")
    rclasses <- c("as.numeric","as.numeric","as.numeric","as.numeric","as.numeric","as.Date","as.character","as.character","as.character")
    fieldtype_to_rclass <- data.frame(fieldtypes, rclasses, stringsAsFactors = FALSE)

    # columns whose type is not in the dictionary are left untouched
    map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")

    # get data
    res <- dbReadTable(con, table)

    # apply the matching conversion function to each mapped column;
    # seq_len() avoids a bogus iteration when nothing matched
    for (i in seq_len(nrow(map))) {
        convert <- match.fun(map$rclasses[i])
        res[[map$Field[i]]] <- convert(res[[map$Field[i]]])
    }

    res
}
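To try the mapping idea without a database connection, here is a small sketch on mock data. The `desc` and `res` data frames are invented stand-ins for the DESCRIBE output and the raw table, and a named vector replaces merge() as the lookup, which is a simplification rather than the function's exact mechanics:

```r
# Mock of what DESCRIBE returns for a table (Field and Type columns only)
desc <- data.frame(
  Field = c("id", "price", "sold_on", "label"),
  Type  = c("int(11)", "double", "date", "varchar(45)"),
  stringsAsFactors = FALSE
)

# Keep only the leading type name: "int(11)" -> "int", "varchar(45)" -> "varchar"
desc$Type <- sub("^([a-z]+).*$", "\\1", tolower(desc$Type))

# Named vector used as lookup table: MySQL type -> name of R conversion function
type_map <- c(int = "as.numeric", double = "as.numeric",
              date = "as.Date", varchar = "as.character")

# Mock of the raw data as it would come back from dbReadTable()
res <- data.frame(
  id = c(1L, 2L), price = c(9.5, 12),
  sold_on = c("2011-02-23", "2011-02-24"),
  label = c("a", "b"), stringsAsFactors = FALSE
)

# Apply the matching conversion function to each described column
for (i in seq_len(nrow(desc))) {
  fun_name <- type_map[[desc$Type[i]]]
  res[[desc$Field[i]]] <- match.fun(fun_name)(res[[desc$Field[i]]])
}

sapply(res, class)
```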

Maybe this is not good programming practice – I just don't know any better. So use it at your own risk or help me improve it... And of course it's only half of the job: reading. Hopefully I'll find some time to write a writing function soon.

If you have suggestions for the mapping dictionary let me know :)
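For the writing half, one possible direction is to derive a MySQL field type from each column's R class and hand the result to dbWriteTable. The sketch below is only an assumption about how that could look: `mysql_field_types` is a hypothetical helper, the type choices are illustrative, and the commented-out call assumes RMySQL's dbWriteTable accepts a `field.types` argument.

```r
# Hypothetical helper: derive a MySQL column type for each column of a data frame
mysql_field_types <- function(df) {
  vapply(df, function(col) {
    if (inherits(col, "Date")) "date"
    else if (inherits(col, "POSIXct")) "datetime"
    else if (is.integer(col)) "int"
    else if (is.numeric(col)) "double"
    else if (is.logical(col)) "tinyint"
    else "text"  # characters, factors and anything else
  }, character(1))
}

df <- data.frame(
  id = 1:2,
  value = c(1.5, 2.5),
  day = as.Date(c("2011-02-23", "2011-02-24")),
  note = c("a", "b"),
  stringsAsFactors = FALSE
)

types <- mysql_field_types(df)
print(types)

# With an open connection `con`, the vector could then be passed on
# (not run here; the table name is made up):
# dbWriteTable(con, "mytable", df, field.types = as.list(types), row.names = FALSE)
```

Date columns would still be sent as text in the insert statement, but declaring the column as date lets MySQL store them with the right field type.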

Matt Bannert answered Oct 19 '22 19:10


Here is a more generic version of @Matt Bannert's function that works with queries instead of tables:

# Extension of dbGetQuery that understands MySQL data types
dbGetQuery2 <- function(con, query) {
    # materialize the query as a temporary table so DESCRIBE can report its column types
    # (note: the query runs once here and once more below to fetch the data)
    dbClearResult(dbSendQuery(con, paste0("CREATE TEMPORARY TABLE `temp` ", query)))
    desc <- dbGetQuery(con, "DESCRIBE `temp`")[, 1:2]
    dbClearResult(dbSendQuery(con, "DROP TEMPORARY TABLE `temp`"))

    # strip row_names if it exists because it's an attribute and not a real column
    desc <- desc[desc$Field != "row_names", ]

    # keep only the base type name, e.g. "int(11) unsigned" becomes "int"
    desc$Type <- sub("^([a-z]+).*$", "\\1", tolower(desc$Type))

    # building a dictionary (DESCRIBE reports CHAR columns as "char";
    # unlike the table version, varchar columns become factors here)
    fieldtypes <- c("int",        "tinyint",    "bigint",     "float",      "double",     "date",    "char",         "varchar",   "text")
    rclasses <-   c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.factor", "as.character")
    fieldtype_to_rclass <- data.frame(fieldtypes, rclasses, stringsAsFactors = FALSE)

    # columns whose type is not in the dictionary are left untouched
    map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")

    # get data
    res <- dbGetQuery(con, query)

    # apply the matching conversion function to each mapped column
    for (i in seq_len(nrow(map))) {
        convert <- match.fun(map$rclasses[i])
        res[[map$Field[i]]] <- convert(res[[map$Field[i]]])
    }

    res
}
ROLO answered Oct 19 '22 17:10