Is there a dplyr (or other package) command for getting the column (field?) types of an SQL table? For example...
library(RSQLite)
library(dplyr)
data(iris)
dat_sql <- src_sqlite("test.sqlite", create = TRUE)
copy_to(dat_sql, iris, name = "iris_df")
iris_tbl <- tbl(dat_sql, "iris_df")
iris_tbl
# Source: query [?? x 5]
# Database: sqlite 3.8.6 [test.sqlite]
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <dbl> <dbl> <dbl> <dbl> <chr>
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
# 7 4.6 3.4 1.4 0.3 setosa
# 8 5.0 3.4 1.5 0.2 setosa
# 9 4.4 2.9 1.4 0.2 setosa
# 10 4.9 3.1 1.5 0.1 setosa
# # ... with more rows
I'm interested in a command that would tell me that the first four columns are of type dbl and the last is a chr (or better yet, the R types numeric and character) without actually collecting the data in memory. Since it is printed, there has to be a way to do this, right? I tried str to no avail:
str(iris_tbl)
# List of 2
# $ src:List of 2
# ..$ con :Formal class 'SQLiteConnection' [package "RSQLite"] with 5 slots
# .. .. ..@ Id :<externalptr>
# .. .. ..@ dbname : chr "test.sqlite"
# .. .. ..@ loadable.extensions: logi TRUE
# .. .. ..@ flags : int 6
# .. .. ..@ vfs : chr ""
# ..$ path: chr "test.sqlite"
# ..- attr(*, "class")= chr [1:3] "src_sqlite" "src_sql" "src"
# $ ops:List of 3
# ..$ src :List of 2
# .. ..$ con :Formal class 'SQLiteConnection' [package "RSQLite"] with 5 slots
# .. .. .. ..@ Id :<externalptr>
# .. .. .. ..@ dbname : chr "test.sqlite"
# .. .. .. ..@ loadable.extensions: logi TRUE
# .. .. .. ..@ flags : int 6
# .. .. .. ..@ vfs : chr ""
# .. ..$ path: chr "test.sqlite"
# .. ..- attr(*, "class")= chr [1:3] "src_sqlite" "src_sql" "src"
# ..$ x :Classes 'ident', 'sql', 'character' chr "iris_df"
# ..$ vars: chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...
# ..- attr(*, "class")= chr [1:3] "op_base_remote" "op_base" "op"
# - attr(*, "class")= chr [1:4] "tbl_sqlite" "tbl_sql" "tbl_lazy" "tbl"
# NULL
When printing a preview of the remote table, it looks like dplyr does use collect on the first few rows of the table. Because dplyr retrieves some sample data, you could do this as well.
Here, we make a query for the first few rows with head, collect the query results, and inspect the class of each column.
iris_tbl %>%
head %>%
collect %>%
lapply(class) %>%
unlist
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> "numeric" "numeric" "numeric" "numeric" "character"
(When used with a data frame, lapply applies a function column-wise, so it applies class to each column.)
To get the type names that dplyr uses, use type_sum.
iris_tbl %>% head %>% collect %>% lapply(type_sum) %>% unlist
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> "dbl" "dbl" "dbl" "dbl" "chr"
Have a look at glimpse():
This is like a transposed version of print: columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It's a little like str applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.)
Which gives:
> glimpse(iris_tbl)
#Observations: NA
#Variables: 5
#$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0,...
#$ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4,...
#$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5,...
#$ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2,...
#$ Species <chr> "setosa", "setosa", "setosa", "setosa",...
Should you want to get a vector you could do:
vapply(as.data.frame(head(iris_tbl)), typeof, character(1))
Which gives:
#Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "double" "double" "double" "double" "character"
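Both approaches above still fetch a few rows. If you only need the declared SQL column types and want to avoid pulling any data, one option is to ask the database for its schema. This is a minimal sketch, assuming an SQLite backend (PRAGMA table_info is SQLite-specific) and using an in-memory database created with dbWriteTable as a stand-in for test.sqlite; the names con, col_info, and sql_types are just for illustration. Note that it returns the SQL types declared in the table, not R classes or dplyr's dbl/chr labels.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for test.sqlite.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris_df", iris)

# PRAGMA table_info() reads the table schema only; no rows are fetched.
col_info <- dbGetQuery(con, "PRAGMA table_info(iris_df)")

# Named character vector: column name -> declared SQL type.
sql_types <- setNames(col_info$type, col_info$name)
sql_types

dbDisconnect(con)
```

Other backends expose the same information differently (for example through information_schema), so treat this as SQLite-only.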