Let's say that I'm working with the starwars
dataset from dplyr package, which contains list columns (for films, vehicles...). To simplify,
let's work with only the name and the films data:
library(dplyr)
ex_data <- starwars %>%
select(name, films)
ex_data
# A tibble: 87 x 2
name films
<chr> <list>
1 Luke Skywalker <chr [5]>
2 C-3PO <chr [6]>
3 R2-D2 <chr [7]>
4 Darth Vader <chr [4]>
5 Leia Organa <chr [5]>
6 Owen Lars <chr [3]>
7 Beru Whitesun lars <chr [3]>
8 R5-D4 <chr [1]>
9 Biggs Darklighter <chr [1]>
10 Obi-Wan Kenobi <chr [6]>
# ... with 77 more rows
And I want to copy this data to a PostgreSQL database, but I want the films column to be converted to an text array type in the SQL, something like this:
starwars=# \d test
Table "public.test"
Column | Type | Modifiers
------------+--------+-----------
name | text | not null
films | text[] |
Is there any easy way of doing this with dplyr
and/or dbplyr
?
Also I'm interested in the opposite operation, importing a table with an array type column from postgreSQL db into R and getting a list-type column for the array one.
Thank you in advance!
Below a poor man's workaround using the existing R database packages and some SQL code. I use the data.table package for wrangling the data (can easily be adapted to dplyr).
# packages
require(data.table)
require(RPostgreSQL)
# data preparation
dt = dplyr::starwars
setDT(dt)
dt = dt[ , .SD, .SDcols = c('name', 'films') ]
# list conversion
dt2 = dt[ , .(films = unlist(films)), name ]
# database I/O
drv = dbDriver('PostgreSQL')
con = dbConnect(drv,
dbname = dbname,
user = user, password = password,
host = host, port = port)
# write
dbWriteTable(con, name = 'test', value = dt2,
overwrite = TRUE, row.names = FALSE)
# convert to array in database
q = "DROP TABLE IF EXISTS test2;
CREATE TABLE test2 AS (
SELECT name, array_agg(films) AS films
FROM test
GROUP BY name
);"
dbSendQuery(con, q)
# read arrays
dt3 = dbReadTable(con, 'test')
# convert back to list columns
setDT(dt3)
dt4 = dt3[ , .(films = list(films)), name ]
dbDisconnect(con)
dbUnloadDriver(drv)
Some thoughts: It would be nice if such a feature existed in RPostgreSQL or one of the other database packages. It surely doesn't scale well. Unfortunately, I lack skills in C to help implementing this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With