Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres ARRAY column type to tbl list column in R and viceversa

Let's say that I'm working with the starwars dataset from dplyr package, which contains list columns (for films, vehicles...). To simplify, let's work with only the name and the films data:

library(dplyr)
ex_data <- starwars %>%
  select(name, films)

ex_data

# A tibble: 87 x 2
   name               films    
   <chr>              <list>   
 1 Luke Skywalker     <chr [5]>
 2 C-3PO              <chr [6]>
 3 R2-D2              <chr [7]>
 4 Darth Vader        <chr [4]>
 5 Leia Organa        <chr [5]>
 6 Owen Lars          <chr [3]>
 7 Beru Whitesun lars <chr [3]>
 8 R5-D4              <chr [1]>
 9 Biggs Darklighter  <chr [1]>
10 Obi-Wan Kenobi     <chr [6]>
# ... with 77 more rows

And I want to copy this data to a PostgreSQL database, but I want the films column to be converted to an text array type in the SQL, something like this:

starwars=# \d test
       Table "public.test"
   Column   |  Type  | Modifiers
------------+--------+-----------
 name       | text   | not null
 films      | text[] |

Is there any easy way of doing this with dplyr and/or dbplyr?

Also I'm interested in the opposite operation, importing a table with an array type column from postgreSQL db into R and getting a list-type column for the array one.

Thank you in advance!

like image 442
MalditoBarbudo Avatar asked Nov 07 '22 23:11

MalditoBarbudo


1 Answers

Below a poor man's workaround using the existing R database packages and some SQL code. I use the data.table package for wrangling the data (can easily be adapted to dplyr).

# packages
require(data.table)
require(RPostgreSQL)

# data preparation
dt = dplyr::starwars
setDT(dt)
dt = dt[ , .SD, .SDcols = c('name', 'films') ]
# list conversion
dt2 = dt[ , .(films = unlist(films)), name ]

# database I/O
drv = dbDriver('PostgreSQL')
con = dbConnect(drv,
                dbname = dbname,
                user = user, password = password,
                host = host, port = port)
# write
dbWriteTable(con, name = 'test', value = dt2,
             overwrite = TRUE, row.names = FALSE)
# convert to array in database
q = "DROP TABLE IF EXISTS test2;
     CREATE TABLE test2 AS (
       SELECT name, array_agg(films) AS films
       FROM test
       GROUP BY name
     );"
dbSendQuery(con, q)

# read arrays
dt3 = dbReadTable(con, 'test')

# convert back to list columns
setDT(dt3)
dt4 = dt3[ , .(films = list(films)), name ]

dbDisconnect(con)
dbUnloadDriver(drv)

Some thoughts: It would be nice if such a feature existed in RPostgreSQL or one of the other database packages. It surely doesn't scale well. Unfortunately, I lack skills in C to help implementing this.

like image 58
andschar Avatar answered Nov 14 '22 22:11

andschar