
Create the SQL query "SELECT * FROM myTable LIMIT 10" using dplyr

Suppose I have a connection to an external database called con.

I would like to use dplyr to reproduce this query:

SELECT var1, var2, var3 from myTable LIMIT 10

I have tried:

qry <- tbl(con, "myTable") %>%
    select(var1) %>%
    # number every row with a window function, then keep rows 1 to 10
    filter(between(row_number(), 1, 10))

but it does not give the desired result, and the query it produces is much slower than the one I am after.

The query it produces is:

SELECT "var1"
FROM (SELECT "var1", row_number() OVER () AS "zzz25"
FROM (SELECT "var1" AS "var1"
FROM myTable) "yhydrqlhho") "zsdfunxidf"
WHERE (zzz25 BETWEEN 1.0 AND 10.0)

When I send this query to the database it runs for a very long time. When I send

SELECT var1 from myTable limit 10 

the result comes back instantaneously.
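To reproduce the difference locally without the external database, here is a minimal sketch. It assumes the DBI, RSQLite, dplyr and dbplyr packages are installed, and it uses an in-memory SQLite database with a made-up myTable in place of con. It only renders the SQL for the attempted pipeline (nothing is executed), so you can see that the row limit becomes a row_number() window plus a BETWEEN filter rather than a LIMIT clause. The exact SQL text will vary with the dbplyr version and backend.

```r
# Assumes DBI, RSQLite, dplyr and dbplyr are installed.
library(DBI)
library(dplyr)

# Hypothetical stand-in for the external database: an in-memory SQLite
# database holding a small, made-up "myTable".
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "myTable",
             data.frame(var1 = paste0("text", 1:100),
                        var2 = (1:100) / 2,
                        var3 = 1:100))

# The pipeline from the question: number every row, then keep rows 1 to 10.
attempt <- tbl(con, "myTable") %>%
  select(var1) %>%
  filter(between(row_number(), 1, 10))

# sql_render() returns the SQL dbplyr would send, without running it.
sql_attempt <- as.character(dbplyr::sql_render(attempt))
cat(sql_attempt, "\n")  # windowed subquery filtered with BETWEEN, no LIMIT

dbDisconnect(con)
```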

asked Oct 27 '17 02:10 by Adam Black




2 Answers

You can try head(10); it generates the correct SQL query on Postgres:

tbl(con, 'my_table') %>% select(var1, var2) %>% head(6) %>% explain()
# here con is a PostgreSQL connection

#<SQL>
#SELECT "var1" AS "var1", "var2" AS "var2"
#FROM "my_table"
#LIMIT 6
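The same check can be run without Postgres. This is a minimal sketch assuming the DBI, RSQLite, dplyr and dbplyr packages and an in-memory SQLite database with a made-up my_table. dbplyr::sql_render() returns the generated SQL as a string (show_query() would print it instead), and collect() then runs the limited query and brings the rows into R.

```r
# Assumes DBI, RSQLite, dplyr and dbplyr are installed.
library(DBI)
library(dplyr)

# Hypothetical in-memory table standing in for the real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table",
             data.frame(var1 = paste0("text", 1:100),
                        var2 = (1:100) / 2))

# head() on a lazy table is translated into a LIMIT clause.
lazy <- tbl(con, "my_table") %>%
  select(var1, var2) %>%
  head(10)

sql_head <- as.character(dbplyr::sql_render(lazy))
cat(sql_head, "\n")

# Run the limited query and pull the rows into a local tibble.
first10 <- collect(lazy)

dbDisconnect(con)
```

Unlike the row_number() approach, a plain LIMIT usually lets the database stop as soon as it has enough rows, which is consistent with the speed difference described in the question.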
answered Oct 17 '22 19:10 by Psidom


If you're after the actual data from your query, rather than just recreating the SQL query, then specifying collect(n=10) will give the same output as @Psidom's answer.

tbl(con, 'my_table') %>% select(var1, var2) %>% collect(n=10)

# A tibble: 10 x 2
   var1    var2
   <chr>  <dbl>
 1 text1   87.8
 2 text2   99.6
 3 text3  100
 4 text4   91.9
 5 text5   76.8
 6 text6   77.8
 7 text7   77.2
 8 text8   97.2
 9 text9   97.5
10 text10  80.4

Note that in older versions of dplyr the default in collect() was n = 1e+05, so if your data (after filtering) contains more rows, you'll need to specify collect(n = Inf) to retrieve all of them. Current dbplyr versions already default to n = Inf.
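The collect() behaviour can be checked the same way. This sketch assumes the DBI, RSQLite, dplyr and dbplyr packages and an in-memory SQLite table with 100 made-up rows. Passing n explicitly makes the result independent of whichever default the installed version uses.

```r
# Assumes DBI, RSQLite, dplyr and dbplyr are installed.
library(DBI)
library(dplyr)

# Hypothetical in-memory table with 100 rows.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table",
             data.frame(var1 = paste0("text", 1:100),
                        var2 = (1:100) / 2))

query <- tbl(con, "my_table") %>% select(var1, var2)

ten_rows <- collect(query, n = 10)   # fetch only the first 10 rows
all_rows <- collect(query, n = Inf)  # fetch every row, whatever the default

dbDisconnect(con)
```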

answered Oct 17 '22 19:10 by IsoBar