Suppose I have a connection to an external database called con.
I would like to use dplyr to reproduce this query
SELECT var1, var2, var3 from myTable LIMIT 10
I have tried
qry <- tbl(con, "myTable") %>%
    select(var1) %>%
    filter(between(row_number(), 1, 10))
but it does not give the desired result and the query it produces is much slower than the one I am after.
The query it produces is
SELECT "var1",
FROM SELECT "var1", row_number() OVER () AS "zzz25"
FROM SELECT "var1" AS "var1"
FROM myTable "yhydrqlhho") "zsdfunxidf"
WHERE zzz25 BETWEEN 1.0 AND 10.0)
When I send this query to the database it runs for a very long time. When I send
SELECT var1 from myTable limit 10 
the result comes back instantaneously.
The SQL LIMIT clause restricts how many rows a query returns. The syntax is SELECT * FROM table LIMIT X; where X is the number of records to retrieve. For example, LIMIT 5 on a query ordered by score retrieves the top five players on a leaderboard.
LIMIT is most useful on large tables, where returning every record can hurt performance.
dplyr's data verbs are based on SQL syntax: select() -> SELECT, mutate() -> user-defined columns, summarize() -> aggregated columns.
You can try head(); it generates the correct SQL query on Postgres. The example below uses head(6); for ten rows, use head(10):
tbl(con, 'my_table') %>% select(var1, var2) %>% head(6) %>% explain()
# here con is a PostgreSQL connection
#<SQL>
#SELECT "var1" AS "var1", "var2" AS "var2"
#FROM "my_table"
#LIMIT 6
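To check the generated SQL without access to a Postgres server, the same pipeline can be sketched against an in-memory SQLite database. This assumes the DBI, RSQLite, dplyr and dbplyr packages are installed; the table contents are made up for illustration, and the exact quoting in the rendered SQL varies by backend.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for the remote database: an in-memory SQLite connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table",
             data.frame(var1 = paste0("text", 1:50),
                        var2 = as.numeric(1:50),
                        var3 = as.numeric(50:1)))

# Lazy query: nothing is fetched until collect() is called
qry <- tbl(con, "my_table") %>%
  select(var1, var2, var3) %>%
  head(10)

# Render the SQL that dbplyr would send; it should end in a LIMIT clause
sql_text <- as.character(dbplyr::sql_render(qry))
cat(sql_text, "\n")

# Run the query and pull the rows into R
result <- collect(qry)
dbDisconnect(con)
```

Because head() is translated to LIMIT, the database can stop after the first ten rows instead of numbering every row with a window function.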
If you're after the actual data from your query, rather than just recreating the SQL query, then specifying collect(n = 10) will return the same rows as the query in @Psidom's answer.
tbl(con, 'my_table') %>% select(var1, var2) %>% collect(n=10)
# A tibble: 10 x 2
   var1    var2
   <chr>  <dbl>
 1 text1   87.8
 2 text2   99.6
 3 text3  100
 4 text4   91.9
 5 text5   76.8
 6 text6   77.8
 7 text7   77.2
 8 text8   97.2
 9 text9   97.5
10 text10  80.4
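Note that in the dplyr version used here the default in collect() is n = 1e+05, so if your data (after filtering) contains more rows, you'll need to specify collect(n = Inf) to retrieve all of it. Newer dbplyr releases default to n = Inf, so check the documentation for your installed version.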
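As a rough sketch of the difference between a capped and an uncapped collect(), the snippet below again uses an in-memory SQLite database with made-up data (DBI, RSQLite, dplyr and dbplyr are assumed to be installed). Passing n explicitly sidesteps any question about what the default is in a given version.

```r
library(DBI)
library(dplyr)

# Hypothetical stand-in for the remote database
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table",
             data.frame(var1 = paste0("text", 1:25),
                        var2 = as.numeric(1:25)))

lazy <- tbl(con, "my_table") %>% select(var1, var2)

# Fetch only the first 10 rows; dbplyr may warn that the result is incomplete,
# so the warning is suppressed here
first_ten <- suppressWarnings(collect(lazy, n = 10))

# Fetch every row, whatever the default n happens to be
everything <- collect(lazy, n = Inf)

dbDisconnect(con)
```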