Suppose I have a connection to an external database called con. I would like to use dplyr to reproduce this query:
SELECT var1, var2, var3 from myTable LIMIT 10
I have tried
qry <- tbl(con, "myTable") %>%
select(var1) %>%
filter(between(row_number(), 1, 10))
but it does not give the desired result and the query it produces is much slower than the one I am after.
The query it produces is
SELECT "var1",
FROM SELECT "var1", row_number() OVER () AS "zzz25"
FROM SELECT "var1" AS "var1"
FROM myTable "yhydrqlhho") "zsdfunxidf"
WHERE zzz25 BETWEEN 1.0 AND 10.0)
When I send this query to the database it runs for a very long time. When I send
SELECT var1 from myTable limit 10
the result comes back instantaneously.
The SQL LIMIT clause restricts how many rows a query returns. The syntax is SELECT * FROM table LIMIT X;, where X is the number of records to retrieve. For example, you can use LIMIT to retrieve the top five players on a leaderboard. It is most useful on large tables, where returning every record can hurt performance.
dplyr's data verbs are based on SQL syntax: select() -> SELECT, mutate() -> user-defined columns, summarize() -> aggregated columns.
You can try head(10); it generates the correct SQL query on Postgres:
tbl(con, 'my_table') %>% select(var1, var2) %>% head(6) %>% explain()
# here con is a PostgreSQL connection
#<SQL>
#SELECT "var1" AS "var1", "var2" AS "var2"
#FROM "my_table"
#LIMIT 6
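If you want to check this without a live Postgres server, here is a minimal sketch using a throwaway in-memory SQLite database. The table my_table and its contents are made up for illustration, and it assumes the DBI, RSQLite, dplyr and dbplyr packages are installed. show_query() prints only the generated SQL, which should end in a LIMIT clause rather than a row_number() window.

```r
library(DBI)
library(dplyr)

# In-memory SQLite database standing in for the external connection `con`
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table using the column names from the question
dbWriteTable(con, "my_table",
             data.frame(var1 = paste0("text", 1:100),
                        var2 = runif(100),
                        var3 = 1:100,
                        stringsAsFactors = FALSE))

# Build the lazy query; nothing is sent to the database yet
qry <- tbl(con, "my_table") %>%
  select(var1, var2, var3) %>%
  head(10)

# Print the SQL that dbplyr generates for this backend
show_query(qry)

# Run the query and pull the rows into R
result <- collect(qry)

dbDisconnect(con)
```

The exact quoting and aliasing in the printed SQL will differ between backends and dbplyr versions, so treat the output as something to inspect rather than something to match character for character.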
If you're after the actual data from your query, rather than just recreating the SQL query, then specifying collect(n=10) will give the same output as @Psidom's answer.
tbl(con, 'my_table') %>% select(var1, var2) %>% collect(n=10)
# A tibble: 10 x 2
var1 var2
<chr> <dbl>
1 text1 87.8
2 text2 99.6
3 text3 100
4 text4 91.9
5 text5 76.8
6 text6 77.8
7 text7 77.2
8 text8 97.2
9 text9 97.5
10 text10 80.4
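Note that, in the dplyr version current when this was written, the default in collect() is n = 1e+05, so if your data (after filtering) contains more rows, you'll need to specify collect(n=Inf) to retrieve it all. More recent dbplyr releases default to n = Inf, so check the documentation for your installed version.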
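As a rough sketch of how the n argument behaves, here is the same idea on a small in-memory SQLite table. The table and column names are made up, and it assumes DBI, RSQLite, dplyr and dbplyr are installed.

```r
library(DBI)
library(dplyr)

# Throwaway in-memory database with a hypothetical 50-row table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "my_table",
             data.frame(var1 = 1:50, var2 = (1:50) / 2))

lazy <- tbl(con, "my_table") %>% select(var1, var2)

# Retrieve at most 10 rows; depending on the dbplyr version this may
# warn that only the first rows were retrieved
first_ten <- collect(lazy, n = 10)

# Retrieve every row regardless of the default
all_rows <- collect(lazy, n = Inf)

nrow(first_ten)
nrow(all_rows)

dbDisconnect(con)
```

Passing n = Inf explicitly is the safer choice when you need the full result, since it does not depend on which default your installed version uses.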