Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Limiting results in PROC SQL

Tags:

sas

proc-sql

I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).

If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.

The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database.

I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?

like image 771
JDB Avatar asked Jul 31 '12 18:07

JDB


People also ask

How do I limit records in SAS?

You can use the OBS= and FIRSTOBS= data set options to limit the number of observations that SAS processes. The OBS= data set option specifies the number of the last observation to process. It does not specify how many observations should be processed.

What is used to limit the result of a query?

The limit keyword is used to limit the number of rows returned in a query result.

What can I use instead of limits in SQL?

TIP: SELECT LIMIT is not supported in all SQL databases. For databases such as SQL Server or MSAccess, use the SELECT TOP statement to limit your results. The SELECT TOP statement is Microsoft's proprietary equivalent to the SELECT LIMIT statement.

How do I limit the number of rows in a SELECT statement?

If you don't need to omit any rows, you can use SQL Server's TOP clause to limit the rows returned. It is placed immediately after SELECT. The TOP keyword is followed by integer indicating the number of rows to return. In our example, we ordered by price and then limited the returned rows to 3.


2 Answers

Have you tried using the outobs option in your proc sql?

For example,

proc sql outobs=10; create table test
    as
    select * from schema.HUGE_TABLE
    order by n;
quit;

Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires you to store all your data in the database, at least temporarily.

Passthrough looks something like this:

proc sql;
    connect to db2 (user=&userid. password=&userpw.  database=MY_DB);

    create table test as
    select * from connection to db2 (
        select * from schema.HUGE_TABLE
        order by n
        FETCH FIRST 10 ROWS ONLY
    );
quit;

It requires more syntax and can't access your sas datasets, so if outobs works for you, I would recommend that.

like image 184
stevepastelan Avatar answered Oct 03 '22 00:10

stevepastelan


When SAS is talking to a database via SAS syntax, part of the query can be translated to DBMS language equivalent - this is called implicit pass through. The rest of the query is "post-processed" by SAS to produce final result. Depending on SAS version, DBMS vendor and DBMS version, and in some cases even some connection/libname options, different parts of SAS syntax are translatable/considered compatible between SAS and DBMS and thus sent to be performed by DBMS instead of SAS.

With SAS SQL options - INOBS and OUTOBS - I've worked a lot with MS SQL and Oracle via different versions of SAS, but I haven't seen those ever translated to TOP xxx type of queries, so this is probably not supported yet, although when query touches just DMBS data (no joins to SAS data etc), should be quite doable.

So I think you're left with the so called explicit pass-through - specific SAS SQL syntax to connect to database. This type of queries look like this:

proc sql;
    connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
    create table test_table as
    select *
    from connection to db1
        ( /* here we're in oracle */
                  select * from test.table1 where rownum <20 
                )
    ;
    disconnect from db1;
quit;

In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:

LIBNAME ORALIB ORACLE user=...;

PROC SQL;
connect to oracle using ORALIB;
create table work.test_table as
        select *
        from connection to ORALIB (
....

When connecting using libname be sure to use READBUFF (I usually set some 5000 or so) or INSERTBUFF options (1000 or more) when loading database.

To see if implicit pass-through takes place, set sastrace option:

option sastrace=',,,ds' sastraceloc=saslog nostsuffix;
like image 39
vasja Avatar answered Oct 03 '22 01:10

vasja