Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting unique values from an internal table

What is the most efficient way to extract the unique values from a column or multiple columns of an internal table?

like image 366
Lilienthal Avatar asked Sep 30 '15 09:09

Lilienthal


3 Answers

If you have 7.40 SP08 or above you can simply use the inline syntax to populate the target table (no need for LOOP GROUP BY):

DATA: it_unique TYPE STANDARD TABLE OF fieldtype.
it_unique = VALUE #(
  FOR GROUPS value OF <line> IN it_itab
  GROUP BY <line>-field WITHOUT MEMBERS ( value ) ).

This works with any type of the target table.


For an older release use:

DATA: it_unique TYPE HASHED TABLE OF fieldtype WITH UNIQUE KEY table_line.
LOOP AT it_itab ASSIGNING <line>.
  INSERT <line>-field INTO TABLE lt_unique.
ENDLOOP.

The above works with sorted tables as well. Although I do not recommend to use sorted tables for this purpose unless you are really sure that only a few lines will be in the result.

The non-zero sy-subrc of INSERT is simply ignored. No need to do the key lookup twice (once for existence check, once for insert).


If the target must be a STANDARD TABLE and you have an old ABAP stack you can alternatively use

DATA: it_unique TYPE STANDARD TABLE OF fieldtype.
LOOP AT it_itab ASSIGNING <line>.
  READ TABLE lt_unique WITH TABLE KEY table_line = <line>-field
    TRANSPORTING NO FIELDS BINARY SEARCH.
  INSERT <line>-field INTO lt_unique INDEX sy-tabix.
ENDLOOP.

This provides the same behavior as with a sorted table but with a standard table. Whether this is more efficient than SORT / DELETE ADJACENT DUPLICATES depends on the number of duplicate entries in itab. The more duplicate entries exist the faster will be the above solution because it avoids the unnecessary appends to the target table. But on the other side appends are faster than inserts.

like image 135
Marcel Avatar answered Nov 11 '22 09:11

Marcel


Prior to ABAP 7.40's SP08 release, the most efficient way of extracting unique values from an internal table or itab is the following:

LOOP AT lt_itab ASSIGNING <ls_itab>.
    APPEND <ls_itab>-value TO lt_values.
ENDLOOP.
SORT lt_values.
DELETE ADJACENT DUPLICATES FROM lt_values.

Checking the presence of a given <ls_itab>-value before adding it to the internal table is another way of guaranteeing uniqueness but will probably be much more computationally expensive when inserting into a standard table. For sorted or hashed destination tables, use:

LOOP AT lt_itab ASSIGNING <ls_itab>.
    READ TABLE lt_sorted_values WITH KEY table_line = <ls_itab>-value BINARY SEARCH.
    IF sy-subrc <> 0.
        APPEND <ls_itab>-value TO lt_sorted_values.
    ENDIF.
ENDLOOP.

Note that using the first method but inserting the values into a dummy table followed by an APPEND LINES OF lt_dummy INTO lt_sorted_values may be faster, but the size of the intermediate tables can muddle that.


As of ABAP 7.40 Support Package 08 however, the GROUP BY loops offer a better way to extract unique values. As the name indicates these function similarly to SQL's GROUP BY. For instance, the following code will extract unique project numbers from an internal table:

LOOP AT lt_project_data ASSIGNING FIELD-SYMBOL(<ls_grp_proj>)
    GROUP BY ( project = <ls_grp_proj>-proj_number ) ASCENDING
    WITHOUT MEMBERS
    ASSIGNING FIELD-SYMBOL(<ls_grp_unique_proj>).
        APPEND <ls_grp_unique_proj>-project TO lt_unique_projects.
ENDLOOP.

The same logic can be extended to retrieve unique pairs, such as the composite primary keys of the EKPO table, EBELN ("Purchasing Document", po_nr) and EBELP ("Item Number of Purchasing Document", po_item):

LOOP AT lt_purchasing_document_items ASSIGNING FIELD-SYMBOL(<ls_grp_po>)
    GROUP BY ( number = <ls_grp_po>-po_nr
               item   = <ls_grp_po>-po_item ) ASCENDING
    WITHOUT MEMBERS
    ASSIGNING FIELD-SYMBOL(<ls_grp_po_item>).
        APPEND VALUE #( ebeln = <ls_grp_po_item>-number
                        ebelp = <ls_grp_po_item>-item ) TO lt_unique_po_items.
ENDLOOP.

According to Horst Keller, one of the SAP designers of the new ABAP 7.40 release, the performance of GROUP BY loops is likely to be the same as a manual implementation of these LOOPs. Depending on how (in)efficiently such a custom loop is implemented it may even be faster. Note that these will be faster than the two methods given above for systems where the GROUP BY loops are not available.


Note that in most cases querying the database to return DISTINCT values will be much faster and performance-wise doing that will blow any ABAP code that uses internal tables out of the water, especially on HANA systems.

like image 9
Lilienthal Avatar answered Nov 11 '22 08:11

Lilienthal


How about this?

lt_unique[] = lt_itab[].
SORT lt_unique[] BY field1 field2 field3...
DELETE ADJACENT DUPLICATES FROM lt_values COMPARING field1 field2 field3...
like image 4
Thomas Matecki Avatar answered Nov 11 '22 07:11

Thomas Matecki