 

How can duplicates be removed from a file using COBOL?

The input file has records such as: 8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355

Using COBOL, I need to remove duplicates from the above file and write to an output file. I wrote simple logic to read records and write to an output file.

Where do I need to put the logic of removing duplicates (say, 8712353, 8712352) from the above file?

Here is the program logic:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
           02 INPUTFILEID        PIC 9(07).

       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).

       WORKING-STORAGE SECTION.
       77  WS-VARIABLE           PIC 9(09).
       77  REC-NOT-MATCH         PIC 9(01).
       77  CUR-VARIABLE          PIC 9(09).

       PROCEDURE DIVISION.
       BEGIN.
           OPEN INPUT  INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP

      * Priming read, then copy every record until end of file.
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
           PERFORM UNTIL (EOFINPUTFILEDUP)
               WRITE OUTFILEDUPREC FROM INPUTFILEID
               READ INPUTFILEDUP
                   AT END SET EOFINPUTFILEDUP TO TRUE
               END-READ
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.

I sorted the input file in ascending order as:

8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355

And it worked, and below is the modified code:

But suppose my file is in neither ascending nor descending order: where do I need to put the sort logic before removing duplicates? How can I update the code below for this? I tried, but I was not successful when the input file looks like this:

8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup2.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
           02 INPUTFILEID        PIC 9(07).

       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).

       WORKING-STORAGE SECTION.
       77  WS-VARIABLE           PIC 9(09) VALUE ZERO.
       77  REC-NOT-MATCH         PIC 9(01).
       77  CUR-VARIABLE          PIC 9(7) VALUE ZERO.

       PROCEDURE DIVISION.
       BEGIN.
           OPEN INPUT  INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP

           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
      * WS-VARIABLE holds the last ID written, so only adjacent
      * duplicates are caught; the input must already be sorted.
           PERFORM UNTIL (EOFINPUTFILEDUP)
               IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
                   MOVE INPUTFILEID TO WS-VARIABLE
                   WRITE OUTFILEDUPREC FROM INPUTFILEID
               ELSE
                   DISPLAY "DUPLICATE FOUND" INPUTFILEID
               END-IF
               READ INPUTFILEDUP
                   AT END SET EOFINPUTFILEDUP TO TRUE
               END-READ
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.
Sanjana asked Nov 18 '09 18:11





2 Answers

Finally it worked.

Here is the code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup2.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT WorkFile ASSIGN TO "WORK.TMP".

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
           02 INPUTFILEID        PIC 9(07).

       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).

       SD  WorkFile.
       01  WORKREC.
           02 WINPUTFILEID       PIC 9(07).

       WORKING-STORAGE SECTION.
       77  WS-VARIABLE           PIC 9(09) VALUE ZERO.
       77  REC-NOT-MATCH         PIC 9(01).
       77  CUR-VARIABLE          PIC 9(7) VALUE ZERO.

       PROCEDURE DIVISION.
       BEGIN.
      * Sort the input file in place so duplicates become adjacent.
           SORT WorkFile ON ASCENDING KEY WINPUTFILEID
               USING INPUTFILEDUP GIVING INPUTFILEDUP

           OPEN INPUT  INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP

           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
      * Write a record only when its ID differs from the last one.
           PERFORM UNTIL (EOFINPUTFILEDUP)
               IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
                   MOVE INPUTFILEID TO WS-VARIABLE
                   WRITE OUTFILEDUPREC FROM INPUTFILEID
               ELSE
                   DISPLAY "DUPLICATE FOUND    " INPUTFILEID
               END-IF
               READ INPUTFILEDUP
                   AT END SET EOFINPUTFILEDUP TO TRUE
               END-READ
           END-PERFORM

           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.
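A possible variation, offered only as a sketch: because the program above sorts with USING INPUTFILEDUP GIVING INPUTFILEDUP, the input file itself is overwritten with the sorted records. If that is not wanted, the duplicate check can probably be moved into an OUTPUT PROCEDURE of the SORT, so the sorted records are filtered as they are returned and the input stays untouched. The names RemoveDupSort, WRITE-UNIQUE, WS-PREV-ID, WS-FLAGS, WS-FIRST-FLAG, WS-EOF-FLAG, FIRST-RECORD and END-OF-SORT are made up for this example; the file names and record layouts are taken from the program above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDupSort.
      * Sketch: sort INPUTFILEDUP and drop duplicates in the SORT
      * output procedure, leaving the input file unchanged.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT WorkFile ASSIGN TO "WORK.TMP".

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           02 INPUTFILEID        PIC 9(07).
       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).
       SD  WorkFile.
       01  WORKREC.
           02 WINPUTFILEID       PIC 9(07).

       WORKING-STORAGE SECTION.
      * Hypothetical helper fields, not in the original program.
       01  WS-FLAGS.
           02 WS-FIRST-FLAG      PIC X(01) VALUE 'Y'.
              88 FIRST-RECORD    VALUE 'Y'.
           02 WS-EOF-FLAG        PIC X(01) VALUE 'N'.
              88 END-OF-SORT     VALUE 'Y'.
       77  WS-PREV-ID            PIC 9(07) VALUE ZERO.

       PROCEDURE DIVISION.
       BEGIN.
           SORT WorkFile ON ASCENDING KEY WINPUTFILEID
               USING INPUTFILEDUP
               OUTPUT PROCEDURE IS WRITE-UNIQUE
           STOP RUN.

       WRITE-UNIQUE.
           OPEN OUTPUT OUTFILEDUP
           RETURN WorkFile
               AT END SET END-OF-SORT TO TRUE
           END-RETURN
           PERFORM UNTIL END-OF-SORT
      * The first record is always written; later ones only when
      * the ID differs from the previously written ID.
               IF FIRST-RECORD OR WINPUTFILEID NOT = WS-PREV-ID
                   MOVE 'N' TO WS-FIRST-FLAG
                   MOVE WINPUTFILEID TO WS-PREV-ID
                   WRITE OUTFILEDUPREC FROM WINPUTFILEID
               ELSE
                   DISPLAY "DUPLICATE FOUND    " WINPUTFILEID
               END-IF
               RETURN WorkFile
                   AT END SET END-OF-SORT TO TRUE
               END-RETURN
           END-PERFORM
           CLOSE OUTFILEDUP.
```

The separate first-record flag is there so that an ID of zero on the first record is not mistaken for a duplicate of the initial WS-PREV-ID value. Note that the output comes out in sorted order, not in the original input order.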
Sanjana answered Sep 28 '22 01:09



When Organization is Sequential, the record deleted is the last record read. The Delete statement is valid only when the last operation against the file is a successful Read statement. If not, the Delete returns a File Status value of 43. Because a Delete cannot return File Status values beginning with a 2 when the file is Open with Sequential Access, coding Invalid Key on such a Delete is not allowed.

When Dynamic or Random access is selected for the file, the Delete statement, like the Rewrite, becomes a little less restrictive. The record being deleted need not have been previously read. Simply fill in the primary Key information in the record description for the file and issue the Delete statement. If the record does not exist, a File Status of 23 is returned and an Invalid Key condition exists.

From page 274 of Sams Teach Yourself COBOL in 24 Hours (which I have just dusted down from off my bookshelf).

So in your case you'll presumably set up your records to be sorted by INPUTFILEID, keep a record as you go through of occurrences of a given INPUTFILEID past its first occurrence, and Delete accordingly (after you have written it to your output file).
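To make the quoted random-access behaviour concrete, here is a minimal sketch. It assumes an indexed file, since as far as I know DELETE applies to indexed and relative files and not to the LINE SEQUENTIAL files used in the question. Everything named here (DeleteDemo, IDXFILE, IDXFILE.DAT, IDXREC, IDX-ID, WS-STATUS) is hypothetical and only for illustration. The program creates a one-record file, deletes the record by key without reading it first, then tries the same delete again to trigger the Invalid Key condition.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DeleteDemo.
      * Sketch only: IDXFILE, IDXREC, IDX-ID and WS-STATUS are
      * hypothetical names, not part of the original programs.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IDXFILE ASSIGN TO "IDXFILE.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS IDX-ID
               FILE STATUS IS WS-STATUS.

       DATA DIVISION.
       FILE SECTION.
       FD  IDXFILE.
       01  IDXREC.
           02 IDX-ID             PIC 9(07).

       WORKING-STORAGE SECTION.
       77  WS-STATUS             PIC X(02).

       PROCEDURE DIVISION.
       BEGIN.
      * Create the file with one record so there is something to
      * delete.
           OPEN OUTPUT IDXFILE
           MOVE 8712353 TO IDX-ID
           WRITE IDXREC
               INVALID KEY DISPLAY "WRITE FAILED, STATUS " WS-STATUS
           END-WRITE
           CLOSE IDXFILE

           OPEN I-O IDXFILE
      * Random access: fill in the key, then DELETE without a READ.
           MOVE 8712353 TO IDX-ID
           DELETE IDXFILE RECORD
               INVALID KEY DISPLAY "NOT FOUND, STATUS " WS-STATUS
               NOT INVALID KEY DISPLAY "DELETED " IDX-ID
           END-DELETE
      * The record is gone now, so the same DELETE should take the
      * INVALID KEY branch.
           MOVE 8712353 TO IDX-ID
           DELETE IDXFILE RECORD
               INVALID KEY DISPLAY "NOT FOUND, STATUS " WS-STATUS
               NOT INVALID KEY DISPLAY "DELETED " IDX-ID
           END-DELETE
           CLOSE IDXFILE
           STOP RUN.
```

An indexed file with a unique record key cannot hold two records with the same INPUTFILEID in the first place, so for the flat text file in the question the sort-and-compare approach is likely the more direct route.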

davek answered Sep 28 '22 01:09
