The input file has the following records: 8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355
Using COBOL, I need to remove duplicates from the above file and write to an output file. I wrote simple logic to read records and write to an output file.
Where do I need to put the logic of removing duplicates (say, 8712353, 8712352) from the above file?
Here is the program logic:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP  VALUE HIGH-VALUES.
           02 INPUTFILEID      PIC 9(07).
       FD  OUTFILEDUP.
       01  OUTFILEDUPREC       PIC 9(07).
       WORKING-STORAGE SECTION.
       77  WS-VARIABLE         PIC 9(09).
       77  REC-NOT-MATCH       PIC 9(01).
       77  CUR-VARIABLE        PIC 9(09).
       PROCEDURE DIVISION.
       BEGIN.
           OPEN INPUT INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP
      * Priming read: fetch the first record before the loop starts.
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
      * Copy every record as-is; duplicates are not filtered yet.
           PERFORM UNTIL (EOFINPUTFILEDUP)
               WRITE OUTFILEDUPREC FROM INPUTFILEID
               READ INPUTFILEDUP
                   AT END SET EOFINPUTFILEDUP TO TRUE
               END-READ
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.
I sorted the input file in ascending order as:
8712351,8712352,8712352,8712353,8712353,8712354,8712355,8712355,8712356
That worked, and the modified code is given below.
But suppose my file is in neither ascending nor descending order: where do I need to write the sort logic so that it runs before the duplicates are removed? How can I update the code below for this? I tried, but I was not successful when the input file looks like this:
8712351,8712353,8712353,8712354,8712356,8712352,8712355,8712352,8712355
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup2.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP  VALUE HIGH-VALUES.
           02 INPUTFILEID      PIC 9(07).
       FD  OUTFILEDUP.
       01  OUTFILEDUPREC       PIC 9(07).
       WORKING-STORAGE SECTION.
       77  WS-VARIABLE         PIC 9(09) VALUE ZERO.
       77  REC-NOT-MATCH       PIC 9(01).
       77  CUR-VARIABLE        PIC 9(7) VALUE ZERO.
       PROCEDURE DIVISION.
       BEGIN.
           OPEN INPUT INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
      * WS-VARIABLE holds the last ID written, so this comparison
      * only catches duplicates that sit next to each other.
           PERFORM UNTIL (EOFINPUTFILEDUP)
               IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
                   MOVE INPUTFILEID TO WS-VARIABLE
                   WRITE OUTFILEDUPREC FROM INPUTFILEID
                   READ INPUTFILEDUP
                       AT END SET EOFINPUTFILEDUP TO TRUE
                   END-READ
               ELSE
                   DISPLAY "DUPLICATE FOUND " INPUTFILEID
                   READ INPUTFILEDUP
                       AT END SET EOFINPUTFILEDUP TO TRUE
                   END-READ
               END-IF
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.
Finally it worked.
Here is the code:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup2.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT WorkFile ASSIGN TO "WORK.TMP".
       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           88 EOFINPUTFILEDUP  VALUE HIGH-VALUES.
           02 INPUTFILEID      PIC 9(07).
       FD  OUTFILEDUP.
       01  OUTFILEDUPREC       PIC 9(07).
       SD  WorkFile.
       01  WORKREC.
           02 WINPUTFILEID     PIC 9(07).
       WORKING-STORAGE SECTION.
       77  WS-VARIABLE         PIC 9(09) VALUE ZERO.
       77  REC-NOT-MATCH       PIC 9(01).
       77  CUR-VARIABLE        PIC 9(7) VALUE ZERO.
       PROCEDURE DIVISION.
       BEGIN.
      * Sort first. GIVING names the input file itself, so the
      * input file is rewritten in ascending key order.
           SORT WorkFile ON ASCENDING KEY WINPUTFILEID
               USING INPUTFILEDUP
               GIVING INPUTFILEDUP
           OPEN INPUT INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
      * After the sort, equal IDs are adjacent, so comparing with
      * the last ID written is enough to drop duplicates.
           PERFORM UNTIL (EOFINPUTFILEDUP)
               IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
                   MOVE INPUTFILEID TO WS-VARIABLE
                   WRITE OUTFILEDUPREC FROM INPUTFILEID
                   READ INPUTFILEDUP
                       AT END SET EOFINPUTFILEDUP TO TRUE
                   END-READ
               ELSE
                   DISPLAY "DUPLICATE FOUND " INPUTFILEID
                   READ INPUTFILEDUP
                       AT END SET EOFINPUTFILEDUP TO TRUE
                   END-READ
               END-IF
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.
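A possible variation, if overwriting the input file with the sorted records is not wanted: the duplicate check can be moved into an OUTPUT PROCEDURE of the SORT, so the sorted records are filtered as they come back from the sort and the input file is only read. This is a minimal sketch, not tested against any particular compiler. The names RemoveDup3, WS-PREVIOUS-ID, WS-SORT-EOF, SORT-AT-END and WRITE-UNIQUE are made up for illustration, and it assumes (as the code above does) that no real ID is zero.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup3.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT WorkFile ASSIGN TO "WORK.TMP".
       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           02 INPUTFILEID      PIC 9(07).
       FD  OUTFILEDUP.
       01  OUTFILEDUPREC       PIC 9(07).
       SD  WorkFile.
       01  WORKREC.
           02 WINPUTFILEID     PIC 9(07).
       WORKING-STORAGE SECTION.
      * Hypothetical names: last ID written and an end-of-sort flag.
       01  WS-PREVIOUS-ID      PIC 9(07) VALUE ZERO.
       01  WS-SORT-EOF         PIC X(01) VALUE 'N'.
           88 SORT-AT-END      VALUE 'Y'.
       PROCEDURE DIVISION.
       BEGIN.
      * SORT opens and reads INPUTFILEDUP itself (USING), then
      * hands the sorted records to WRITE-UNIQUE one at a time.
           SORT WorkFile ON ASCENDING KEY WINPUTFILEID
               USING INPUTFILEDUP
               OUTPUT PROCEDURE IS WRITE-UNIQUE
           STOP RUN.

       WRITE-UNIQUE.
           OPEN OUTPUT OUTFILEDUP
           PERFORM UNTIL SORT-AT-END
      * RETURN fetches the next record in sorted order.
               RETURN WorkFile
                   AT END
                       SET SORT-AT-END TO TRUE
                   NOT AT END
      * Assumes no ID is zero, the initial WS-PREVIOUS-ID value.
                       IF WINPUTFILEID NOT = WS-PREVIOUS-ID
                           MOVE WINPUTFILEID TO WS-PREVIOUS-ID
                           WRITE OUTFILEDUPREC FROM WINPUTFILEID
                       END-IF
               END-RETURN
           END-PERFORM
           CLOSE OUTFILEDUP.
```

The design choice here is only about where the filtering happens: the comparison against the previously written ID is the same, but the program never opens INPUTFILEDUP for output, so the original record order on disk is left alone.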
When Organization is Sequential, the record deleted is the last record read. The Delete statement is valid only when the last operation against the file is a successful Read statement. If not, the Delete returns a File Status value of 43. Because a Delete cannot return File Status values beginning with a 2 when the file is Open with Sequential Access, coding Invalid Key on such a Delete is not allowed.
When Dynamic or Random access is selected for the file, the Delete statement, like the Rewrite, becomes a little less restrictive. The record being deleted need not have been previously read. Simply fill in the primary Key information in the record description for the file and issue the Delete statement. If the record does not exist, a File Status of 23 is returned and an Invalid Key condition exists.
From page 274 of Sams Teach Yourself COBOL in 24 Hours (which I have just dusted down from off my bookshelf). So in your case you'll presumably set up your records to be sorted by INPUTFILEID, keep a record as you go through of occurrences of a given INPUTFILEID past its first occurrence, and Delete accordingly (after you have written it to your output file).