 

Parser for signed overpunch values?

I am working with some old data imports and came across a bunch of data from an external source that reports financial numbers with a signed overpunch. I've seen a lot, but this is before my time. Before I go about creating a function to parse these strangers, I wanted to check whether there is a standard way to handle them.

I guess my question is: does the .NET Framework provide a standard facility for converting signed overpunch strings? If not .NET, are there any third-party tools I can use so I don't reinvent the wheel?

Slider345 asked Nov 15 '14 00:11


2 Answers

Over-punched numeric (zoned decimal in Cobol) comes from the old punched cards, where the sign was over-punched on the last digit of a number. The format is commonly used in Cobol.

As there are both ASCII and EBCDIC Cobol compilers, there are both ASCII and EBCDIC versions of zoned numeric. To make it even more complicated, the +0 and -0 characters ({ and } in US-EBCDIC, IBM037) are different in, say, German-EBCDIC (IBM273, where they are ä and ü) and different again in other EBCDIC language versions.

To process the data successfully, you need to know:

  • Did the data originate on an EBCDIC or an ASCII system?
  • If EBCDIC, which language version (US, German, etc.)?

If the data is in the original character set, you can work out the sign from the high-order nybble (the zone) of the last byte.

For EBCDIC, the numeric hex codes are:

Digit:        0     1     2    ..   9       Characters (US-EBCDIC)

Unsigned:   x'F0' x'F1' x'F2'  .. x'F9'     012 .. 9
Negative:   x'D0' x'D1' x'D2'  .. x'D9'     }JK .. R
Positive:   x'C0' x'C1' x'C2'  .. x'C9'     {AB .. I

For US-EBCDIC zoned data, this Java code converts a string (ret holds the field text):

    String sign = "";
    // Offsets that map the overpunch letters back onto the digits '1'..'9'.
    int positiveDiff = 'A' - '1';
    int negativeDiff = 'J' - '1';

    char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

    switch (lastChar) {
        case '}':                   // negative zero
            sign = "-";
            // deliberate fall-through: '}' and '{' both stand for the digit 0
        case '{':                   // positive zero
            lastChar = '0';
            break;
        case 'A': case 'B': case 'C': case 'D': case 'E':
        case 'F': case 'G': case 'H': case 'I':
            lastChar = (char) (lastChar - positiveDiff);    // positive 1..9
            break;
        case 'J': case 'K': case 'L': case 'M': case 'N':
        case 'O': case 'P': case 'Q': case 'R':
            sign = "-";
            lastChar = (char) (lastChar - negativeDiff);    // negative 1..9
            break;
        default:                    // plain digit: unsigned, leave it alone
            break;
    }
    ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

For German-EBCDIC, { and } become ä and ü; for other EBCDIC language versions you would need to look up the appropriate code page.
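
One way to reuse the US-EBCDIC code for other code pages is to translate the two zero characters first. The sketch below is only an illustration: the class and method names are made up, and the table holds just the two pairs mentioned in this answer (IBM037 and IBM273). Any other code page would have to be looked up and added.

```java
import java.util.Map;

/** Sketch: map code-page-specific zero overpunch characters onto the US-EBCDIC ones. */
final class OverpunchZeros {

    // {positive zero, negative zero} per code page. Only the two pairs named
    // in the answer are listed; every other code page is deliberately missing.
    private static final Map<String, char[]> ZEROS = Map.of(
            "IBM037", new char[] {'{', '}'},
            "IBM273", new char[] {'\u00E4', '\u00FC'});   // a-umlaut, u-umlaut

    /** Returns the field with its last character rewritten to '{' or '}' where it is a zero overpunch. */
    static String toUsConvention(String field, String codePage) {
        char[] zeros = ZEROS.get(codePage);
        if (zeros == null) {
            throw new IllegalArgumentException("no zero characters known for " + codePage);
        }
        if (field.isEmpty()) {
            return field;
        }
        int lastIndex = field.length() - 1;
        char last = field.charAt(lastIndex);
        if (last == zeros[0]) {
            return field.substring(0, lastIndex) + '{';
        }
        if (last == zeros[1]) {
            return field.substring(0, lastIndex) + '}';
        }
        return field;   // letters A-R and plain digits are left untouched here
    }
}
```

With this, toUsConvention("1230\u00E4", "IBM273") gives "1230{". The translation has to happen before any upper-casing of the last character, because upper-casing ä would produce Ä and the match would be lost.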

For ASCII zoned data, this is the Java code (again, ret holds the field text):

    String sign = "";
    // Offsets that map the overpunch characters back onto the digits '0'..'9'.
    int positiveFjDiff = '@' - '0';
    int negativeFjDiff = 'P' - '0';

    char lastChar = Character.toUpperCase(ret.charAt(ret.length() - 1));

    switch (lastChar) {
        case '@': case 'A': case 'B': case 'C': case 'D':
        case 'E': case 'F': case 'G': case 'H': case 'I':
            lastChar = (char) (lastChar - positiveFjDiff);  // positive 0..9
            break;
        case 'P': case 'Q': case 'R': case 'S': case 'T':
        case 'U': case 'V': case 'W': case 'X': case 'Y':
            sign = "-";
            lastChar = (char) (lastChar - negativeFjDiff);  // negative 0..9
            break;
        default:                    // plain digit: unsigned, leave it alone
            break;
    }
    ret = sign + ret.substring(0, ret.length() - 1) + lastChar;

Finally, if you are working with the raw EBCDIC bytes, you can calculate it like this (pseudocode):

    sign = '+'
    if ((last_digit & x'F0') == x'D0') {
       sign = '-'
    }
    last_digit = last_digit | x'F0'
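
The same bit test can be sketched in Java for a whole field of raw EBCDIC bytes. This is a minimal illustration, not a library API: the class name EbcdicZoned and the method decode are made up, and it assumes that only a zone of X'D' on the last byte marks a negative value, exactly as in the pseudocode.

```java
import java.math.BigInteger;

/** Sketch: decode a signed zoned-decimal field from raw EBCDIC bytes. */
final class EbcdicZoned {

    /**
     * Assumes every byte carries a digit 0-9 in its low nybble and that only
     * zone X'D' on the last byte means negative. Hypothetical helper.
     */
    static BigInteger decode(byte[] field) {
        if (field.length == 0) {
            throw new IllegalArgumentException("empty field");
        }
        StringBuilder digits = new StringBuilder(field.length);
        for (byte b : field) {
            int low = b & 0x0F;                 // the digit lives in the low nybble
            if (low > 9) {
                throw new IllegalArgumentException("not a decimal digit: " + low);
            }
            digits.append((char) ('0' + low));
        }
        int zone = field[field.length - 1] & 0xF0;  // the sign lives in the last zone
        BigInteger value = new BigInteger(digits.toString());
        return zone == 0xD0 ? value.negate() : value;
    }
}
```

For example, the bytes F0 F0 F1 F2 C3 decode to 123 and the bytes F1 F2 D3 decode to -123. Working on bytes avoids the code-page question entirely, but it only works if the file was transferred in binary, without an EBCDIC-to-ASCII translation on the way.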

One last problem: decimal points are not stored in a zoned decimal; they are assumed. You need to look at the Cobol copybook.

e.g.

 if the cobol Copybook is

    03 fld                 pic s99999.

 123 is stored as     0012C (EBCDIC source)

 but if the copybook is (v stands for assumed decimal point) 

   03 fld                  pic s999v99.

 then 123 is stored as 1230{  
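
To show how the assumed decimal point could be applied after decoding, here is a small Java sketch. It assumes US-EBCDIC overpunch characters that have already been translated to text, and it takes the number of decimal places as a parameter that you read off the copybook yourself (0 for pic s99999, 2 for pic s999v99). The class and method names are invented for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Sketch: decode US-EBCDIC overpunch text and apply the copybook's implied scale. */
final class ImpliedDecimal {

    /** scale = number of digits after the V in the picture clause (assumed known). */
    static BigDecimal parse(String text, int scale) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("empty field");
        }
        char last = text.charAt(text.length() - 1);
        boolean negative = false;
        char digit;
        if (last == '{') {
            digit = '0';
        } else if (last == '}') {
            digit = '0';
            negative = true;
        } else if (last >= 'A' && last <= 'I') {
            digit = (char) (last - 'A' + '1');
        } else if (last >= 'J' && last <= 'R') {
            digit = (char) (last - 'J' + '1');
            negative = true;
        } else if (last >= '0' && last <= '9') {
            digit = last;                       // unsigned field
        } else {
            throw new IllegalArgumentException("unexpected sign character: " + last);
        }
        BigInteger unscaled = new BigInteger(text.substring(0, text.length() - 1) + digit);
        if (negative) {
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale); // exact decimal, no binary rounding
    }
}
```

With the two copybook examples above, parse("0012C", 0) yields 123 and parse("1230{", 2) yields 123.00.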

It would be best to do the translation in Cobol, or to use a Cobol translation package.

There are several commercial packages for handling Cobol data; they tend to be expensive. There are also some open-source Java packages that can deal with mainframe Cobol data.

Bruce Martin answered Oct 20 '22 15:10


Presumably in the specification for the file or your program you are told how to deal with this? No?

As Bruce Martin has said, a true Overpunch goes back to the days of punched-cards. You punched the final digit of a number, then re-punched (overpunched) the same position on the card.

The link to the Wiki that you included in your question is fine for that. But I'm pretty sure the source of your data is not punched-cards.

Although part of this answer presumes you are using a Mainframe, the solution proposed is machine-independent.

The source of your data is a Mainframe? We don't know, although it is important information. For the moment, let's assume it is so.

Unless it is very old data which is unchanging, it has been processed on the Mainframe in the last 20 years. Unless the compiler used (assuming it has come from a COBOL program) is very, very old, then you need to know the setting of compiler option NUMPROC. Here's why: http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/igy3pg50/2.4.36?DT=20090820210412

Default is: NUMPROC(NOPFD)

Abbreviations are: None

The compiler accepts any valid sign configuration: X'A', X'B', X'C', X'D', X'E', or X'F'. NUMPROC(NOPFD) is the recommended option in most cases.

NUMPROC(PFD) improves the performance of processing numeric internal decimal and zoned decimal data. Use this option only if your program data agrees exactly with the following IBM system standards:

Zoned decimal, unsigned: High-order 4 bits of the sign byte contain X'F'.

Zoned decimal, signed overpunch: High-order 4 bits of the sign byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Zoned decimal, separate sign: Separate sign contains the character '+' if the number is positive or 0, and '-' if it is not.

Internal decimal, unsigned: Low-order 4 bits of the low-order byte contain X'F'.

Internal decimal, signed: Low-order 4 bits of the low-order byte contain X'C' if the number is positive or 0, and X'D' if it is not.

Data produced by COBOL arithmetic statements conforms to the above IBM system standards. However, using REDEFINES and group moves could change data so that it no longer conforms. If you use NUMPROC(PFD), use the INITIALIZE statement to initialize data fields, rather than using group moves.

Using NUMPROC(PFD) can affect class tests for numeric data. You should use NUMPROC(NOPFD) or NUMPROC(MIG) if a COBOL program calls programs written in PL/I or FORTRAN.

Sign representation is affected not only by the NUMPROC option, but also by the installation-time option NUMCLS.

Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs:

Preferred signs are created only on the output of MOVE statements and arithmetic operations.

No explicit sign repair is done on input.

Some implicit sign repair might occur during conversion.

Numeric comparisons are performed by a decimal comparison, not a logical comparison.

What does that mean to you? If NUMPROC(NOPFD) is being used, you may see X'A' through X'F' in the high-order nybble of the final byte of the field. If NUMPROC(PFD) is being used, you shouldn't see anything other than X'C' or X'D' in that position.
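
If you do have to read such data yourself, a check on that nybble might look like the Java sketch below. The quoted text only says which nybbles are accepted; the split into positive (A, C, E, F) and negative (B, D) follows the usual IBM hardware convention and is an assumption added here, as are the class and method names.

```java
/** Sketch: classify the high-order nybble of the last byte of a zoned-decimal field. */
final class SignNybble {

    enum Sign { POSITIVE, NEGATIVE, INVALID }

    /** Accepts every sign that NUMPROC(NOPFD) tolerates: X'A' through X'F'. */
    static Sign classify(byte lastByte) {
        switch ((lastByte >> 4) & 0x0F) {
            case 0xA: case 0xC: case 0xE: case 0xF:
                return Sign.POSITIVE;       // assumed convention: A, C, E, F are plus
            case 0xB: case 0xD:
                return Sign.NEGATIVE;       // assumed convention: B, D are minus
            default:
                return Sign.INVALID;        // 0-9 is not a sign nybble
        }
    }

    /** True only for the preferred signs of a signed overpunch field: X'C' or X'D'. */
    static boolean isPreferredSigned(byte lastByte) {
        int zone = (lastByte >> 4) & 0x0F;
        return zone == 0xC || zone == 0xD;
    }
}
```

A reader that logs every field where isPreferredSigned is false gives you something concrete to show the people who produce the file.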

Note that if the file you are receiving has been generated by the installed Mainframe SORT product, you have the same potential issue.

"May" and "shouldn't" are not good things to see in a specification.

Is your data remotely business-critical in a financial environment? Then you almost certainly have issues of audit and compliance. It runs something like this:

Auditor, "What do you do with the data when you receive it?"
You, "The first thing I do is change it"
Auditor, "Really? How do you verify the data once you have changed it?"
You, "Errr..."

You might get lucky and never have an auditor look at it.

All those non-deterministic words aren't very good for programming.

So how do you get around it?

There should be no fields on the data that you receive which have embedded signs. There should be no numeric fields which are not represented as character data (no binary, packed, or floating-point). If a field is signed, the sign should be presented separately. If a field has decimal places, an actual . or , (depending on home-country of the system) should be provided, or as an alternative a separate field with a scaling-factor.

Is this difficult for your Mainframe people to do? Not remotely. Insist on it. If they will not do it, document it such that problems arising are not yours, but theirs.

If all numeric data presented to you is plain character data (plus, minus, comma, digits 0 to 9) then you will have absolutely no problem in understanding the data, whether it is any variant of EBCDIC or any variant of ASCII.

Be aware that any fields with decimal places coming from COBOL are exact decimal amounts. Do not store or use them in anything other than fields in your language which can process exact decimal amounts.

You don't provide any sample data. So here's a sample:

123456{

This should be presented to you as:

+1234560

If it has two decimal places:

+12345.60
or
+12345602 (where the trailing 2 is a scaling-factor, which you validate)
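
Reading either of these plain-character forms takes very little code. The Java sketch below is one possible reading of the two layouts; the names are invented, and it assumes the scaling factor is a single trailing digit that must match the value you expect from the file specification.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Sketch: parse numbers delivered as plain characters with a separate sign. */
final class PlainNumeric {

    /** Handles the "+12345.60" form: sign, digits, explicit decimal point. */
    static BigDecimal parseExplicit(String text) {
        return new BigDecimal(text);    // accepts a leading '+' or '-'
    }

    /** Handles the "+12345602" form, where the last digit is the scaling factor. */
    static BigDecimal parseWithTrailingScale(String text, int expectedScale) {
        if (text.length() < 3) {
            throw new IllegalArgumentException("field too short: " + text);
        }
        int scale = Character.digit(text.charAt(text.length() - 1), 10);
        if (scale != expectedScale) {   // validate, as the answer asks
            throw new IllegalArgumentException("unexpected scaling factor in: " + text);
        }
        String signedDigits = text.substring(0, text.length() - 1);
        return new BigDecimal(new BigInteger(signedDigits), scale);
    }
}
```

Both parseExplicit("+12345.60") and parseWithTrailingScale("+12345602", 2) return the exact decimal value 12345.60.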

If numeric data is to be transferred from external systems, it should always be done in character format. It will make everything so much easier to code, understand, maintain, and audit.

Bill Woodger answered Oct 20 '22 15:10