Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I get rid of trailing and embedded spaces in a string?

Tags:

cobol

I am writing a program that converts national and international account numbers into IBAN numbers. To start, I need to form a string: Bank ID + Branch ID + Account Number + ISO Country Code without the trailing spaces that may be present in these fields. But not every account number has the same length, some account numbers have branch identifiers while others don't, so I will always end up with trailing spaces from these fields.

My working storage looks something like this:

      01 Input-IBAN.
          05 BANK-ID                    PIC N(10) VALUE "LOYD".
          05 BRANCH-ID                  PIC N(10) VALUE "     ".
          05 ACCOUNT-NR                 PIC N(28) VALUE "012345678912   ". 
          05 COUNTRY-CODE               PIC N(02) VALUE "GB".
      01 Output-IBAN                    PIC N(34).

I've put some values in there for the example; in reality it would depend on the input. The branch code is optional, hence me leaving it empty in the example.

I basically want to go from this input strung together: "LOYD 012345678912 GB"

to this: "LOYD012345678912GB"

Does anyone know a way to do this that does not result in performance issues? I have thought of using the FUNCTION REVERSE and then using an INSPECT for tallying leading spaces. But I've heard that's a slow way to do it. Does anyone have any ideas? And maybe an example on how to use said idea?

EDIT: I've been informed that the elementary fields may contain embedded spaces.

like image 720
Lena Avatar asked Dec 15 '22 07:12

Lena


1 Answers

I see now that you have embedded blanks in the data. Neither answer you have so far works, then. Gilbert's "squeezes out" the embedded blanks, mine would lose any data after the first blank in each field.

However, just to point out, I don't really believe you can have embedded blanks if you are in any way generating an "IBAN". For instance, https://en.wikipedia.org/wiki/International_Bank_Account_Number#Structure, specifically:

The IBAN should not contain spaces when transmitted electronically. When printed it is expressed in groups of four characters separated by a single space, the last group being of variable length

If your source-data has embedded blanks, at the field level, then you need to refer that back up the line for a decision on what to do. Presuming that you receive the correct answer (no embedded blanks at the field level) then both existing answers are back on the table. You amend Gilbert's by (logically) changing LENGTH OF to FUNCTION LENGTH and dealing with any possibility of overflowing the output.

With the STRING you again have to deal with the possibility of overflowing the output.

Original answer based on the assumption of no embedded blanks.


I'll assume you don't have embedded blanks in the elementary items which make up your structure, as they are sourced by standard values which do not contain embedded blanks.

       MOVE SPACE                   TO OUTPUT-IBAN
       STRING                       BANK-ID 
                                    BRANCH-ID 
                                    ACCOUNT-NR 
                                    COUNTRY-CODE 
         DELIMITED                  BY SPACE 
         INTO                       OUTPUT-IBAN 

STRING only copies the values until it runs out of data to copy, so it is necessary to clear the OUTPUT-IBAN before the STRING.

Copying of the data from each source field will end when the first SPACE is encountered in each source field. If a field is entirely space, no data will be copied from it.

STRING will almost certainly cause a run-time routine to be executed and there will be some overhead for that. Gilbert LeBlanc's example may be slightly faster, but with STRING the compiler deals automatically with all the lengths of all the fields. Because you have National fields, ensure you use the figurative-constant SPACE (or SPACES, they are identical) not a literal value which you think contains a space " ". It does, but it doesn't contain a National space.

If the result of the STRING is greater than 34 characters, the excess characters will be quietly truncated. If you want to deal with that, STRING has an ON OVERFLOW phrase, where you specify what you want done in that case. If using ON OVERFLOW, or indeed NOT ON OVERFLOW you should use the END-STRING scope-terminator. A full-stop/period will terminate the STRING statement as well, but when used like that it can never, with ON/NOT ON, be used within a conditional statement of any type.

Don't use full-stops/periods to terminate scopes.

COBOL doesn't have "strings". You cannot get rid of trailing spaces in fixed-length fields, unless the data fills the field. Your output IBAN will always contain trailing spaces when the data is short.


If you were to actually have embedded blanks at the field level:

Firstly, if you want to "squeeze out" embedded blanks so that they don't appear in the output, I can't think of a simpler way (using COBOL) than Gilbert's.

Otherwise, if you want to preserve embedded blanks, you have no reasonable choice other than to count the trailing blanks so that you can calculate the length of the actual data in each field.

COBOL implementations do have Language Extensions. It is unclear which COBOL compiler you are using. If it happens to be AcuCOBOL (now from Micro Focus) then INSPECT supports TRAILING, and you can count trailing blanks that way. GnuCOBOL also supports TRAILING on INSPECT and in addition has a useful intrinsic FUNCTION, TRIM, which you could use to do exactly what you want (trimming trailing blanks) in a STRING statement.

       move space                   to your-output-field
       string function 
               trim 
                ( your-first-national-source 
                  trailing )
              function 
               trim 
                ( your-second-national-source 
                  trailing )
              function 
               trim 
                ( your-third-national-source 
                  trailing )
              ...
         delimited                  by size
         into                       your-output-field

Note that other than the PIC N in your definitions, the code is the same as if using alphanumeric fields.

However, for Standard COBOL 85 code...

You mentioned using FUNCTION REVERSE followed by INSPECT. INSPECT can count leading spaces, but not, by Standard, trailing spaces. So you can reverse the bytes in a field, and then count the leading spaces.

You have National data (PIC N). A difference with that is that it is not bytes you need to count, but characters, which are made up of two bytes. Since the compiler knows you are using PIC N fields, there is only one thing to trip you - the Special Register, LENGTH OF, counts bytes, you need FUNCTION LENGTH to count characters.

National data is UTF-16. Which happens to mean the two bytes for each character happen to be "ASCII", when one of the bytes happens to represent a displayable character. That doesn't matter either, running on z/OS, an EBCDIC machine, as the compiler will do necessary conversions automatically for literals or alpha-numeric data-items.

       MOVE ZERO                    TO a-count-for-each-field 
       INSPECT FUNCTION 
                REVERSE 
                 ( each-source-field )
         TALLYING                   a-count-for-each-field 
          FOR LEADING               SPACE 

After doing one of those for each field, you could use reference-modification.

How to use reference-modification for this?

Firstly, you have to be careful. Secondly you don't.

Secondly first:

MOVE SPACE                   TO output-field
STRING field-1 ( 1 : length-1 )
       field-2 ( 1 : length-2 )
  DELIMITED BY               SIZE
  INTO                       output-field

Again deal with overflow if possible/necessary.

It is also possible with plain MOVEs and reference-modification, as in this answer, https://stackoverflow.com/a/31941665/1927206, whose question is close to a duplicate of your question.

Why do you have to be careful? Again, from the answer linked previously, theoretically a reference-modification can't have a zero length.

In practice, it will probably work. COBOL programmers generally seem to be so keen on reference-modification that they don't bother to read about it fully, so don't worry about a zero-length not being Standard, and don't notice that it is non-Standard, because it "works". For now. Until the compiler changes.

If you are using Enterprise COBOL V5.2 or above (possibly V5.1 as well, I just haven't checked) then you can be sure, by compiler option, if you want, that a zero-length reference-modification works as expected.

Some other ways to achieve your task, if embedded blanks can exist and can be significant in the output, are covered in that answer. With National, just always watch to use FUNCTION LENGTH (which counts characters), not LENGTH OF (which counts bytes). Usually LENGTH OF and FUNCTION LENGTH give the same answer. For multi-byte characters, they do not.

like image 96
Bill Woodger Avatar answered Mar 08 '23 11:03

Bill Woodger