Decoding EBCDIC

I'm being passed data that is EBCDIC-encoded. Something like:

s = u'@@@@@@@@@@@@@@@@@@@ÂÖÉâÅ@ÉÄ'

Calling .decode('cp500') on it gives the wrong result, so what's the correct approach? If I copy the string into something like Notepad++ I can convert it from EBCDIC to ASCII, but I can't seem to find a viable way to do the same in Python. For what it's worth, the correct result is: BOISE ID (plus or minus space padding).

The information is being retrieved from a file of lines of JSON objects. That file looks like this:

{ "command": "flush-text", "text": "@@@@@O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }
{ "command": "flush-text", "text": "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\u00C9\u00C4@\u00D5\u00A4\u0094\u0082\u0085\u0099z@@@@@@@@@@\u00D9\u00F5\u00F9\u00F7\u00F6\u00F8\u00F7\u00F2\u00F4" }
{ "command": "flush-text", "text": "@@@@@OmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmO" }
{ "command": "flush-text", "text": "@@@@@O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }

And the processing loop looks something like:

import json

with open('myfile.txt', 'rb') as fh:
  for line in fh:
    data = json.loads(line)

If Notepad++ converts it correctly (that is, if the bytes in the file really are EBCDIC), then you should simply need:

Python 2.7:

import io
import json

with io.open('myfile.txt', 'r', encoding="cp500") as fh:
  for line in fh:
    data = json.loads(line)

Python 3.x:

import json

with open('myfile.txt', 'r', encoding="cp500") as fh:
  for line in fh:
    data = json.loads(line)

This uses a TextIOWrapper to decode the file as it is read, using the given encoding. The io module makes Python 3's open available in Python 2.x, with codec support through TextIOWrapper and universal newline support.
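
As a rough illustration of what the encoding argument does, the sketch below writes raw cp500 bytes to a temporary file and reads them back through io.open. It is written for Python 3; the temporary file and the sample text are made up for the demo and are not the asker's data. It only shows the case where the bytes on disk are themselves EBCDIC.

```python
import io
import os
import tempfile

sample_text = u"BOISE ID"           # illustrative text, not the asker's file
raw = sample_text.encode("cp500")   # the EBCDIC bytes as they would sit on disk

# Write the raw EBCDIC bytes to a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # The TextIOWrapper behind io.open decodes the bytes while reading.
    with io.open(path, "r", encoding="cp500") as fh:
        decoded = fh.read()
finally:
    os.remove(path)

print(repr(raw))      # the bytes look nothing like ASCII
print(decoded)
```

If the file is plain ASCII JSON whose strings merely contain EBCDIC byte values as code points (as the \u00C9 escapes in the question suggest), decoding the whole file as cp500 would probably mangle the JSON syntax itself, so the per-string approach may be the better fit.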

My guess is that you need to take the Unicode ordinal of each character as a byte value, and then decode those bytes with cp500.

>>> s = u'@@@@@@@@@@@@@@@@@@@ÂÖÉâÅ@ÉÄ'
>>> bytearray(ord(c) for c in s).decode('cp500')
u'                   BOISE ID'

Alternatively:

>>> s.encode('latin-1').decode('cp500')
u'                   BOISE ID'
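
Applied to the file of JSON lines from the question, the same idea might look like the sketch below. It assumes Python 3, and the helper names decode_ebcdic_text and decode_lines are hypothetical, introduced only for this example. The sample line is a shortened version of one from the question.

```python
import json

def decode_ebcdic_text(text, codepage="cp500"):
    """Reinterpret code points 0-255 as EBCDIC bytes and decode them."""
    # latin-1 maps U+0000..U+00FF to the identical byte values, so this
    # recovers the original EBCDIC bytes before decoding them.
    return text.encode("latin-1").decode(codepage)

def decode_lines(lines, codepage="cp500"):
    """Yield the decoded "text" field of each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        data = json.loads(line)
        yield decode_ebcdic_text(data["text"], codepage)

# Shortened sample line; a real run would pass an open file handle instead.
sample = [
    r'{ "command": "flush-text", "text": "@@\u00C9\u00C4@\u00D5\u00A4\u0094\u0082\u0085\u0099z@\u00D9\u00F5" }',
]
decoded = list(decode_lines(sample))
print(decoded)
```

With the real file, something like `with open('myfile.txt', 'r') as fh: for text in decode_lines(fh): ...` should work the same way. If punctuation looks off, it may be worth trying cp037 instead: the two code pages agree on letters and digits but differ in a few positions, for example byte 0x4F (the "O" in the box-like lines of the question) decodes to "!" under cp500 and to "|" under cp037.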

Tags: python, character-encoding, ebcdic

g.d.d.c

2 Answers

Alastair McCormack

timgeb
