
Decoding EBCDIC

I'm being passed data that is EBCDIC-encoded. Something like:

s = u'@@@@@@@@@@@@@@@@@@@ÂÖÉâÅ@ÉÄ'

Calling .decode('cp500') on it directly does not give the right result, so what is the correct approach? If I copy the string into something like Notepad++ I can convert it from EBCDIC to ASCII, but I can't find a viable way to do the same in Python. For what it's worth, the correct result is: BOISE ID (plus or minus space padding).

The information is being retrieved from a file of lines of JSON objects. That file looks like this:

{ "command": "flush-text", "text": "@@@@@O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }
{ "command": "flush-text", "text": "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\u00C9\u00C4@\u00D5\u00A4\u0094\u0082\u0085\u0099z@@@@@@@@@@\u00D9\u00F5\u00F9\u00F7\u00F6\u00F8\u00F7\u00F2\u00F4" }
{ "command": "flush-text", "text": "@@@@@OmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmO" }
{ "command": "flush-text", "text": "@@@@@O@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@O" }

And the processing loop looks something like:

import json

with open('myfile.txt', 'rb') as fh:
  for line in fh:
    data = json.loads(line)  # data['text'] still needs EBCDIC decoding
g.d.d.c asked Jan 31 '16 08:01



2 Answers

If Notepad++ converts it correctly, then you should only need to open the file with the cp500 encoding:

Python 2.7:

import io
import json

with io.open('myfile.txt', 'r', encoding="cp500") as fh:
  for line in fh:
    data = json.loads(line)

Python 3.x:

import json

with open('myfile.txt', 'r', encoding="cp500") as fh:
  for line in fh:
    data = json.loads(line)

This uses a TextIOWrapper to decode the file as it is read, using the given encoding. The io module provides Python 3's open to Python 2.x, with codec support through TextIOWrapper and universal newline handling.
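A minimal Python 3 sketch of this approach, assuming the file's raw bytes are themselves cp500-encoded. If the file is instead plain ASCII JSON with \uXXXX escapes, as the sample in the question suggests, decoding at open time would not apply and the "text" values would have to be converted after json.loads. The helper name read_ebcdic_json_lines and the temporary file are hypothetical, used only to make the example self-contained:

```python
import json
import os
import tempfile


def read_ebcdic_json_lines(path, codec="cp500"):
    """Read a file of JSON lines whose bytes are encoded with `codec`."""
    records = []
    # open() wraps the binary stream in a TextIOWrapper that decodes on read.
    with open(path, "r", encoding=codec) as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                records.append(json.loads(line))
    return records


# Build a small cp500-encoded file to exercise the reader.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('{"command": "flush-text", "text": "BOISE ID"}\n'.encode("cp500"))

records = read_ebcdic_json_lines(tmp.name)
os.remove(tmp.name)
print(records[0]["text"])  # BOISE ID
```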

Alastair McCormack answered Oct 17 '22 23:10


My guess is that you need the value of the corresponding Unicode ordinals as bytes, and then decode that with cp500.

>>> s = u'@@@@@@@@@@@@@@@@@@@ÂÖÉâÅ@ÉÄ'
>>> bytearray(ord(c) for c in s).decode('cp500')
u'                   BOISE ID'

Alternatively:

>>> s.encode('latin-1').decode('cp500')
u'                   BOISE ID'
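The same idea can be applied to each line of the JSON file from the question. This is a sketch for Python 3, assuming every character in the "text" field has a code point below 256 (otherwise the latin-1 step raises UnicodeEncodeError). The function names ebcdic_to_text and decode_line are made up for illustration, and the sample line is a shortened version of the one in the question:

```python
import json


def ebcdic_to_text(s, codec="cp500"):
    # Each character's code point (0-255) is really an EBCDIC byte value.
    # latin-1 maps code points 0-255 to identical byte values, so this
    # recovers the original bytes before decoding them as EBCDIC.
    return s.encode("latin-1").decode(codec)


def decode_line(line, codec="cp500"):
    # Parse the JSON first so the \uXXXX escapes become characters,
    # then convert the "text" field.
    record = json.loads(line)
    record["text"] = ebcdic_to_text(record["text"], codec)
    return record


sample = (r'{ "command": "flush-text", "text": '
          r'"@@\u00C9\u00C4@\u00D5\u00A4\u0094\u0082\u0085\u0099z@\u00D9\u00F5" }')
print(repr(decode_line(sample)["text"]))  # '  ID Number: R5'
```

The codec is a parameter because other EBCDIC code pages, such as cp037, differ from cp500 in a handful of punctuation characters, so it may be worth comparing the output of both against known values.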

timgeb answered Oct 17 '22 23:10