 

Default DICOM encoding without Specific Character Set

If a DICOM file does not define a Specific Character Set (0008,0005), what character set does it use by default? Is ASCII the default encoding for DICOM files?

TL;DR

A DICOM file contains the German character ä in one of its tags, but the file does not specify any character set. I assume that in this case the file may contain only ASCII characters (the default character set), so I report the file as invalid. Before I submit my change, I want to make sure I have understood DICOM correctly.

asked Dec 16 '14 at 14:12 by Pavlo Dyban





2 Answers

As specified in DICOM PS3.5, Data Structures and Encoding:

6.1.2.5.4 Levels of Implementation and Initial Designation

a) Attribute Specific Character Set (0008,0005) not present:

7-bit code. Implementation level: ISO 2022 Level 1, Elementary 7-bit code (code-level identifier 1).

Initial designation: ISO-IR 6 (ASCII) as G0. Code Extension shall not be used.

Reference:

  • http://dicom.nema.org/medical/dicom/current/output/chtml/part05/chapter_6.html#sect_6.1.2.5.4
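Applied to the question, the rule above means a value in a file without (0008,0005) must consist of 7-bit ASCII bytes and must not contain ISO 2022 escape sequences. The following is a minimal Python sketch of such a check on a raw attribute value; the function names are hypothetical, and treating an empty Specific Character Set string the same as an absent one is an assumption of this sketch, not a quote from the standard.

```python
from typing import List, Optional

ESC = 0x1B  # ISO 2022 code-extension escape sequences start with this byte


def non_ascii_offsets(value: bytes) -> List[int]:
    """Return the offsets of bytes outside 7-bit ASCII (ISO-IR 6)."""
    return [i for i, b in enumerate(value) if b > 0x7F]


def conforms_to_default_repertoire(value: bytes, specific_character_set: Optional[str]) -> bool:
    """Check a raw text value when Specific Character Set (0008,0005) is absent.

    Assumption: None or "" means absent/empty, so only ISO-IR 6 applies and
    code extensions (escape sequences) are not allowed. Any other character
    set is outside the scope of this hypothetical helper.
    """
    if specific_character_set:
        raise NotImplementedError("only the default repertoire is handled here")
    return not non_ascii_offsets(value) and ESC not in value
```

For example, ä is the single byte 0xE4 in ISO 8859-1 and the two bytes 0xC3 0xA4 in UTF-8, so `conforms_to_default_repertoire("Bär".encode("latin-1"), None)` and the UTF-8 equivalent both return `False`, while `conforms_to_default_repertoire(b"Bar", None)` returns `True`.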
answered Sep 21 '22 by JohnnyQ



To add to the answer by JohnnyQ: the DICOM standard also defines how implementations should handle characters from character sets they do not know or do not support (see PS 3.5 section 6.1.2.3). Implementations can print or display such characters by replacing each unknown character with the four characters "\nnn", where "nnn" is the three-digit octal representation of each byte.

The standard gives the following example for an ASCII-based machine:

Character String: Günther

Encoded representation: 04/07 15/12 06/14 07/04 06/08 06/05 07/02

ASCII-based machine: G\374nther
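The "\nnn" replacement can be reproduced in a few lines. In the xx/yy notation of the encoded representation, xx is the column (high nibble) and yy the row (low nibble) of the code table, so 04/07 is 0x47 ("G") and 15/12 is 0xFC, which is ü in ISO 8859-1. The following is a minimal Python sketch with a hypothetical function name, assuming the display can show every 7-bit ASCII byte and nothing else.

```python
def escape_non_ascii(value: bytes) -> str:
    """Render raw bytes on an ASCII-only display.

    Each byte outside 7-bit ASCII is replaced by the four characters "\\nnn",
    where nnn is the byte's three-digit octal value.
    """
    parts = []
    for b in value:
        parts.append(chr(b) if b < 0x80 else "\\%03o" % b)
    return "".join(parts)


# "Günther": 04/07 15/12 06/14 07/04 06/08 06/05 07/02
encoded = bytes([0x47, 0xFC, 0x6E, 0x74, 0x68, 0x65, 0x72])
print(escape_non_ascii(encoded))  # G\374nther
```

0xFC is 252 decimal, which is 374 in octal, matching the standard's example.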

Implementations may also encounter Control Characters that they have no means to print or display. Applications may print or display such Control Characters by replacing each one with the four characters "\nnn", where "nnn" is the three-digit octal representation of the byte.
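Extending the same "\nnn" rule to control characters gives a single display routine. The sketch below is a hypothetical Python helper, not taken from the standard or any toolkit: it escapes bytes above 0x7F as well as the C0 control bytes (0x00 to 0x1F) and DEL (0x7F), and the `displayable_controls` parameter is an assumed way to express which control characters a particular display can actually render.

```python
from typing import FrozenSet


def escape_for_display(value: bytes, displayable_controls: FrozenSet[int] = frozenset()) -> str:
    """Replace bytes a display cannot show with "\\nnn" (three-digit octal).

    Escaped: bytes above 0x7F, plus C0 control bytes (0x00-0x1F) and DEL
    (0x7F) unless they appear in displayable_controls (hypothetical,
    display-specific set of control bytes that can be shown as-is).
    """
    parts = []
    for b in value:
        is_control = b < 0x20 or b == 0x7F
        if b > 0x7F or (is_control and b not in displayable_controls):
            parts.append("\\%03o" % b)
        else:
            parts.append(chr(b))
    return "".join(parts)
```

With the default empty set, `escape_for_display(b"A\x1bB")` yields the text `A\033B` (ESC is 27 decimal, 033 octal), while `escape_for_display(b"one\ntwo", frozenset({0x0A}))` leaves the line feed untouched.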

answered Sep 21 '22 by LEADTOOLS Support
