If a DICOM file does not define a Specific Character Set (0008,0005), what character set does it use by default? Is ASCII the default encoding for DICOM files?
TL;DR
A DICOM file contains German ä in one of the tags, but the file does not specify any character set. I assume that in this case the file is allowed to contain only ASCII symbols (the default character set) and report this file as invalid. Before I submit my change, I want to make sure that I understood DICOM correctly.
A character encoding maps a sequence of bytes to a string of characters. The same combination of bytes can denote different characters in different character encodings.
As a content author or developer, you should nowadays always choose the UTF-8 character encoding for your content or data. This Unicode encoding is a good choice because you can use a single character encoding to handle any character you are likely to need.
As specified in DICOM PS 3.5 (Data Structures and Encoding):
6.1.2.5.4 Levels of Implementation and Initial Designation
a) Attribute Specific Character Set (0008,0005) not present:
7-bit code
Implementation level: ISO 2022 Level 1 - Elementary 7-bit code (code-level identifier 1)
Initial designation: ISO-IR 6 (ASCII) as G0.
Code Extension shall not be used.
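A quick way to test a value against this default repertoire is to check that every byte fits in 7 bits. The following is a minimal sketch, not a full validator: the function name `is_default_repertoire` is hypothetical, and it checks only the 7-bit range, not any further per-VR restrictions on control characters.

```python
def is_default_repertoire(value: bytes) -> bool:
    """Return True if every byte is a 7-bit code (0x00-0x7F).

    Hypothetical helper: it only tests the 7-bit range implied by
    ISO-IR 6 as G0 with no code extension.
    """
    return all(b <= 0x7F for b in value)


print(is_default_repertoire(b"Jaeger"))                  # True
print(is_default_repertoire("Jäger".encode("latin-1")))  # False
```

Under this check, a value containing ä in a file without (0008,0005) would be flagged, which matches the reading of the standard quoted above.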
To add to the answer by JonnyQ: the DICOM standard also defines a mechanism for handling character sets that an implementation does not know or support (see PS 3.5, section 6.1.2.3). Implementations can print or display such characters by replacing each unknown character with the four characters "\nnn", where "nnn" is the three-digit octal representation of each byte.
The standard gives the following example for an ASCII-based machine:
Character String: Günther
Encoded representation: 04/07 15/12 06/14 07/04 06/08 06/05 07/02
ASCII based machine: G\374nther
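The "Encoded representation" line uses column/row notation, where each byte is written as its high nibble and low nibble in decimal. As a rough sketch, assuming the string is encoded as Latin-1 (ISO-IR 100), which is consistent with the 15/12 byte for ü, the notation can be reproduced like this (the function name `to_column_row` is made up for illustration):

```python
def to_column_row(data: bytes) -> str:
    """Format each byte as 'column/row' (high nibble / low nibble, decimal)."""
    parts = []
    for b in data:
        column, row = divmod(b, 16)
        parts.append(f"{column:02d}/{row:02d}")
    return " ".join(parts)


# Assumption: Latin-1 encoding, where ü is the single byte 0xFC.
print(to_column_row("Günther".encode("latin-1")))
# 04/07 15/12 06/14 07/04 06/08 06/05 07/02
```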
Implementations may also encounter control characters that they have no means to print or display. An application may print or display such control characters by replacing each one with the four characters “\nnn”, where “nnn” is the three-digit octal representation of the byte.
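The octal replacement described in this answer can be sketched in a few lines. This is only an illustration under one assumption: every byte outside the printable ASCII range 0x20-0x7E is treated as not displayable, which covers both unknown 8-bit characters and control characters. The function name `escape_unprintable` is hypothetical.

```python
def escape_unprintable(data: bytes) -> str:
    """Replace each byte outside printable ASCII with '\\nnn' (octal)."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))         # printable ASCII passes through
        else:
            out.append(f"\\{b:03o}")   # backslash plus three octal digits
    return "".join(out)


# Assumption: the value was stored as Latin-1, so ü is the byte 0xFC.
print(escape_unprintable("Günther".encode("latin-1")))  # G\374nther
print(escape_unprintable(b"a\x1bb"))                    # a\033b
```

Note that a literal backslash in the input passes through unchanged here, so the output is meant for display only and cannot be reliably decoded back.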