
ASCII non-readable characters 28, 29, 31

I am processing a file which I need to split based on the separator.

The following code shows the separators defined for the files I am processing:

private static final String    component   = Character.toString((char) 31);
private static final String    data        = Character.toString((char) 29);
private static final String    segment     = Character.toString((char) 28);

Can someone please explain the significance of these specific separators?

Looking at the ASCII codes, these separators are file, group and unit separators. I don't really understand what this means.

ziggy asked Feb 26 '11 17:02





2 Answers

Found this here. Cool website!

28 – FS – File separator
The file separator FS is an interesting control code, as it gives us insight into the way computer technology was organized in the sixties. We are now used to random access media like RAM and magnetic disks, but when the ASCII standard was defined, most data was serial. I am not only talking about serial communications, but also about serial storage like punch cards, paper tape and magnetic tapes. In such a situation it is clearly efficient to have a single control code to signal the separation of two files. The FS was defined for this purpose.

29 – GS – Group separator
Data storage was one of the main reasons for some control codes to get into the ASCII definition. Databases are usually set up with tables containing records. All records in one table have the same type, but records of different tables can be different. The group separator GS is defined to separate tables in a serial data storage system. Note that the word "table" wasn't used at that time, and the ASCII people called it a group.

30 – RS – Record separator
Within a group (or table), the records are separated with RS, the record separator.

31 – US – Unit separator
The smallest data items to be stored in a database are called units in the ASCII definition. We would call them fields now. The unit separator separates these fields in a serial data storage environment. Most current database implementations require that fields of most types have a fixed length. Enough space in the record is allocated to store the largest possible member of each field, even if this is not necessary in most cases. This costs a large amount of space in many situations. The US control code allows all fields to have a variable length. If data storage space is limited, as it was in the sixties, this is a good way to preserve valuable space. On the other hand, serial storage is far less efficient than the table-driven RAM and disk implementations of modern times. I can't imagine a situation where modern SQL databases are run with the data stored on paper tape or magnetic reels...
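To make the group/record/unit hierarchy described above concrete, here is a minimal Java sketch that splits one group into records (on RS) and each record into units (on US). The class name `SeparatorSplitDemo`, the method `parseGroup`, and the sample data are made up for illustration; they are not taken from the asker's code or from any particular file format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class SeparatorSplitDemo {
    static final char FS = 0x1C; // 28, file separator
    static final char GS = 0x1D; // 29, group separator
    static final char RS = 0x1E; // 30, record separator
    static final char US = 0x1F; // 31, unit separator

    // Splits one group into records, then each record into its units.
    // The -1 limit keeps trailing empty units instead of dropping them.
    static List<List<String>> parseGroup(String group) {
        List<List<String>> records = new ArrayList<>();
        for (String record : group.split(String.valueOf(RS), -1)) {
            records.add(Arrays.asList(record.split(String.valueOf(US), -1)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records; the second one has an empty middle unit.
        String group = "alice" + US + "30" + RS + "bob" + US + "" + US + "x";
        for (List<String> record : parseGroup(group)) {
            System.out.println(record);
        }
    }
}
```

None of the four separators is a regex metacharacter, so they can be passed to `String.split` directly. The same pattern extends one level up: split on GS first to get groups, then call `parseGroup` on each.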

geaw35 answered Sep 18 '22 14:09



The ASCII separator control characters range from 28 to 31 (0x1C to 0x1F):

31 Unit Separator
30 Record Separator
29 Group Separator
28 File Separator

Sample invocation:

char record_separator = 0x1E; // 30 = record separator (0x1F is the unit separator)
String s = "hello" + record_separator + "world";
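Since these characters are invisible when printed, a round trip plus a debugging helper may be useful. The following is only a sketch: the class `SeparatorRoundTrip`, the methods `joinUnits` and `visible`, and the `<FS>`-style placeholders are assumptions made for this example, not part of any standard API.

```java
// Hypothetical helper, for illustration only.
final class SeparatorRoundTrip {
    static final String FS = Character.toString((char) 28);
    static final String GS = Character.toString((char) 29);
    static final String RS = Character.toString((char) 30);
    static final String US = Character.toString((char) 31);

    // Joins fields with the unit separator between them.
    static String joinUnits(String... units) {
        return String.join(US, units);
    }

    // Replaces the four separators with readable tags for logging.
    static String visible(String s) {
        return s.replace(FS, "<FS>")
                .replace(GS, "<GS>")
                .replace(RS, "<RS>")
                .replace(US, "<US>");
    }

    public static void main(String[] args) {
        String joined = joinUnits("hello", "world");
        System.out.println(visible(joined)); // prints hello<US>world
        // Splitting on the same separator recovers the original fields.
        String[] back = joined.split(US, -1);
        System.out.println(back.length);     // prints 2
    }
}
```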
Balaji Boggaram Ramanarayan answered Sep 20 '22 14:09
