How to print special characters explicitly in C?

Tags: c, printf

When I use the code below:

#include <stdio.h>

int main(void)
{
    printf("%s","Hello world\nHello world");
    return 0;
}

it prints as:

Hello world
Hello world

How can I prevent this and print the string as a raw string literal in C? I mean it should be displayed in the terminal window exactly as written, like this:

Hello world\nHello world

I know I can achieve this by escaping the backslash in the printf argument, but is there any other C function or way to do this without adding backslashes? It would be helpful when reading files.

Jessie asked Apr 06 '15 18:04


2 Answers

Thanks to user @chunk for helping to improve this answer.


Why not write a general-purpose solution? It would save you from many problems in the future.

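#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of str in which control characters are
   replaced by their C escape sequences. The caller must free() the result.
   Returns NULL if memory allocation fails. */
char *
str_escape(const char str[])
{
    char chr[3];
    char *buffer = malloc(sizeof(char));
    char *tmp;
    unsigned int len = 1, blk_size;

    if (buffer == NULL)
        return NULL;
    buffer[0] = '\0'; /* strcat needs a valid empty string to append to */

    while (*str != '\0') {
        blk_size = 2;
        switch (*str) {
            case '\n':
                strcpy(chr, "\\n");
                break;
            case '\t':
                strcpy(chr, "\\t");
                break;
            case '\v':
                strcpy(chr, "\\v");
                break;
            case '\f':
                strcpy(chr, "\\f");
                break;
            case '\a':
                strcpy(chr, "\\a");
                break;
            case '\b':
                strcpy(chr, "\\b");
                break;
            case '\r':
                strcpy(chr, "\\r");
                break;
            default:
                /* any other character is copied unchanged */
                sprintf(chr, "%c", *str);
                blk_size = 1;
                break;
        }
        len += blk_size;
        /* keep the old pointer so it can be freed if realloc fails */
        tmp = realloc(buffer, len * sizeof(char));
        if (tmp == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = tmp;
        strcat(buffer, chr);
        ++str;
    }
    return buffer;
}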

How it works:

/* For brevity, the strings returned by str_escape are never freed here. */
int
main(void)
{
    puts(str_escape("\tAnbms\n"));
    puts(str_escape("\tA\v\fZ\a"));
    puts(str_escape("txt \t\n\r\f\a\v 1 \t\n\r\f\a\v tt"));
    puts(str_escape("dhsjdsdjhs hjd hjds "));
    puts(str_escape(""));
    puts(str_escape("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\f\a\v"));
    puts(str_escape("\x0b\x0c\t\n\r\f\a\v"));
    puts(str_escape("\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14"));
}

Output

\tAnbms\n
\tA\v\fZ\a
txt \t\n\r\f\a\v 1 \t\n\r\f\a\v tt
dhsjdsdjhs hjd hjds 

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ \t\n\r\f\a\v
\v\f\t\n\r\f\a\v
\a\b\t\n\v\f\r

This solution is based on information from Wikipedia (https://en.wikipedia.org/wiki/Escape_sequences_in_C#Table_of_escape_sequences) and on answers from other Stack Overflow users.
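The str_escape function above calls realloc and strcat once per character and leaves the backslash itself unescaped. As a rough sketch only, a two-pass variant could measure the result first and allocate once. The names str_escape_once and escape_letter are hypothetical, and escaping the backslash is an assumption about what the caller wants, not part of the answer above.

```c
#include <stdlib.h>

/* Returns the letter that follows the backslash in the escape sequence
   for c, or 0 if c is copied unchanged. Hypothetical helper. */
static char escape_letter(char c)
{
    switch (c) {
        case '\n': return 'n';
        case '\t': return 't';
        case '\v': return 'v';
        case '\f': return 'f';
        case '\a': return 'a';
        case '\b': return 'b';
        case '\r': return 'r';
        case '\\': return '\\'; /* assumption: the backslash is escaped too */
        default:   return 0;
    }
}

/* Returns a newly allocated escaped copy of str, or NULL if malloc fails.
   The caller must free() the result. */
char *str_escape_once(const char *str)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    /* First pass: count how many bytes the escaped string needs. */
    for (p = str; *p != '\0'; ++p)
        len += escape_letter(*p) ? 2 : 1;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, expanding each escaped character to two bytes. */
    q = out;
    for (p = str; *p != '\0'; ++p) {
        char e = escape_letter(*p);
        if (e) {
            *q++ = '\\';
            *q++ = e;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

It is called the same way as str_escape, for example by passing the result to puts and then to free.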


Testing environment

$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.6 (jessie)
Release:    8.6
Codename:   jessie
$ uname -a
Linux localhost 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
PADYMKO answered Oct 30 '22 21:10


There is no built-in mechanism to do this. You have to do it manually, character by character. However, the functions in ctype.h may help. Specifically, in the "C" locale, isprint is guaranteed to return true for all of the graphic characters in the basic execution character set (effectively all the graphic characters in 7-bit ASCII) plus space, and it is guaranteed to return false for all the control characters in 7-bit ASCII, which include tab, carriage return, etc.

Here is a sketch:

#include <stdio.h>
#include <ctype.h>
#include <locale.h>

int main(void)
{
    int x;
    setlocale(LC_ALL, "C"); // (1)

    while ((x = getchar()) != EOF)
    {
        unsigned int c = (unsigned int)(unsigned char)x; // (2)

        if (isprint(c) && c != '\\')
            putchar(c);
        else
            printf("\\x%02x", c);
    }
    return 0;
}

This escapes neither ' nor ", but it does escape \, and it is straightforward to extend if you need that.

Printing \n for U+000A, \r for U+000D, etc. is left as an exercise. Dealing with characters outside the basic execution character set (e.g. UTF-8 encoding of U+0080 through U+10FFFF) is also left as an exercise.
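One possible way to approach the first exercise, as a sketch only: check for the characters that have a named escape first, then fall back to isprint and the hex form. The function name escape_byte and the choice to write into a caller-supplied 5-byte buffer are assumptions made for illustration. The sketch also assumes the program runs in the "C" locale.

```c
#include <ctype.h>
#include <stdio.h>

/* Writes the escaped form of byte c into out as a NUL-terminated string
   and returns its length (1, 2 or 4). out must hold at least 5 bytes.
   escape_byte is a hypothetical name. */
int escape_byte(unsigned char c, char out[5])
{
    switch (c) {
        case '\n': return sprintf(out, "\\n");
        case '\t': return sprintf(out, "\\t");
        case '\v': return sprintf(out, "\\v");
        case '\f': return sprintf(out, "\\f");
        case '\a': return sprintf(out, "\\a");
        case '\b': return sprintf(out, "\\b");
        case '\r': return sprintf(out, "\\r");
        case '\\': return sprintf(out, "\\\\");
        default:
            break;
    }
    if (isprint(c))
        return sprintf(out, "%c", c);                /* printable: copy as is */
    return sprintf(out, "\\x%02x", (unsigned int)c); /* anything else: hex */
}
```

Inside the getchar loop, the putchar and printf calls could then be replaced by a call to escape_byte followed by fputs(out, stdout).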

This program contains two things which are not necessary with a fully standards-compliant C library, but in my experience have been necessary on real operating systems. They are marked with (1) and (2).

1) This explicitly sets the 'locale' configuration the way it is supposed to be set by default.

2) The value returned from getchar is an int. It is supposed to be either a number in the range representable by unsigned char (normally 0-255 inclusive), or the special value EOF (which is not in the range representable by unsigned char). However, buggy C libraries have been known to return negative numbers for characters with their highest bit set. If that happens, the printf will print (for instance) \xffffffa1 when it should've printed \xa1. Casting x to unsigned char and then back to unsigned int corrects this.
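The effect described in (2) can be reproduced without a buggy library by formatting a negative int directly. This is a small sketch; the function names hex_escape_raw and hex_escape_fixed are hypothetical, and the \xffffffa1 result assumes a 32-bit int and an 8-bit char.

```c
#include <stdio.h>

/* Formats x as a \x escape without normalizing it first. */
int hex_escape_raw(char *out, size_t n, int x)
{
    return snprintf(out, n, "\\x%02x", (unsigned int)x);
}

/* Same, but first forces x into the unsigned char range, as in (2). */
int hex_escape_fixed(char *out, size_t n, int x)
{
    return snprintf(out, n, "\\x%02x", (unsigned int)(unsigned char)x);
}
```

With x = -95, which is what such a library might return for the byte 0xa1, hex_escape_raw produces \xffffffa1 under those assumptions, while hex_escape_fixed produces \xa1.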

zwol answered Oct 30 '22 22:10