
Is there 'strings' command for utf-8? [closed]

Tags: string, bash, utf-8

The Linux command strings looks for ASCII strings in a binary file. Is there a command-line tool that shows UTF-8 strings in a binary file?

Naoyoshi Aikawa asked Jul 04 '13 12:07



1 Answer

The strings command supports an --encoding (-e) option; see the man page for the accepted values.

However, I could not extract UTF-8 strings with any of the available option values. I am currently searching the project's mailing list and will update this answer if I find more help.
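If strings itself does not do the job, one possible workaround is a small script. The following is only a minimal sketch in Python, not a replacement for strings: it scans the raw bytes for runs of printable ASCII and well-formed multi-byte UTF-8 sequences. The function name utf8_strings and the min_chars parameter are made up for this illustration, and the default of 4 simply mirrors the usual minimum length used by strings.

```python
import re

# One "character": tab or printable ASCII, or a well-formed multi-byte
# UTF-8 sequence. Overlong forms, UTF-16 surrogates, code points above
# U+10FFFF and the C1 control range U+0080..U+009F are excluded.
UTF8_CHAR = (
    rb"[\x09\x20-\x7E]"
    rb"|\xC2[\xA0-\xBF]"
    rb"|[\xC3-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def utf8_strings(data: bytes, min_chars: int = 4):
    """Yield every run of at least min_chars UTF-8 characters in data.

    Hypothetical helper for illustration; it is not part of any library.
    """
    pattern = re.compile(b"(?:%b){%d,}" % (UTF8_CHAR, min_chars))
    for match in pattern.finditer(data):
        # Each match is well-formed UTF-8 by construction of the pattern.
        yield match.group().decode("utf-8")
```

To use it on a file, read the file in binary mode and print each result, for example `for s in utf8_strings(open(path, "rb").read()): print(s)`. Random binary data can still happen to contain valid UTF-8 sequences, so expect some false positives; raising min_chars reduces them.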

hek2mgl answered Oct 10 '22 12:10