 

How do I get a list of all the printable characters in C#?

Tags:

c#

I'd like to be able to get a char array of all the printable characters in C#. Does anybody know how to do this?

Edit:

By printable I mean the visible European characters, so yes: umlauts, tildes, accents, etc.

asked May 20 '09 11:05 by Phil Bennett





4 Answers

This will give you a list with all characters that are not considered control characters:

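using System;
using System.Collections.Generic;

// Collect every UTF-16 code unit that is not a control character (category Cc).
List<char> printableChars = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    char c = Convert.ToChar(i);
    if (!char.IsControl(c))
    {
        printableChars.Add(c);
    }
}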

You may want to investigate the other Char.IsXxxx methods to find a combination that suits your requirements.
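As one possible combination, here is a minimal sketch that approximates the question's "visible European characters" by keeping letters, digits, punctuation, symbols and the ordinary space. The upper bound of U+017F (Basic Latin, Latin-1 Supplement and Latin Extended-A) and the class name EuropeanPrintable are assumptions for illustration only; characters such as the euro sign (U+20AC) or Greek and Cyrillic letters lie outside that range, so widen it to suit.

```csharp
using System.Collections.Generic;

public static class EuropeanPrintable
{
    // Assumption: "European" is approximated by Basic Latin, Latin-1 Supplement
    // and Latin Extended-A (U+0000 to U+017F). Raise this bound to include
    // Greek, Cyrillic, the euro sign, etc.
    private const int MaxCodePoint = 0x017F;

    public static char[] GetChars()
    {
        var result = new List<char>();
        for (int i = 0; i <= MaxCodePoint; i++)
        {
            char c = (char)i;
            // Keep letters (including accented ones such as 'ü', 'ñ', 'é'),
            // digits, punctuation, symbols and the plain space character.
            // Control and format characters (e.g. the soft hyphen) are dropped.
            if (char.IsLetterOrDigit(c) || char.IsPunctuation(c) ||
                char.IsSymbol(c) || c == ' ')
            {
                result.Add(c);
            }
        }
        return result.ToArray();
    }
}
```

Restricting by range first and then by category keeps the result small enough to inspect by eye, which is handy when deciding what "printable" should mean for your use case.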

answered Sep 18 '22 13:09 by Fredrik Mörk



Here's a LINQ version of Fredrik's solution. Note that Enumerable.Range yields an IEnumerable<int>, so you have to convert to chars first. Cast<char> would have worked in 3.5SP0, I believe, but as of 3.5SP1 you have to do a "proper" conversion:

using System.Linq;

var chars = Enumerable.Range(0, char.MaxValue+1)
                      .Select(i => (char) i)
                      .Where(c => !char.IsControl(c))
                      .ToArray();
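The Cast<char> point can be checked directly. On current .NET versions, Cast<char>() over an IEnumerable<int> unboxes each boxed int straight to char, which is not a legal unboxing conversion, so enumerating it throws InvalidCastException; Select with an explicit (char) conversion works. A small sketch (CastDemo and its methods are hypothetical names):

```csharp
using System;
using System.Linq;

public static class CastDemo
{
    // Returns true if Cast<char>() on a sequence of ints throws,
    // because each element is a boxed int and cannot be unboxed as char.
    public static bool CastThrows()
    {
        try
        {
            Enumerable.Range(65, 3).Cast<char>().ToArray();
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // An explicit numeric conversion inside Select works: 'A', 'B', 'C'.
    public static char[] SelectConverts() =>
        Enumerable.Range(65, 3).Select(i => (char)i).ToArray();
}
```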

I've created the result as an array because that's what the question asked for, but it's not necessarily the best idea; it depends on the use case.
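For example, if all you need is to check whether a string contains only printable characters, you don't need to build the array at all. A minimal sketch using the same "not a control character" criterion as above (PrintableCheck and its methods are hypothetical names):

```csharp
using System.Linq;

public static class PrintableCheck
{
    // Same criterion as the answers above: anything that is not a control character.
    public static bool IsPrintable(char c) => !char.IsControl(c);

    // True if every char in the string passes the predicate (also true for "").
    public static bool AllPrintable(string s) => s.All(IsPrintable);
}
```

Usage: `PrintableCheck.AllPrintable("Hello, World!")` is true, while appending `(char)4` makes it false.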

Note that this also doesn't consider full Unicode characters, only those in the Basic Multilingual Plane. I don't know what it returns for high/low surrogates, but it's worth at least knowing that a single char doesn't really let you represent everything :(
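To make the surrogate and BMP caveats concrete: char.IsControl only returns true for category Cc, so a lone surrogate (category Cs) passes the !char.IsControl filter. On .NET Core 3.0 or later, System.Text.Rune lets you enumerate whole Unicode scalar values instead of UTF-16 code units. The sketch below assumes that runtime; also excluding unassigned (Cn) and private-use (Co) code points is an assumption about what counts as printable, and FullUnicode is a hypothetical name. The category data comes from whichever Unicode version the runtime ships with.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class FullUnicode
{
    // Enumerates every Unicode scalar value (U+0000 to U+10FFFF, skipping the
    // surrogate range) and keeps those that are not control (Cc),
    // unassigned (Cn) or private-use (Co). Requires .NET Core 3.0+ for Rune.
    public static IEnumerable<Rune> PrintableRunes()
    {
        for (int cp = 0; cp <= 0x10FFFF; cp++)
        {
            if (!Rune.IsValid(cp)) continue; // skips U+D800 to U+DFFF
            var rune = new Rune(cp);
            var category = Rune.GetUnicodeCategory(rune);
            if (category != UnicodeCategory.Control &&
                category != UnicodeCategory.OtherNotAssigned &&
                category != UnicodeCategory.PrivateUse)
            {
                yield return rune;
            }
        }
    }

    // A lone high surrogate is category Cs, not Cc, so the
    // !char.IsControl filter from the answers above lets it through.
    public static bool LoneSurrogatePassesIsControlFilter() => !char.IsControl('\uD83D');
}
```

Because the result is lazy, you can call ToArray() on it if you really want an array, or Count() to see how many scalar values your runtime considers printable under this definition.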

answered Sep 18 '22 13:09 by Jon Skeet



A LINQ solution (based on Fredrik Mörk's):

// Range takes (start, count): the count must be char.MaxValue + 1 to include U+FFFF.
Enumerable.Range(char.MinValue, char.MaxValue + 1).Select(c => (char)c).Where(
    c => !char.IsControl(c)).ToArray();
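The second argument of Enumerable.Range is a count, not an upper bound, so passing char.MaxValue as the count yields 0 to 0xFFFE and never reaches U+FFFF. A quick check (RangeDemo is a hypothetical name):

```csharp
using System.Linq;

public static class RangeDemo
{
    // 65,535 values starting at 0: the last one is 0xFFFE.
    public static int LastWithMaxValueAsCount() =>
        Enumerable.Range(char.MinValue, char.MaxValue).Last();

    // 65,536 values starting at 0: covers every UTF-16 code unit up to 0xFFFF.
    public static int LastWithCountPlusOne() =>
        Enumerable.Range(char.MinValue, char.MaxValue + 1).Last();
}
```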
answered Sep 20 '22 13:09 by Noldorin



TLDR Answer

Use this regex:

var regex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");

TLDR Explanation

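  • [^...] : Match any single character that is not in one of the listed categories.
  • \p{Cc} : Do not match control characters.
  • \p{Cn} : Do not match unassigned characters.
  • \p{Cs} : Do not match surrogate code units (U+D800 to U+DFFF), which cannot be encoded in UTF-8 on their own. Because .NET strings are UTF-16, this also drops characters outside the Basic Multilingual Plane, such as emoji, which are stored as surrogate pairs.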
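Inside a character class, only a ^ immediately after the opening [ negates it; any later ^ is a literal caret. A pattern written as [^\p{Cc}^\p{Cn}^\p{Cs}] therefore also refuses to match the printable character ^ itself. A small check (CaretDemo is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class CaretDemo
{
    // The second and third ^ are literal carets, so '^' is excluded too.
    private static readonly Regex WithExtraCarets = new Regex(@"[^\p{Cc}^\p{Cn}^\p{Cs}]");

    // Same categories without the stray carets.
    private static readonly Regex WithoutExtraCarets = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");

    public static bool ExtraCaretsMatchCaret() => WithExtraCarets.IsMatch("^");
    public static bool CleanPatternMatchesCaret() => WithoutExtraCarets.IsMatch("^");
}
```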

Working Demo

I test two strings in this demo: "Hello, World!" and "Hello, World!" + (char)4, where (char)4 is U+0004 END OF TRANSMISSION.

using System;
using System.Text.RegularExpressions;

public class Test {
    // Matches any single char that is not a control (Cc), unassigned (Cn) or surrogate (Cs) code unit.
    private static readonly Regex PrintableCharRegex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");

    public static MatchCollection GetPrintableChars(string haystack) {
        return PrintableCharRegex.Matches(haystack);
    }

    public static void Main() {
        var testString1 = "Hello, World!";
        var testString2 = "Hello, World!" + (char)4; // (char)4 is END OF TRANSMISSION

        var printableChars1 = GetPrintableChars(testString1);
        var printableChars2 = GetPrintableChars(testString2);

        // Both counts are 13: the control character in testString2 is not matched.
        Console.WriteLine("Testing a Printable String: " + printableChars1.Count + " Printable Chars Detected");
        Console.WriteLine("Testing a String With 1 Unprintable Char: " + printableChars2.Count + " Printable Chars Detected");

        foreach (Match printableChar in printableChars1) {
            Console.WriteLine("String 1 Printable Char: " + printableChar);
        }

        foreach (Match printableChar in printableChars2) {
            Console.WriteLine("String 2 Printable Char: " + printableChar);
        }
    }
}

Full Working Demo at IDEOne.com
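If you'd rather avoid a regex, the same test can be written with char.GetUnicodeCategory. A minimal sketch that mirrors [^\p{Cc}\p{Cn}\p{Cs}] one UTF-16 code unit at a time (PrintableFilter is a hypothetical name, and this assumes the regex engine and char.GetUnicodeCategory use the same category data on a given runtime):

```csharp
using System.Globalization;
using System.Linq;

public static class PrintableFilter
{
    // Non-regex equivalent of [^\p{Cc}\p{Cn}\p{Cs}]: reject control,
    // unassigned and surrogate code units, keep everything else.
    public static bool IsPrintable(char c)
    {
        var category = char.GetUnicodeCategory(c);
        return category != UnicodeCategory.Control
            && category != UnicodeCategory.OtherNotAssigned
            && category != UnicodeCategory.Surrogate;
    }

    // Returns the input with every non-printable code unit removed.
    public static string KeepPrintable(string s) =>
        new string(s.Where(IsPrintable).ToArray());
}
```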

Alternatives

  • \P{C} : Match any character outside the "Other" category, i.e. not a control, format, surrogate, private-use, or unassigned character. Spaces still match, so this is not strictly "visible only".
  • \P{Cc} : Match only non-control characters. Do not match any control characters.
  • [^\p{Cc}\p{Cn}] : Match only non-control characters that have been assigned. Do not match any control or unassigned characters.
  • [^\p{Cc}\p{Cn}\p{Cs}] : Match only assigned, non-control, non-surrogate characters. Do not match any control, unassigned, or surrogate characters.
  • [^\p{Cc}\p{Cn}\p{Cs}\p{Cf}] : Additionally exclude format characters (for example U+200B ZERO WIDTH SPACE or U+00AD SOFT HYPHEN).
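Note that writing two property escapes in a row, as in \P{Cc}\P{Cn}, matches two consecutive characters (the first non-control, the second assigned), not one character that satisfies both conditions; a negated class such as [^\p{Cc}\p{Cn}] tests a single character against every listed category. Also, \P{C} still matches ordinary spaces, which are category Zs. A quick demonstration (SequenceVsClass is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class SequenceVsClass
{
    // Two escapes in a row consume two characters per match.
    public static int SequenceMatches(string s) => Regex.Matches(s, @"\P{Cc}\P{Cn}").Count;

    // A negated class consumes one character per match.
    public static int ClassMatches(string s) => Regex.Matches(s, @"[^\p{Cc}\p{Cn}]").Count;

    // A space (Zs) is not in category C, so \P{C} matches it.
    public static bool NotOtherMatchesSpace() => Regex.IsMatch(" ", @"\P{C}");
}
```

For "abcd", the sequence form finds 2 matches ("ab" and "cd") while the class form finds 4.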

Source and Explanation

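Take a look at the Unicode Character Properties that can be tested within a regex. You should be able to use these in Microsoft .NET, JavaScript, Python, Java, PHP, Ruby, Perl, Golang, and even Adobe, but support varies by engine: JavaScript requires the u flag, Python's built-in re module does not support \p{...} (the third-party regex module does), and not every engine supports every category. Knowing Unicode character classes is very transferable knowledge, so I recommend learning them!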

answered Sep 16 '22 13:09 by HoldOffHunger
