I'd like to get a char array of all the printable characters in C#. Does anybody know how to do this?
edit:
By printable I mean the visible European characters, so yes, umlauts, tildes, accents etc.
Python's isprintable() method returns True if all characters in the string are printable or the string is empty; otherwise, it returns False. It checks that every character in the argument is printable, for example digits (0123456789) and uppercase letters (ABCDEFGHIJKLMNOPQRSTUVWXYZ).
There are 95 printable ASCII characters, numbered 32 to 126. ASCII (American Standard Code for Information Interchange), generally pronounced [ˈæski], is a character encoding based on the English alphabet.
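As a small illustration of the ASCII range mentioned above, here is a minimal C# sketch that builds the 95 printable ASCII characters. The class and method names (AsciiPrintable, GetChars) are made up for this example and do not come from the thread.

```csharp
using System;
using System.Linq;

public static class AsciiPrintable
{
    // Hypothetical helper: code points 32 (space) through 126 (~) inclusive,
    // i.e. 126 - 32 + 1 = 95 characters.
    public static char[] GetChars() =>
        Enumerable.Range(32, 95).Select(i => (char)i).ToArray();

    public static void Main()
    {
        char[] chars = GetChars();
        Console.WriteLine(chars.Length);        // 95
        Console.WriteLine(new string(chars));   // starts with a space, ends with ~
    }
}
```

This covers only plain ASCII, so it does not include the umlauts and accented letters the question asks about.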
Characters that occupy printing space are known as printable characters. They are the opposite of control characters, which can be checked in C using iscntrl().
As of Unicode version 14.0, there are 144,697 characters with code points, covering 159 modern and historical scripts, as well as multiple symbol sets.
This will give you a list with all characters that are not considered control characters:
List<char> printableChars = new List<char>();
for (int i = char.MinValue; i <= char.MaxValue; i++)
{
    char c = Convert.ToChar(i);
    if (!char.IsControl(c))
    {
        printableChars.Add(c);
    }
}
You may want to investigate the other Char.IsXxxx methods to find a combination that suits your requirements.
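One possible combination of the Char.IsXxxx methods, sketched under some assumptions: "European" is approximated here by the range U+0020 to U+024F (Basic Latin, Latin-1 Supplement, Latin Extended-A and B), and "visible" is taken to mean letters, numbers, punctuation, and symbols. Both choices, along with the names EuropeanPrintable, IsVisible, and GetChars, are illustrative and not from the answer above.

```csharp
using System;
using System.Linq;

public static class EuropeanPrintable
{
    // Assumption: U+0020..U+024F is a rough stand-in for "European" characters.
    public const int First = 0x0020;
    public const int Last = 0x024F;

    // Assumption: "visible" means letter, number, punctuation, or symbol.
    // This excludes control characters, spaces, and format characters.
    public static bool IsVisible(char c) =>
        char.IsLetter(c) || char.IsNumber(c) ||
        char.IsPunctuation(c) || char.IsSymbol(c);

    public static char[] GetChars() =>
        Enumerable.Range(First, Last - First + 1)
                  .Select(i => (char)i)
                  .Where(IsVisible)
                  .ToArray();

    public static void Main()
    {
        char[] chars = GetChars();
        Console.WriteLine(chars.Contains('ü'));       // True
        Console.WriteLine(chars.Contains('ñ'));       // True
        Console.WriteLine(chars.Contains('\u0004'));  // False (control character)
        Console.WriteLine(chars.Contains(' '));       // False (space separator)
    }
}
```

If spaces should count as printable, adding char.IsSeparator(c) or an explicit check for ' ' to the predicate would be one way to include them.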
Here's a LINQ version of Fredrik's solution. Note that Enumerable.Range yields an IEnumerable<int> so you have to convert to chars first. Cast<char> would have worked in 3.5SP0 I believe, but as of 3.5SP1 you have to do a "proper" conversion:
var chars = Enumerable.Range(0, char.MaxValue + 1)
                      .Select(i => (char) i)
                      .Where(c => !char.IsControl(c))
                      .ToArray();
I've created the result as an array as that's what the question asked for - it's not necessarily the best idea though. It depends on the use case.
Note that this also doesn't consider full Unicode characters, only those in the basic multilingual plane. I don't know what it returns for high/low surrogates, but it's worth at least knowing that a single char doesn't really let you represent everything :(
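On the surrogate question: surrogate code units belong to the Unicode category Cs, not Cc, so char.IsControl returns false for them and the filter above keeps them. The sketch below shows one way to drop them with char.IsSurrogate, and how a character outside the basic multilingual plane needs two chars. The class name SurrogateCheck and the method GetBmpChars are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SurrogateCheck
{
    // BMP chars that are neither control characters nor surrogate code units.
    public static char[] GetBmpChars() =>
        Enumerable.Range(char.MinValue, char.MaxValue + 1)
                  .Select(i => (char)i)
                  .Where(c => !char.IsControl(c) && !char.IsSurrogate(c))
                  .ToArray();

    public static void Main()
    {
        Console.WriteLine(char.IsControl('\uD800'));    // False
        Console.WriteLine(char.IsSurrogate('\uD800'));  // True

        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
        string emoji = char.ConvertFromUtf32(0x1F600);
        Console.WriteLine(emoji.Length);                // 2

        // The string overload looks at the whole pair, not a single char.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(emoji, 0));
    }
}
```

To enumerate supplementary characters as well, one option is to loop over code points up to 0x10FFFF, skip the surrogate range, and collect strings from char.ConvertFromUtf32 instead of chars.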
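A LINQ solution (based on Fredrik Mörk's):
// The second argument of Enumerable.Range is a count, so + 1 is needed to include char.MaxValue.
Enumerable.Range(char.MinValue, char.MaxValue + 1)
          .Select(c => (char)c)
          .Where(c => !char.IsControl(c))
          .ToArray();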
Use this Regex...
var regex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");
The leading ^ negates the whole character class (a second ^ inside the brackets would be treated as a literal caret), so the pattern matches any single character that is none of the following:
\p{Cc}: control characters.
\p{Cn}: unassigned characters.
\p{Cs}: surrogate code units, which are not valid on their own in UTF-8.
I test two strings in this demo: "Hello, World!" and "Hello, World!" + (char)4. (char)4 is the END OF TRANSMISSION character.
using System;
using System.Text.RegularExpressions;

public class Test {
    // Returns one match per printable character found in the input.
    public static MatchCollection getPrintableChars(string haystack) {
        var regex = new Regex(@"[^\p{Cc}\p{Cn}\p{Cs}]");
        return regex.Matches(haystack);
    }

    public static void Main() {
        var teststring1 = "Hello, World!";
        var teststring2 = "Hello, World!" + (char)4;
        var teststring1printablechars = getPrintableChars(teststring1);
        var teststring2printablechars = getPrintableChars(teststring2);
        Console.WriteLine("Testing a Printable String: " + teststring1printablechars.Count + " Printable Chars Detected");
        Console.WriteLine("Testing a String With 1-Unprintable Char: " + teststring2printablechars.Count + " Printable Chars Detected");
        foreach (Match printablechar in teststring1printablechars) {
            Console.WriteLine("String 1 Printable Char:" + printablechar);
        }
        foreach (Match printablechar in teststring2printablechars) {
            Console.WriteLine("String 2 Printable Char:" + printablechar);
        }
    }
}
Full Working Demo at IDEOne.com
\P{C}: Match only characters outside the Unicode "Other" category, that is, nothing that is a control, format, surrogate, private-use, or unassigned character.
\P{Cc}: Match only non-control characters. Do not match any control characters.
[^\p{Cc}\p{Cn}]: Match only non-control characters that have been assigned. Do not match any control or unassigned characters.
[^\p{Cc}\p{Cn}\p{Cs}]: Match only non-control characters that have been assigned and are not surrogates. Do not match any control, unassigned, or surrogate characters.
[^\p{Cc}\p{Cn}\p{Cs}\p{Cf}]: Match only non-control, non-formatting characters that have been assigned and are not surrogates. Do not match any control, unassigned, formatting, or surrogate characters.
Take a look at the Unicode Character Properties available that can be used to test within a regex. You should be able to use these classes, with minor syntax differences, in Microsoft .NET, JavaScript, Python, Java, PHP, Ruby, Perl, Golang, and even Adobe, though support varies by engine (for example, JavaScript requires the u flag, and Python's built-in re module does not support \p{...}, so the third-party regex module is needed). Knowing Unicode character classes is very transferable knowledge, so I recommend using it!
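The same categories can also be used to remove unwanted characters rather than match the wanted ones. Here is a minimal sketch using \p{C} with Regex.Replace. The class name PrintableFilter and the helper Strip are hypothetical and not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrintableFilter
{
    // \p{C} covers control, format, surrogate, private-use, and unassigned characters.
    private static readonly Regex NonPrintable = new Regex(@"\p{C}");

    // Hypothetical helper: returns the input with every \p{C} character removed.
    public static string Strip(string input) => NonPrintable.Replace(input, "");

    public static void Main()
    {
        string dirty = "Hello, World!" + (char)4;
        Console.WriteLine(Strip(dirty));          // Hello, World!
        Console.WriteLine(Strip(dirty).Length);   // 13
    }
}
```

Tabs and newlines are control characters too, so this also strips them. A narrower class would be needed if line breaks should survive.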