
Checking a string contains only ASCII characters

Tags:

ascii

go

Does Go have a built-in method, or is there a recommended way, to check whether a string contains only ASCII characters? What is the right way to do it?

From my research, one solution is to check whether any character is greater than 127.

func isASCII(s string) bool {
    for _, c := range s {
        if c > unicode.MaxASCII {
            return false
        }
    }

    return true
}
Maxian Nicu asked Oct 30 '18 16:10



3 Answers

In Go, we care about performance, so we should benchmark your code:

func isASCII(s string) bool {
    for _, c := range s {
        if c > unicode.MaxASCII {
            return false
        }
    }
    return true
}

BenchmarkRange-4    20000000    82.0 ns/op

A faster (better, more idiomatic) version, which avoids unnecessary rune conversions:

func isASCII(s string) bool {
    for i := 0; i < len(s); i++ {
        if s[i] > unicode.MaxASCII {
            return false
        }
    }
    return true
}

BenchmarkIndex-4    30000000    55.4 ns/op
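
Byte indexing is safe here because, in UTF-8, every byte of a multi-byte sequence has its high bit set, so any non-ASCII character produces at least one byte of 0x80 or above. The following is a minimal, self-contained sketch of that idea. It compares against utf8.RuneSelf (0x80) from the standard library instead of unicode.MaxASCII, which is an equivalent test, and the sample strings are illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether every byte of s is below utf8.RuneSelf (0x80).
// No rune decoding is needed: a non-ASCII character always contains
// at least one byte with the high bit set.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("hello, world")) // true
	fmt.Println(isASCII("h\u00e9llo"))   // false: U+00E9 encodes as two bytes >= 0x80
	fmt.Println(isASCII(""))             // true: the empty string has no non-ASCII bytes
}
```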

ascii_test.go:

package main

import (
    "testing"
    "unicode"
)

func isASCIIRange(s string) bool {
    for _, c := range s {
        if c > unicode.MaxASCII {
            return false
        }
    }
    return true
}

func BenchmarkRange(b *testing.B) {
    str := ascii()
    b.ResetTimer()
    for N := 0; N < b.N; N++ {
        is := isASCIIRange(str)
        if !is {
            b.Fatal("notASCII")
        }
    }
}

func isASCIIIndex(s string) bool {
    for i := 0; i < len(s); i++ {
        if s[i] > unicode.MaxASCII {
            return false
        }
    }
    return true
}

func BenchmarkIndex(b *testing.B) {
    str := ascii()
    b.ResetTimer()
    for N := 0; N < b.N; N++ {
        is := isASCIIIndex(str)
        if !is {
            b.Fatal("notASCII")
        }
    }
}

func ascii() string {
    byt := make([]byte, unicode.MaxASCII+1)
    for i := range byt {
        byt[i] = byte(i)
    }
    return string(byt)
}

Output:

$ go test ascii_test.go -bench=.
BenchmarkRange-4    20000000    82.0 ns/op
BenchmarkIndex-4    30000000    55.4 ns/op
$
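
The benchmark only exercises the all-ASCII path. As a rough sanity check that the two variants agree on other inputs too, including invalid UTF-8, something like the following sketch could be used. The input strings are arbitrary examples chosen for illustration, not part of the original benchmark.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCIIRange decodes runes; invalid UTF-8 bytes decode to U+FFFD,
// which is greater than unicode.MaxASCII.
func isASCIIRange(s string) bool {
	for _, c := range s {
		if c > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// isASCIIIndex inspects raw bytes without decoding.
func isASCIIIndex(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative inputs: empty, plain ASCII, accented text,
	// invalid UTF-8 bytes, and ASCII control characters.
	inputs := []string{"", "plain ASCII", "caf\u00e9", "\xff\xfe", "tab\tand\x7f"}
	for _, in := range inputs {
		r, i := isASCIIRange(in), isASCIIIndex(in)
		fmt.Printf("%q range=%v index=%v agree=%v\n", in, r, i, r == i)
	}
}
```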
peterSO answered Oct 10 '22 02:10


Another option:

package main
import "golang.org/x/exp/utf8string"

func main() {
   {
      b := utf8string.NewString("south north").IsASCII()
      println(b) // true
   }
   {
      b := utf8string.NewString("🧡💛💚💙💜").IsASCII()
      println(b) // false
   }
}

https://pkg.go.dev/golang.org/x/exp/utf8string#String.IsASCII

Zombo answered Oct 10 '22 00:10


It looks like your way is best.

ASCII is simply defined as:

ASCII encodes 128 specified characters into seven-bit integers

As such, characters have values from 0 to 2^7 - 1 (that is, 0-127, or 0x00-0x7F).

The Go standard library has no dedicated function for checking that every rune in a string (or byte in a slice) falls within a specific numeric range, so your code seems to be the best way to do it.

maerics answered Oct 10 '22 01:10