
Case insensitive string search in golang

How do I search through a file for a word in a case insensitive manner?

For example, if I'm searching for UpdaTe and the file contains update, the search should find it and count it as a match.

user3841581 asked Jul 19 '14 02:07



4 Answers

strings.EqualFold() checks whether two strings are equal while ignoring case, and it works with Unicode. It compares whole strings, so for a file search you would apply it to each word in turn. See http://golang.org/pkg/strings/#EqualFold for more info.

http://play.golang.org/p/KDdIi8c3Ar

package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Println(strings.EqualFold("HELLO", "hello"))
    fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))
}

Both return true.

425nesp answered Oct 20 '22 11:10


Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.

Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:

func CaseInsensitiveContains(s, substr string) bool {
    s, substr = strings.ToUpper(s), strings.ToUpper(substr)
    return strings.Contains(s, substr)
}

You can see it in action here.

joshlf answered Oct 20 '22 11:10


Do not use strings.Contains unless you need exact matching rather than language-correct string searches

None of the current answers are correct unless you are only searching ASCII characters, which covers a minority of languages (like English) written without diaereses, umlauts, or other Unicode glyph modifiers (the more "correct" way to define it, as mentioned by @snap). The standard Google phrase is "searching non-ASCII characters".

For proper support for language searching you need to use http://golang.org/x/text/search.

// SearchForString returns the start and end byte offsets of the first
// match of substr in str, or -1, -1 if there is no match.
func SearchForString(str string, substr string) (int, int) {
    m := search.New(language.English, search.IgnoreCase)
    return m.IndexString(str, substr)
}

start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
    fmt.Println("found at", start, end)
}

Or if you just want the starting index:

func SearchForStringIndex(str string, substr string) (int, bool) {
    m := search.New(language.English, search.IgnoreCase)
    start, _ := m.IndexString(str, substr)
    if start == -1 {
        return 0, false
    }
    return start, true
}

index, found := SearchForStringIndex("foobar", "bar")
if found {
    fmt.Println("match starts at", index)
}

Search the language.Tag structs here to find the language you wish to search with or use language.Und if you are not sure.

Update

There seems to be some confusion, so the following example should help clarify things.

package main

import (
    "fmt"
    "strings"

    "golang.org/x/text/language"
    "golang.org/x/text/search"
)

var s = `Æ`
var s2 = `Ä`

func main() {
    m := search.New(language.Finnish, search.IgnoreDiacritics)
    fmt.Println(m.IndexString(s, s2))
    fmt.Println(CaseInsensitiveContains(s, s2))
}

// CaseInsensitiveContains in string
func CaseInsensitiveContains(s, substr string) bool {
    s, substr = strings.ToUpper(s), strings.ToUpper(substr)
    return strings.Contains(s, substr)
}
Xeoncross answered Oct 20 '22 10:10


If your file is large, you can use regexp and bufio:

// The pattern `(?i)update` matches any string containing "update", ignoring case.
reg := regexp.MustCompile("(?i)update")
f, err := os.Open("test.txt")
if err != nil {
    log.Fatal(err)
}
defer f.Close()

// Do the match operation.
// MatchReader scans the file rune by rune until it finds a match.
// Wrapping the file in bufio avoids loading the entire file into memory.
println(reg.MatchReader(bufio.NewReader(f)))

About bufio

The bufio package implements a buffered reader that may be useful both for its efficiency with many small reads and because of the additional reading methods it provides.

chendesheng answered Oct 20 '22 10:10