Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrent access to maps with 'range' in Go

The "Go maps in action" entry in the Go blog states:

Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism. One common way to protect maps is with sync.RWMutex.

However, one common way to access maps is to iterate over them with the range keyword. It is not clear if for the purposes of concurrent access, execution inside a range loop is a "read", or just the "turnover" phase of that loop. For example, the following code may or may not run afoul of the "no concurrent r/w on maps" rule, depending on the specific semantics / implementation of the range operation:

 var testMap map[int]int
 testMapLock := make(chan bool, 1)
 testMapLock <- true
 testMapSequence := 0

...

 func WriteTestMap(k, v int) {
    <-testMapLock
    testMap[k] = v
    testMapSequence++
    testMapLock<-true
 }

 func IterateMapKeys(iteratorChannel chan int) error {
    <-testMapLock
    defer func() { 
       testMapLock <- true
    }
    mySeq := testMapSequence
    for k, _ := range testMap {
       testMapLock <- true
       iteratorChannel <- k
       <-testMapLock
       if mySeq != testMapSequence {
           close(iteratorChannel)
           return errors.New("concurrent modification")
       }
    }
    return nil
 }

The idea here is that the range "iterator" is open when the second function is waiting for a consumer to take the next value, and the writer is not blocked at that time. However, it is never the case that two reads in a single iterator are on either side of a write - this is a "fail fast" iterator, the borrow a Java term.

Is there anything anywhere in the language specification or other documents that indicates if this is a legitimate thing to do, however? I could see it going either way, and the above quoted document is not clear on exactly what consititutes a "read". The documentation seems totally quiet on the concurrency aspects of the for/range statement.

(Please note this question is about the currency of for/range, but not a duplicate of: Golang concurrent map access with range - the use case is completely different and I am asking about the precise locking requirement wrt the 'range' keyword here!)

like image 249
BadZen Avatar asked Nov 05 '16 20:11

BadZen


People also ask

Are Golang maps concurrent safe?

It's important to note that maps in go are not safe for concurrent use. Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously.

How does Go handle concurrency?

In Go, concurrency works through the use of in-built functions known as Goroutines. Goroutines are functions, unique to Go, that run at the same time alongside other code or programs.

How do I copy a map in Golang?

Maps in Go are reference types, so to deep copy the contents of a map, you cannot assign one instance to another. You can do this by creating a new, empty map and then iterating over the old map in a for range loop to assign the appropriate key-value pairs to the new map.


2 Answers

You are using a for statement with a range expression. Quoting from Spec: For statements:

The range expression is evaluated once before beginning the loop, with one exception: if the range expression is an array or a pointer to an array and at most one iteration variable is present, only the range expression's length is evaluated; if that length is constant, by definition the range expression itself will not be evaluated.

We're ranging over a map, so it's not an exception: the range expression is evaluated only once before beginning the loop. The range expression is simply a map variable testMap:

for k, _ := range testMap {}

The map value does not include the key-value pairs, it only points to a data structure that does. Why is this important? Because the map value is only evaluated once, and if later pairs are added to the map, the map value –evaluated once before the loop– will be a map that still points to a data structure that includes those new pairs. This is in contrast to ranging over a slice (which would be evaluated once too), which is also only a header pointing to a backing array holding the elements; but if elements are added to the slice during the iteration, even if that does not result in allocating and copying over to a new backing array, they will not be included in the iteration (because the slice header also contains the length - already evaluated). Appending elements to a slice may result in a new slice value, but adding pairs to a map will not result in a new map value.

Now on to iteration:

for k, v := range testMap {
    t1 := time.Now()
    someFunction()
    t2 := time.Now()
}

Before we enter into the block, before the t1 := time.Now() line k and v variables are holding the values of the iteration, they are already read out from the map (else they couldn't hold the values). Question: do you think the map is read by the for ... range statement between t1 and t2? Under what circumstances could that happen? We have here a single goroutine that is executing someFunc(). To be able to access the map by the for statement, that would either require another goroutine, or it would require to suspend someFunc(). Obviously neither of those happen. (The for ... range construct is not a multi-goroutine monster.) No matter how many iterations there are, while someFunc() is executed, the map is not accessed by the for statement.

So to answer one of your questions: the map is not accessed inside the for block when executing an iteration, but it is accessed when the k and v values are set (assigned) for the next iteration. This implies that the following iteration over the map is safe for concurrent access:

var (
    testMap         = make(map[int]int)
    testMapLock     = &sync.RWMutex{}
)

func IterateMapKeys(iteratorChannel chan int) error {
    testMapLock.RLock()
    defer testMapLock.RUnlock()
    for k, v := range testMap {
        testMapLock.RUnlock()
        someFunc()
        testMapLock.RLock()
        if someCond {
            return someErr
        }
    }
    return nil
}

Note that unlocking in IterateMapKeys() should (must) happen as a deferred statement, as in your original code you may return "early" with an error, in which case you didn't unlock, which means the map remained locked! (Here modeled by if someCond {...}).

Also note that this type of locking only ensures locking in case of concurrent access. It does not prevent a concurrent goroutine to modify (e.g. add a new pair) the map. The modification (if properly guarded with write lock) will be safe, and the loop may continue, but there is no guarantee that the for loop will iterate over the new pair:

If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next.

The write-lock-guarded modification may look like this:

func WriteTestMap(k, v int) {
    testMapLock.Lock()
    defer testMapLock.Unlock()
    testMap[k] = v
}

Now if you release the read lock in the block of the for, a concurrent goroutine is free to grab the write lock and make modifications to the map. In your code:

testMapLock <- true
iteratorChannel <- k
<-testMapLock

When sending k on the iteratorChannel, a concurrent goroutine may modify the map. This is not just an "unlucky" scenario, sending a value on a channel is often a "blocking" operation, if the channel's buffer is full, another goroutine must be ready to receive in order for the send operation to proceed. Sending a value on a channel is a good scheduling point for the runtime to run other goroutines even on the same OS thread, not to mention if there are multiple OS threads, of which one may already be "waiting" for the write lock in order to carry out a map modification.

To sum the last part: you releasing the read lock inside the for block is like yelling to others: "Come, modify the map now if you dare!" Consequently in your code encountering that mySeq != testMapSequence is very likely. See this runnable example to demonstrate it (it's a variation of your example):

package main

import (
    "fmt"
    "math/rand"
    "sync"
)

var (
    testMap         = make(map[int]int)
    testMapLock     = &sync.RWMutex{}
    testMapSequence int
)

func main() {
    go func() {
        for {
            k := rand.Intn(10000)
            WriteTestMap(k, 1)
        }
    }()

    ic := make(chan int)
    go func() {
        for _ = range ic {
        }
    }()

    for {
        if err := IterateMapKeys(ic); err != nil {
            fmt.Println(err)
        }
    }
}

func WriteTestMap(k, v int) {
    testMapLock.Lock()
    defer testMapLock.Unlock()
    testMap[k] = v
    testMapSequence++
}

func IterateMapKeys(iteratorChannel chan int) error {
    testMapLock.RLock()
    defer testMapLock.RUnlock()
    mySeq := testMapSequence
    for k, _ := range testMap {
        testMapLock.RUnlock()
        iteratorChannel <- k
        testMapLock.RLock()
        if mySeq != testMapSequence {
            //close(iteratorChannel)
            return fmt.Errorf("concurrent modification %d", testMapSequence)
        }
    }
    return nil
}

Example output:

concurrent modification 24
concurrent modification 41
concurrent modification 463
concurrent modification 477
concurrent modification 482
concurrent modification 496
concurrent modification 508
concurrent modification 521
concurrent modification 525
concurrent modification 535
concurrent modification 541
concurrent modification 555
concurrent modification 561
concurrent modification 565
concurrent modification 570
concurrent modification 577
concurrent modification 591
concurrent modification 593

We're encountering concurrent modification quite often!

Do you want to avoid this kind of concurrent modification? The solution is quite simple: don't release the read lock inside the for. Also run your app with the -race option to detect race conditions: go run -race testmap.go

Final thoughts

The language spec clearly allows you to modify the map in the same goroutine while ranging over it, this is what the previous quote relates to ("If map entries that have not yet been reached are removed during iteration.... If map entries are created during iteration..."). Modifying the map in the same goroutine is allowed and is safe, but how it is handled by the iterator logic is not defined.

If the map is modified in another goroutine, if you use proper synchronization, The Go Memory Model guarantees that the goroutine with the for ... range will observe all modifications, and the iterator logic will see it just as if "its own" goroutine would have modified it – which is allowed as stated before.

like image 185
icza Avatar answered Nov 10 '22 08:11

icza


The unit of concurrent access for a for range loop over a map is the map. Go maps in action.

A map is a dynamic data structure that changes for inserts, updates and deletes. Inside the Map Implementation. For example,

The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0. For statements, The Go Programming Language Specification

Reading a map with a for range loop with interleaved inserts, updates and deletes is unlikely to be useful.

Lock the map:

package main

import (
    "sync"
)

var racer map[int]int

var race sync.RWMutex

func Reader() {
    race.RLock() // Lock map
    for k, v := range racer {
        _, _ = k, v
    }
    race.RUnlock()
}

func Write() {
    for i := 0; i < 1e6; i++ {
        race.Lock()
        racer[i/2] = i
        race.Unlock()
    }
}

func main() {
    racer = make(map[int]int)
    Write()
    go Write()
    Reader()
}

Don't lock after the read -- fatal error: concurrent map iteration and map write:

package main

import (
    "sync"
)

var racer map[int]int

var race sync.RWMutex

func Reader() {
    for k, v := range racer {
        race.RLock() // Lock after read
        _, _ = k, v
        race.RUnlock()
    }
}

func Write() {
    for i := 0; i < 1e6; i++ {
        race.Lock()
        racer[i/2] = i
        race.Unlock()
    }
}

func main() {
    racer = make(map[int]int)
    Write()
    go Write()
    Reader()
}

Use the Go Data Race Detector. Read Introducing the Go Race Detector.

like image 28
peterSO Avatar answered Nov 10 '22 08:11

peterSO