I'm new to the D programming language and have just started reading The D Programming Language (TDPL) book.
I ran into an error when trying one of the associative array examples:
#!/usr/bin/rdmd
import std.stdio, std.string;
void main() {
    uint[string] dict;
    foreach (line; stdin.byLine()) {
        foreach (word; splitter(strip(line))) {
            if (word in dict) continue;
            auto newId = dict.length;
            dict[word] = newId;
            writeln(newId, '\t', word);
        }
    }
}
DMD shows this error message:
./vocab.d(11): Error: associative arrays can only be assigned values with immutable keys, not char[]
I'm using DMD compiler version 2.051.
I'm guessing the rules for associative arrays have changed since the TDPL book was written.
How should I use associative arrays with string keys?
Thanks.
Update:
I found the solution in a later part of the book:
use .idup to make an immutable duplicate of the key before putting it into the array.
So
dict[word.idup] = newId;
would do the job.
But is that efficient?
Associative arrays require that their keys be immutable. It makes sense when you think about the fact that if it's not immutable, then it might change, which means that its hash changes, which means that when you go to get the value out again, the computer won't find it. And if you go to replace it, you'll end up with another value added to the associative array (so, you'll have one with the correct hash and one with an incorrect hash). However, if the key is immutable, it cannot change, and so there is no such problem.
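To make the reasoning above concrete, here is a minimal sketch. The function name keySurvivesBufferReuse and the sample words are made up for illustration. It stores an immutable copy of a mutable buffer as the key, then overwrites the buffer the way a reused line buffer would be overwritten, and checks that the stored key is unaffected.

```d
/// Hypothetical demo: the key stored in the associative array is an
/// immutable copy, so later changes to the mutable buffer cannot touch it.
bool keySurvivesBufferReuse()
{
    char[] buf = "apple".dup;   // mutable buffer, like a line read from input
    int[string] counts;

    counts[buf.idup] = 1;       // the key is an independent immutable copy
    buf[0] = 'X';               // simulate the buffer being reused

    // The original key is still found; the modified text is not a key.
    return ("apple" in counts) !is null && ("Xpple" in counts) is null;
}

void main()
{
    assert(keySurvivesBufferReuse());
}
```

If the array had kept a reference to buf itself, the entry would have been filed under the hash of "apple" while its key now read "Xpple", which is exactly the situation the immutability rule prevents.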
Prior to dmd 2.051, the example worked (which was a bug). It has now been fixed though, so the example in TDPL is no longer correct. However, it's not so much the case that the rules for associative arrays have changed as that there was a bug in them which was not caught. The example compiled when it shouldn't have, and Andrei missed it. It's listed in the official errata for TDPL and should be fixed in future printings.
The corrected code should use either dict[word.idup] or dict[to!string(word)]. word.idup creates a duplicate of word which is immutable. to!string(word), on the other hand, converts word to a string in the most appropriate manner. As word is a char[] in this case, that would be to use idup. However, if word were already a string, then it would simply return the value which was passed in and not needlessly copy it. So, in the general case, to!string(word) is the better choice (particularly in templated functions), but in this case, either works just fine (to!() is in std.conv).
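As a rough sketch of the fix, the loop from the question can be wrapped in a function so it can be checked without stdin. The function name buildVocabulary and the sample input are made up for illustration. One further departure from the book's code: the value type is size_t rather than uint, because dict.length is a size_t, which does not implicitly convert to uint on 64-bit targets.

```d
import std.algorithm : splitter;
import std.conv : to;
import std.string : strip;

/// Hypothetical helper: gives each distinct word an id in order of first
/// appearance. The lines are char[] to mimic what stdin.byLine() yields.
size_t[string] buildVocabulary(char[][] lines)
{
    size_t[string] dict;
    foreach (line; lines)
    {
        foreach (word; splitter(strip(line)))
        {
            if (word in dict) continue;   // looking up with a char[] is fine
            auto newId = dict.length;     // read the length before inserting
            dict[word.idup] = newId;      // or: dict[to!string(word)] = newId;
        }
    }
    return dict;
}

void main()
{
    auto dict = buildVocabulary(["hello world".dup, "hello again".dup]);
    assert(dict["hello"] == 0 && dict["world"] == 1 && dict["again"] == 2);

    // to!string hands back a value that is already a string without copying it.
    string s = "already a string";
    assert(to!string(s) is s);
}
```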
It is technically possible to cast a char[] to a string, but it's generally a bad idea. If you know that the char[] will never change, then you can get away with it, but in the general case, you're risking problems, since the compiler will then assume that the resulting string can never change, and it could generate code which is incorrect. It may even segfault. So, don't do it unless profiling shows that you really need the extra efficiency of avoiding the copy, you can't otherwise avoid the copy by doing something like just using a string in the first place (so no conversion would be necessary), and you know that the string will never be changed.
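The difference between casting and copying can be seen by comparing addresses. This is only an illustrative sketch, and sameMemory is a made-up helper: the cast produces a string that still points at the mutable buffer, while idup produces a separate copy.

```d
/// Hypothetical helper: true if both slices start at the same address.
bool sameMemory(const(char)[] a, const(char)[] b)
{
    return a.ptr is b.ptr;
}

void main()
{
    char[] buf = "abc".dup;

    string copied  = buf.idup;          // new allocation, independent of buf
    string aliased = cast(string) buf;  // no copy: same memory, relabelled immutable

    assert(!sameMemory(copied, buf));
    assert(sameMemory(aliased, buf));

    // Writing through buf from here on would also alter what aliased refers
    // to, even though the compiler is entitled to assume it never changes.
    // That is the risk described above.
}
```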
In general, I wouldn't worry too much about the efficiency of copying strings. Generally, you should be using string instead of char[], so you can copy them around (that is, copy their reference around (e.g. str1 = str2;) rather than copying their entire contents like dup and idup do) without worrying about it being particularly inefficient. The problem with the example is that stdin.byLine() returns a char[] rather than a string (presumably to avoid copying the data if it's not necessary). So, splitter() returns a char[], and so word is a char[] instead of a string. Now, you could do splitter(strip(line.idup)) or splitter(strip(line).idup) instead of iduping the key. That way, splitter() would return a string rather than char[], but that's probably essentially just as efficient as iduping word. Regardless, because of where the text is coming from originally, it's a char[] instead of a string, which forces you to idup it somewhere along the line if you intend to use it as a key in an associative array. In the general case, however, it's better to just use string and not char[]. Then you don't need to idup anything.
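The line-level variant mentioned above might look like the following sketch. The function name vocabularyFromLines and the sample input are made up, and size_t is used as the value type to match dict.length. The stripped line is duplicated once, so every word that splitter() yields is already a string slice and needs no idup of its own.

```d
import std.algorithm : splitter;
import std.string : strip;

/// Hypothetical helper: same vocabulary logic as in the question, but with
/// one immutable copy per line instead of one per new word.
size_t[string] vocabularyFromLines(char[][] lines)
{
    size_t[string] dict;
    foreach (line; lines)
    {
        string stable = strip(line).idup;   // one copy per line
        foreach (word; splitter(stable))    // each word is a slice of stable
        {
            if (word in dict) continue;
            auto newId = dict.length;
            dict[word] = newId;             // word is already a string here
        }
    }
    return dict;
}

void main()
{
    auto dict = vocabularyFromLines(["a b".dup, "b c".dup]);
    assert(dict["a"] == 0 && dict["b"] == 1 && dict["c"] == 2);
}
```

One trade-off to keep in mind: this copies every line, including lines made up entirely of words that have been seen before, whereas iduping the key copies only new words.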
EDIT:
Actually, even if you find a situation where casting from char[] to string seems both safe and necessary, consider using std.exception.assumeUnique(). It's essentially the preferred way of converting a mutable array to an immutable one when you need to and know that you can. It would typically be done in cases where you've constructed an array which you couldn't make immutable because you had to do it in pieces but which has no other references, and you don't want to create a deep copy of it. It wouldn't be useful in situations like the example that you're asking about though, since you really do need to copy the array.
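A small sketch of that use case, with repeatChar as a made-up helper: the buffer is built in a mutable array that nothing else references, then handed over as a string without a copy. assumeUnique also sets the variable passed to it to null, so the mutable reference cannot be used by accident afterwards.

```d
import std.exception : assumeUnique;

/// Hypothetical helper: fills a freshly allocated buffer and returns it as
/// an immutable string without copying it.
string repeatChar(char c, size_t n)
{
    char[] buf = new char[](n);   // no other reference to this memory exists
    buf[] = c;                    // build the contents in place
    return assumeUnique(buf);     // reinterpret as immutable, no copy
}

void main()
{
    int[string] counts;
    counts[repeatChar('a', 3)] = 1;   // usable directly as a key
    assert("aaa" in counts);

    char[] local = "xyz".dup;
    string frozen = assumeUnique(local);
    assert(local is null);            // the mutable reference has been cleared
    assert(frozen == "xyz");
}
```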
No, it's not efficient, since it obviously duplicates the string. If you can guarantee that the string you create will never be modified in memory, feel free to use an explicit cast, cast(immutable)str, instead of duplicating it.
(Although, I've noticed that the garbage collector works well, so I suggest you don't actually try that unless you see a bottleneck, since you might decide to change the string later. Just place a comment in your code to help you find the bottleneck later, if it exists.)