
How to read a string character by character as a range in D?

Tags:

d

How to read a line as a range in D?

I know there are ranges in D, but I wonder how to simply iterate over each character of a string using this concept.

To show what I'm after, the similar code in Go is:

for _, someChar := range someString {
    // Do something
}
Samuel Lampa asked May 16 '13 14:05


2 Answers

That would depend on whether you want to iterate over code units or code points. The language itself iterates over arrays by array elements, and strings are arrays of code units, so if you simply use foreach with type inference, then with

foreach(c; "La Verité")
    writeln(c);

the last two characters printed would be gibberish, because é is a code point made up of two UTF-8 code units, and you're printing out individual code units (since char is a UTF-8 code unit). Whereas, if you do

foreach(dchar c; "La Verité")
    writeln(c);

then the runtime will decode the code units to code points, and é will be printed as the last character. But none of this is really operating on strings as ranges. foreach operates on arrays natively without having to use the input range API. However, for all string types, the range API looks like

@property bool empty();
@property dchar front();
void popFront();

It operates on strings as ranges of dchar - not of their code unit type. This avoids issues with functions like std.algorithm.filter operating on individual code units, which would make no sense. Operating on code points isn't 100% correct either, since Unicode gets very complicated with regard to combining code points and graphemes, but it is far closer to being correct (and I believe there's work being done on adding range support for graphemes to the standard library for the cases where you need that and are willing to pay the performance hit). So, having the range API for strings operate on them as ranges of dchar is far more correct, and if you did something like

foreach(c; filter!"true"("La Verité"))
    writeln(c);

you would be iterating over dchar, and é would print correctly. The downside to all of this, of course, is that foreach on strings operates at the code unit level by default, whereas the range API for strings operates on them as code points, so you have to be careful when mixing array operations and range-based operations on strings. That's also why string and wstring are not considered random-access ranges - just bidirectional ranges. You can't do random access in O(1) on code points when they're made up of varying numbers of code units (whereas dstring is a random-access range, because with UTF-32, every code unit is a code point).
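To make the code unit versus code point distinction concrete, here is a minimal sketch. The helper names countCodeUnits and countCodePoints are made up for illustration, and the é is written as the escape \u00E9 so the result does not depend on how the source file is encoded. It assumes a Phobos version where strings are still auto-decoded by the range primitives.

```d
import std.range.primitives : isBidirectionalRange, isRandomAccessRange, walkLength;
import std.stdio : writeln;

// Hypothetical helper: foreach with type inference walks UTF-8 code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s)
    {
        static assert(c.sizeof == 1); // c is a single UTF-8 code unit
        ++n;
    }
    return n;
}

// Hypothetical helper: asking for dchar makes foreach decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

// The range traits mirror the answer's last point about random access.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);
static assert(!isRandomAccessRange!wstring);
static assert(isRandomAccessRange!dstring);

void main()
{
    string s = "La Verit\u00E9"; // é is one code point, two UTF-8 code units

    writeln(countCodeUnits(s));  // same as s.length
    writeln(countCodePoints(s)); // one fewer, since é decodes to one dchar
    writeln(s.walkLength);       // the range API also counts code points
}
```

The static asserts are checked at compile time, so if the claims about string, wstring and dstring did not hold for your compiler and library version, the sketch would fail to build rather than silently print something else.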

Jonathan M Davis answered Nov 04 '22 00:11


foreach(ch; str)
    do_something(ch);

A string is an InputRange. An InputRange implements three things:

  • empty; is it empty?
  • front; give me the current first item.
  • popFront; advance the range, otherwise front will keep returning the same item.

foreach "understands" how to work with ranges, so it "just works".

But I don't speak Go, so I'm not entirely sure we're speaking the same language.
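As a rough sketch of the three primitives in use: the helper collectViaRange and the struct Countdown below are hypothetical names invented for this example. For built-in strings the primitives come from std.range.primitives and (assuming auto-decoding is in effect) yield dchar, which is different from what foreach with an inferred element type does on a string, as the other answer explains. For a user-defined type, foreach calls empty, front and popFront directly.

```d
import std.range.primitives : empty, front, popFront;
import std.stdio : writeln;

// Hypothetical helper: walk a string using only the input range primitives.
dchar[] collectViaRange(string s)
{
    dchar[] result;
    while (!s.empty)
    {
        result ~= s.front; // front decodes one code point (a dchar)
        s.popFront();      // drops all code units of that code point
    }
    return result;
}

// Hypothetical user-defined input range: counts down from a start value to 1.
struct Countdown
{
    int current;

    @property bool empty() const { return current <= 0; }
    @property int front() const { return current; }
    void popFront() { --current; }
}

void main()
{
    writeln(collectViaRange("La Verit\u00E9"));

    // foreach lowers to empty/front/popFront calls for range types.
    foreach (n; Countdown(3))
        writeln(n);
}
```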

0b1100110 answered Nov 04 '22 02:11