The following naive code for reading from stdin and counting the number of occurrences of each byte is extremely slow, taking about 1 min 40 s to process 1 GiB of data on my machine.
int counts[256] {0};
uint8_t byte;
while (std::cin >> std::noskipws >> byte) {
    ++counts[byte];
}
Doing a buffered read is, of course, much faster, processing 1 GiB in less than a second.
uint8_t buf[4096];
ssize_t n;  // read() returns ssize_t; -1 signals an error
while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
    for (ssize_t i = 0; i < n; ++i) {
        ++counts[buf[i]];
    }
}
However, it has the disadvantage of being more complicated and requiring manual buffer management.
Is there any way of reading a stream byte-by-byte in standard C++ that is as simple, obvious, and idiomatic as the first snippet, but as performant as the second?
This is an interesting problem. Here are my timings for a 1 GiB input:
cin, default (synced with stdio) : 34.178s
cin, sync_with_stdio(false)      : 14.347s
getchar                          :  3.911s
getchar_unlocked                 :  0.700s
The source file was generated using:
$ dd if=/dev/urandom of=file.txt count=1024 bs=1048576
The first one is my reference, no changes: 34.178s
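#include <cstdint>
#include <cstdio>
#include <iostream>
int main(int argc, char **argv) {
    // Redirect stdin to the input file, opened in binary mode.
    if (argc < 2 || !std::freopen(argv[1], "rb", stdin)) return 1;
    int counts[256] {0};
    uint8_t byte;
    while (std::cin >> std::noskipws >> byte) {
        ++counts[byte];
    }
    return 0;
}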
Using std::ios::sync_with_stdio(false): 14.347s
#include <cstdint>
#include <cstdio>
#include <iostream>
int main(int argc, char **argv) {
    // Decouple the C++ streams from C stdio so cin can use its own buffer.
    std::ios::sync_with_stdio(false);
    if (argc < 2 || !std::freopen(argv[1], "rb", stdin)) return 1;
    int counts[256] {0};
    uint8_t byte;
    while (std::cin >> std::noskipws >> byte) {
        ++counts[byte];
    }
    return 0;
}
With getchar: 3.911s
#include <cstdio>
int main(int argc, char **argv) {
    if (argc < 2 || !std::freopen(argv[1], "rb", stdin)) return 1;
    int v[256] {0};
    int b;  // int, not unsigned: getchar() returns EOF (negative) at end of input
    while ((b = std::getchar()) != EOF) {
        ++v[b];
    }
    return 0;
}
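With getchar_unlocked: 0.700s
#include <stdio.h>  // getchar_unlocked is declared here on POSIX systems
int main(int argc, char **argv) {
    if (argc < 2 || !freopen(argv[1], "rb", stdin)) return 1;
    int v[256] {0};
    int b;  // int, not unsigned: getchar_unlocked() returns EOF (negative) at end of input
    while ((b = getchar_unlocked()) != EOF) {
        ++v[b];
    }
    return 0;
}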
My machine config:
CPU : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
MEM : 12GB
Build: g++ speed.cc -O3 -o speed
g++ v: g++ (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
exec : time ./speed file.txt
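For me, getchar_unlocked is the fastest way to read bytes without maintaining a buffer. Note that it is a POSIX function rather than standard C or C++, and it skips the stream lock, so it is only safe when a single thread reads from stdin.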