We have a 'delete all my data' feature. I'd like to delete a set of IPs from many many web log files.
Currently at runtime I open a CSV with the IP addresses to delete, turn it into a set, scan through files, and execute the delete logic if log IPs match.
Is there any way I can load the CSV and turn it into a set at compile time? We're trying to migrate things to AWS Lambda, and it would be nifty to have only a single static binary to deploy, with no dependencies.
have only a single static binary to deploy
Inline your entire CSV file using include_bytes! or include_str! and then go about the rest of your program as usual.
use csv; // 1.0.5

// The file's contents are embedded in the binary at compile time.
static CSV_FILE: &[u8] = include_bytes!("/etc/hosts");

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // &[u8] implements std::io::Read, so it can be passed to the reader directly.
    let mut rdr = csv::ReaderBuilder::new()
        .delimiter(b'\t')
        .from_reader(CSV_FILE);

    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }

    Ok(())
}
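To get from the embedded data to the set the question asks about, one option is to parse the inlined text once, on first use. Here is a minimal sketch using only the standard library, assuming one IP address per line. EMBEDDED_IPS stands in for include_str!("ips.csv") (a hypothetical file name), and blocked_ips and should_delete are illustrative names, not part of any library:

```rust
use std::collections::HashSet;
use std::net::IpAddr;
use std::sync::OnceLock;

// Stand-in for `include_str!("ips.csv")`; the file name is hypothetical.
// Assumes one IP address per line.
static EMBEDDED_IPS: &str = "127.0.0.1\n::1\n10.0.0.7\n";

// Filled lazily on first access, then reused for every lookup.
static BLOCKED: OnceLock<HashSet<IpAddr>> = OnceLock::new();

fn blocked_ips() -> &'static HashSet<IpAddr> {
    BLOCKED.get_or_init(|| {
        EMBEDDED_IPS
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            // Lines that fail to parse are silently skipped in this sketch.
            .filter_map(|line| line.parse::<IpAddr>().ok())
            .collect()
    })
}

// Returns true when the IP found in a log line is in the embedded set.
fn should_delete(log_ip: &str) -> bool {
    log_ip
        .parse::<IpAddr>()
        .map(|ip| blocked_ips().contains(&ip))
        .unwrap_or(false)
}
```

The data is baked into the binary at compile time, but the set itself is still built at runtime, once per process. On Lambda that cost would be paid on each cold start.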
The Rust-PHF crate provides compile-time data structures, including (ordered) maps and sets.
Unfortunately, to date, it does not support initialization of a set of std::net::IpAddr, but it can be used with static strings:
use phf::phf_set; // requires the "macros" feature of phf

static IP_SET: phf::Set<&'static str> = phf_set! {
    "127.0.0.1",
    "::1",
};
I would recommend simply using a build script to read the CSV and produce a source file containing the initialization of a standard HashSet with a custom hasher (FxHash, for example).
This would let you keep the convenience of editing a CSV file, while still baking all the data into a binary. It would require some initialization time (unlike PHF), but the ability to specify a custom hash is quite beneficial.
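The code-generation step of such a build script could look roughly like the sketch below. It assumes one IP address per line in the CSV. The function name generate_ip_set_source and the generated ip_set function are hypothetical. The generated function is generic over the hasher, so the caller decides which one to plug in:

```rust
use std::net::IpAddr;

/// Turns CSV text (assumed: one IP address per line) into Rust source that
/// defines a function building the set at startup.
fn generate_ip_set_source(csv: &str) -> String {
    let mut src = String::from(
        "pub fn ip_set<S: std::hash::BuildHasher + Default>() \
         -> std::collections::HashSet<&'static str, S> {\n    [\n",
    );
    for line in csv.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Validate at build time so that a malformed entry fails the build
        // instead of silently never matching anything.
        let ip: IpAddr = line.parse().expect("invalid IP address in CSV");
        // `{:?}` on a String emits a quoted, escaped string literal.
        src.push_str(&format!("        {:?},\n", ip.to_string()));
    }
    src.push_str("    ]\n    .iter()\n    .copied()\n    .collect()\n}\n");
    src
}
```

In build.rs, main would read the CSV, pass its contents to this function, and write the result to a file under the directory named by the OUT_DIR environment variable. The crate can then pull it in with include!(concat!(env!("OUT_DIR"), "/ip_set.rs")), where ip_set.rs is again a hypothetical file name. Printing cargo:rerun-if-changed= followed by the CSV path makes Cargo rerun the script when the CSV is edited. Note that this sketch stores each address in the canonical text form produced by IpAddr, so log entries written in another notation would need to be normalized before lookup.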
Also, depending on the format of IPs in the logs, you may want to store either &'static str or u32 (the latter only covers IPv4 addresses); u32 is more efficient search-wise, but the gain may be negated if a conversion is required.
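To make the conversion cost concrete, here is a small sketch of the u32 variant, assuming the logs hold IPv4 addresses in dotted-quad text form. The function name is_listed is illustrative:

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

/// Parses a dotted-quad log field and checks it against a set of addresses
/// stored as `u32`. Anything that is not a valid IPv4 address is treated as
/// "not in the set".
fn is_listed(set: &HashSet<u32>, log_field: &str) -> bool {
    log_field
        .parse::<Ipv4Addr>()
        // `u32::from` packs the four octets in network (big-endian) order.
        .map(|ip| set.contains(&u32::from(ip)))
        .unwrap_or(false)
}
```

Every lookup here pays for parsing the text into an Ipv4Addr first. If the log field can be compared as a string directly, a set of &'static str avoids that step at the price of hashing more bytes.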