I need to parse a large amount of CSV data where the first line of the file contains the headers. The library :csv already gives me a Stream of lists. I need to deduce the structure from the first line, skip that line, and then produce a Stream of Maps with the given structure.
Like this:
data.csv
a,b
1,2
3,4
...
CSV.stream_map(filename) Output
{a: 1, b: 2} #1
{a: 3, b: 4} #2
...
I was looking into Stream.transform but couldn't figure out how to skip the first element. The structure can be stored in the accumulator.
If you pass headers: true as the second argument to CSV.decode/2 (as mentioned in the docs), it'll automatically use the first row as key names and return a Map for each of the following rows.
iex(1)> CSV.decode(File.stream!("data.csv"), headers: true) |> Enum.to_list
[%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]
data.csv contains:
a,b
1,2
3,4
While the csv module already does this, as I've since found out, I also found a way to implement it myself. It turns out that if the Stream.transform callback returns an empty list [] as the first element of its result tuple, no element gets pushed into the stream:
def map_stream(enum) do
  enum
  |> Stream.transform(:first, &structure_from_header/2)
end

# The accumulator starts as :first; after the first line it holds the
# structure of the CSV (the header row).
def structure_from_header(line, :first),
  do: {[], line} # <== Here is the trick: emit nothing, keep the header

def structure_from_header(line, structure) do
  map =
    structure
    |> Enum.zip(line)
    |> Enum.into(%{})

  {[map], structure}
end
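To try the trick without a project or the CSV dependency, here is a minimal self-contained sketch. The module name HeaderStream and the hard-coded rows are made up for illustration; the rows only stand in for the lists of fields that the CSV library would emit.

```elixir
defmodule HeaderStream do
  # Hypothetical module name, used only for this illustration.

  # Turns a stream of rows (lists of fields) into a stream of maps,
  # using the first row as the keys.
  def map_stream(rows) do
    Stream.transform(rows, :first, fn
      # First row: emit nothing and keep the header as the accumulator.
      header, :first ->
        {[], header}

      # Every later row: emit one map and keep the same header.
      row, header ->
        {[header |> Enum.zip(row) |> Map.new()], header}
    end)
  end
end

# Stand-in for the stream of lists produced by the CSV library (assumption).
rows = [["a", "b"], ["1", "2"], ["3", "4"]]

result = rows |> HeaderStream.map_stream() |> Enum.to_list()
IO.inspect(result)
```

Because Stream.transform is lazy, nothing is read until the stream is consumed, so the same function should also work on a large file stream without loading it all into memory.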