CSV to Stream of Maps in Elixir

Tags: csv, elixir

I need to parse a large amount of CSV data where the first line of the file contains the headers. The :csv library already gives me a Stream of lists. I need to deduce the structure from the first line, skip that line, and then produce a Stream of Maps with that structure.

Like this:

data.csv

a,b
1,2
3,4
...

CSV.stream_map(filename) output

%{a: 1, b: 2} #1
%{a: 3, b: 4} #2
...

I was looking into Stream.transform but couldn't figure out how to skip the first element. The structure can be stored in the accumulator.

Cristian Garcia asked Sep 20 '25 23:09


2 Answers

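If you pass headers: true as the second argument to CSV.decode/2 (as mentioned in the docs), it'll automatically use the first row as key names and return a Map for each of the following rows. Depending on the library version, CSV.decode/2 may wrap each row in an {:ok, row} tuple, in which case CSV.decode!/2 returns the bare maps.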

iex(1)> CSV.decode(File.stream!("data.csv"), headers: true) |> Enum.to_list
[%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]

data.csv contains:

a,b
1,2
3,4
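
Note that the keys and values in the output above are strings, while the question asked for atom keys and numeric values. A minimal sketch of one way to convert them, assuming every value is an integer and the headers are trusted; the to_typed helper is hypothetical and not part of the csv library:

```elixir
# Rows shaped like the decoded output shown above (string keys, string values).
rows = [%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]

# Hypothetical helper: turn string keys into atoms and string values into integers.
# String.to_atom/1 creates atoms at runtime, so only use it on trusted headers.
to_typed = fn row ->
  Map.new(row, fn {key, value} -> {String.to_atom(key), String.to_integer(value)} end)
end

# Stream.map/2 keeps the pipeline lazy; Enum.to_list/1 forces it here for display.
typed = rows |> Stream.map(to_typed) |> Enum.to_list()
IO.inspect(typed)
# prints [%{a: 1, b: 2}, %{a: 3, b: 4}]
```

In a real pipeline the Stream.map/2 step would sit directly after the decode call, so the file is still processed row by row.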
Dogbert answered Sep 23 '25 10:09


As I've found out, the csv module already does this, but I also found a way to implement it myself. It turns out that if you return an empty list [] from the Stream.transform callback, no element gets emitted into the stream:

# Wrapped in a module so the snippet compiles; the module name is illustrative.
defmodule CsvMaps do
  def map_stream(enum) do
    Stream.transform(enum, :first, &structure_from_header/2)
  end

  # The accumulator starts as :first. After the first line it holds the
  # structure of the CSV, i.e. the header row.
  def structure_from_header(line, :first),
    do: {[], line} # <=== Here is the trick: emit nothing, keep the header

  def structure_from_header(line, structure) do
    map =
      structure
      |> Enum.zip(line)
      |> Enum.into(%{})

    {[map], structure}
  end
end
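
To try the trick without a file, here is a small self-contained sketch of the same idea using an anonymous function. The in-memory rows are an assumption standing in for the lists the csv library would yield:

```elixir
# In-memory rows standing in for the lists a CSV stream would yield.
rows = [["a", "b"], ["1", "2"], ["3", "4"]]

maps =
  rows
  |> Stream.transform(:first, fn
    # First element: emit nothing and store the header row as the accumulator.
    line, :first -> {[], line}
    # Later elements: zip the header with the values and emit one map.
    line, header -> {[header |> Enum.zip(line) |> Map.new()], header}
  end)
  |> Enum.to_list()

IO.inspect(maps)
# prints [%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]
```

The header row never appears in the result because its callback returns an empty list of emitted elements.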
Cristian Garcia answered Sep 23 '25 09:09