
How to read a CSV file into a list of lists in SWI-Prolog, where each inner list represents one line of the CSV?

I have a CSV file that looks something like the one below, i.e. it is not in Prolog format:

james,facebook,intel,samsung
rebecca,intel,samsung,facebook
Ian,samsung,facebook,intel

I am trying to write a Prolog predicate that reads the file and returns a list that looks like

[[james,facebook,intel,samsung],[rebecca,intel,samsung,facebook],[Ian,samsung,facebook,intel]]

to be used further in other predicates.

I am still a beginner. I found some good information on SO and modified it to see if I could get this to work, but I'm stuck because I only generate a list that looks like this:

[[(james,facebook,intel,samsung)],[(rebecca,intel,samsung,facebook)],[(Ian,samsung,facebook,intel)]]

which means that when I take the head of an inner list I get (james,facebook,intel,samsung) and not james.

Here is the code I am using (seen on SO and modified):

stream_representations(Input,Lines) :-
    read_line_to_codes(Input,Line),
    (   Line == end_of_file 
    ->  Lines = []
    ;   atom_codes(FinalLine, Line), 
        term_to_atom(LineTerm,FinalLine), 
        Lines = [[LineTerm] | FurtherLines],
        stream_representations(Input,FurtherLines) 
    ).
main(Lines) :- 
    open('file.txt', read, Input), 
    stream_representations(Input, Lines), 
    close(Input).
asked Mar 20 '20 18:03 by Udoka Jose-Maria Nwabuike


1 Answer

The problem lies with term_to_atom(LineTerm,FinalLine).

First we read a line of the CSV file into a list of character codes with read_line_to_codes(Input,Line).

Let's simulate input with atom_codes/2:

?- atom_codes('james,facebook,intel,samsung',Line).
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...].

Then we recompose the original atom into FinalLine (this seems wasteful; there must be a way to hoover up a line into an atom directly):

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line). 

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung'.
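On the aside about reading a line without the detour through character codes: SWI-Prolog's read_line_to_string/2 returns the line as a string, which atom_string/2 turns into an atom. The following is a minimal sketch; line_atom/2 is a hypothetical helper name, not a library predicate.

```prolog
% line_atom(+Stream, -Atom): read one line from Stream as an atom.
% Hypothetical helper for illustration; fails at end of file.
line_atom(Stream, Atom) :-
    read_line_to_string(Stream, String),   % String is the atom end_of_file at EOF
    String \== end_of_file,
    atom_string(Atom, String).             % convert the string to an atom
```

For a quick check without a file, open_string/2 can stand in for the input stream: after open_string("james,facebook,intel,samsung\n", S), the goal line_atom(S, A) binds A to 'james,facebook,intel,samsung'.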

Then we try to map this atom in FinalLine into a term, LineTerm, using term_to_atom/2:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   term_to_atom(LineTerm,FinalLine).

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm =  (james, facebook, intel, samsung).

You see the problem here: LineTerm is not a list, but a nested term that uses the functor ','/2 to separate elements:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   term_to_atom(LineTerm,FinalLine),
   write_canonical(LineTerm).

','(james,','(facebook,','(intel,samsung)))

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm =  (james, facebook, intel, samsung).

This ','(james,','(facebook,','(intel,samsung))) term will thus also be in the final result, just written differently, as (james,facebook,intel,samsung), and packed into a list: [(james,facebook,intel,samsung)].
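If the nested ','/2 term is already in hand, it can be flattened into a proper list by walking its right-nested structure. The sketch below uses a hypothetical helper name, conj_to_list/2. Note also that term_to_atom/2 parses the text with Prolog syntax, so a capitalised field such as Ian is read as a variable rather than an atom; flattening the term does not fix that.

```prolog
% conj_to_list(+Conjunction, -List): flatten a right-nested ','/2 term.
% Hypothetical helper for illustration.
conj_to_list(Term, [Term]) :-
    var(Term), !.                    % an unbound element must not be unified with (_,_)
conj_to_list((A, B), [A|Rest]) :- !,
    conj_to_list(B, Rest).           % keep walking the right-hand side
conj_to_list(Term, [Term]).          % last element of the conjunction
```

With this, term_to_atom(T, 'james,facebook,intel,samsung'), conj_to_list(T, L) binds L to [james, facebook, intel, samsung], whereas the line starting with Ian yields a list whose first element is an unbound variable.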

You do not want this term, you want a list. You could use atomic_list_concat/2 to create a new atom that can be read as a list:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   atomic_list_concat(['[',FinalLine,']'],ListyAtom),
   term_to_atom(LineTerm,ListyAtom),
   LineTerm = [V1,V2,V3,V4].

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
ListyAtom = '[james,facebook,intel,samsung]',
LineTerm = [james, facebook, intel, samsung],
V1 = james,
V2 = facebook,
V3 = intel,
V4 = samsung.

But that's rather barbaric.

This whole processing should take fewer steps:

  1. Read a line of comma-separated strings on input.
  2. Transform this into a list of either atoms or strings directly.
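The two steps above can be sketched with read_line_to_string/2 and split_string/4, which avoids parsing the line as Prolog text altogether, so a field such as Ian stays an atom. The predicate names csv_lines/2, read_rows/2 and string_atom/2 are hypothetical, and the sketch assumes plain fields with no quoting or embedded commas.

```prolog
% csv_lines(+File, -Rows): Rows is a list of lists of atoms, one per line.
% All predicate names here are hypothetical, for illustration only.
csv_lines(File, Rows) :-
    setup_call_cleanup(
        open(File, read, In),
        read_rows(In, Rows),
        close(In)).                  % the stream is closed even on failure or error

read_rows(In, Rows) :-
    read_line_to_string(In, Line),
    (   Line == end_of_file
    ->  Rows = []
    ;   split_string(Line, ",", " ", Fields),   % split on commas, trim spaces
        maplist(string_atom, Fields, Row),
        Rows = [Row|Rest],
        read_rows(In, Rest)
    ).

% string_atom(+String, -Atom): argument order suited to maplist/3.
string_atom(String, Atom) :-
    atom_string(Atom, String).
```

Calling csv_lines('file.txt', Rows) on the file from the question should then give lists whose heads are james, rebecca and 'Ian'. For real-world CSV with quoted fields, SWI-Prolog's library(csv) and its csv_read_file/2, which returns row/N terms, is likely the more robust choice.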

DCGs seem like the correct solution. Maybe someone can add a two-liner.
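As one possible take on the DCG idea, here is a short sketch built on string_without//2 from library(dcg/basics). The nonterminal names fields//1, more_fields//1 and field//1 are hypothetical, and quoted fields are not handled.

```prolog
:- use_module(library(dcg/basics)).   % provides string_without//2

% fields(-Atoms): parse one comma-separated line given as a list of codes.
fields([F|Fs]) --> field(F), more_fields(Fs).

more_fields(Fs) --> ",", !, fields(Fs).   % a comma means another field follows
more_fields([]) --> [].

% A field is every code up to, but not including, the next comma.
field(Atom) --> string_without(`,`, Codes), { atom_codes(Atom, Codes) }.
```

It plugs into the original loop in place of the atom_codes/term_to_atom pair: after read_line_to_codes(Input, Line), the goal phrase(fields(Row), Line) binds Row to the list of atoms for that line, for example ['Ian', samsung, facebook, intel].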

answered Nov 10 '22 17:11 by David Tonhofer