python: list element in CSV file

I have a csv file of such structure:

Id,Country,Cities
1,Canada,"['Toronto','Ottawa','Montreal']"
2,Italy,"['Rome','Milan','Naples', 'Palermo']"
3,France,"['Paris','Cannes','Lyon']"
4,Spain,"['Seville','Alicante','Barcelona']"

The last column contains a list, but it is stored as a string, so it is treated as a single element. When parsing the file, I need this element as a list, not a string. So far I've found this way to convert it:

import ast

L = "['Toronto','Ottawa','Montreal']"
seq = ast.literal_eval(L)

Since I'm a newbie in Python, my question is: is this the normal way of doing this, is there a proper way to represent lists in CSV so that I don't have to do conversions, or is there a simpler way to convert?

Thanks!

Mark asked Oct 29 '25 00:10


1 Answer

Using ast.literal_eval(...) will work, but it ties the file to Python literal syntax that other CSV-reading software won't recognize, and applying anything eval-like to file contents is a red flag.

Using eval can be dangerous. In this case you're using literal_eval, which is safer because it is far more restricted than the raw eval function, but it is not entirely risk-free.
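To make the difference from raw eval concrete, here is a minimal sketch of what literal_eval accepts and what it refuses. The variable names are just for illustration.

```python
import ast

# A plain literal is parsed into the corresponding Python object.
cities = ast.literal_eval("['Toronto','Ottawa','Montreal']")
print(cities)

# Anything that is not a literal (here, a function call) is rejected
# with ValueError instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

So literal_eval will not run arbitrary code the way eval would, but it still has to parse whatever string it is given.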

CSV files that hold several values in a single column usually separate them with a simple delimiter and quote the field.

For instance:

ID,Country,Cities
1,Canada,"Toronto;Ottawa;Montreal"

Then in Python, or any other language, the file becomes trivial to read without having to resort to eval:

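import csv

# newline="" lets the csv module handle line endings itself
with open("data.csv", newline="") as fobj:
    reader = csv.reader(fobj)
    field_names = next(reader)  # header row

    rows = []
    for row in reader:
        # split the last column into a list of city names
        row[-1] = row[-1].split(";")
        rows.append(row)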
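For completeness, here is a rough sketch of the writing side, so the file is produced in the same semicolon-delimited form that the reader above expects. The sample data and the in-memory buffer are assumptions made to keep the example self-contained. With a real file you would use open("data.csv", "w", newline="") instead.

```python
import csv
import io

# Sample data (illustrative only): the last element of each row is a list.
rows = [
    [1, "Canada", ["Toronto", "Ottawa", "Montreal"]],
    [2, "Italy", ["Rome", "Milan", "Naples", "Palermo"]],
]

# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["ID", "Country", "Cities"])
for row_id, country, cities in rows:
    # Join the list into one semicolon-delimited field.
    writer.writerow([row_id, country, ";".join(cities)])

# Read the data back and split the last column into a list again.
buffer.seek(0)
reader = csv.reader(buffer)
field_names = next(reader)
parsed = []
for row in reader:
    row[-1] = row[-1].split(";")
    parsed.append(row)

print(parsed)
```

This only works if no city name contains a semicolon, and an empty field splits to [""] rather than an empty list, so pick the delimiter and handle empty cells with that in mind. Note also that csv.reader returns every other field as a string, so the ID comes back as "1", not 1.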

Issues with ast.literal_eval

Even though the ast.literal_eval function is much safer than using a regular eval on user input, it still might be exploitable. The documentation for literal_eval has this warning:

Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.

A demonstration of this:

>>> import ast
>>> ast.literal_eval("()" * 10 ** 6)
[1]    48513 segmentation fault  python

I'm definitely not an expert, but giving a user the ability to crash a program and potentially exploit some obscure memory vulnerability is bad, and in this use-case can be avoided.

If the reason you want to use literal_eval is to get proper typing, and you're positive that the input data is 100% trusted, then I suppose it's fine to use. Even then, you could wrap the function to perform some sanity checks:

import ast

def sanely_eval(value: str, max_size: int = 100_000) -> object:
    # refuse oversized input before it ever reaches the parser
    if len(value) > max_size:
        raise ValueError(f"len(value) is greater than the max_size={max_size!r}")
    return ast.literal_eval(value)

But, depending on how you're creating and using the CSV files, this may make the data less portable, since it's a Python-specific format.
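As a quick illustration of how the sanely_eval wrapper above behaves, the sketch below repeats its definition so it runs on its own, then feeds it one normal value and one oversized string similar to the crash demonstration. The variable names are just for illustration.

```python
import ast

def sanely_eval(value: str, max_size: int = 100_000) -> object:
    # Refuse oversized input before it ever reaches the parser.
    if len(value) > max_size:
        raise ValueError(f"len(value) is greater than the max_size={max_size!r}")
    return ast.literal_eval(value)

# A normal-sized field is parsed as usual.
cities = sanely_eval("['Paris','Cannes','Lyon']")
print(cities)

# The two-million-character string is rejected by the length check,
# so ast.literal_eval is never called on it.
try:
    sanely_eval("()" * 10**6)
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
print(too_long_rejected)
```

A length cap like this only limits the size of the input. Whether a shorter but deeply nested string could still cause trouble depends on the Python version, so treat it as a sanity check rather than a guarantee.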

damon answered Oct 31 '25 14:10


