Pandas read_csv converter – How to handle exceptions (literal_eval SyntaxError)

I'm reading a CSV file into a Pandas DataFrame. The file looks like this:

          A              B
  +--------------+---------------+
0 |              | ("t1", "t2")  |
  +--------------+---------------+
1 | ("t3", "t4") |               |
  +--------------+---------------+

Two of the cells have literal tuples in them, and two of the cells are empty.

df = pd.read_csv('my_file.csv', dtype=str, delimiter=',',
    converters={'A': ast.literal_eval, 'B': ast.literal_eval})

The converter ast.literal_eval correctly turns the literal tuples into Python tuple objects, but only as long as there are no empty cells. Because I have empty cells, I get the error:

SyntaxError: unexpected EOF while parsing

According to this S/O answer, I should try to catch the SyntaxError exception for empty strings:

ast uses compile to compile the source string (which must be an expression) into an AST. If the source string is not a valid expression (like an empty string), a SyntaxError will be raised by compile.
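
To make the quoted behaviour concrete, here is a minimal sketch of what literal_eval does with the kinds of strings a converter might receive. The helper name try_eval is hypothetical and used only for illustration. The exact error message for the empty string varies between Python versions, so only the exception type is reported.

```python
from ast import literal_eval

def try_eval(text):
    """Hypothetical helper: return (ok, parsed value or exception name)."""
    try:
        return True, literal_eval(text)
    except (SyntaxError, ValueError) as exc:
        return False, type(exc).__name__

print(try_eval('("t1", "t2")'))  # a valid tuple literal parses
print(try_eval(''))              # an empty cell raises SyntaxError
print(try_eval('foo'))           # a bare name is not a literal: ValueError
```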

However, I am not sure how to catch exceptions for individual cells, within the context of the read_csv converters.

What would be the best way to go about this? Is there otherwise some way to convert empty strings/cells into objects which literal_eval would accept or ignore?

NB: My understanding is that having literal tuples in readable files isn't always the best thing, but in my case it's useful.

P A N asked Dec 13 '22 14:12


1 Answer

You can create a custom function which uses ast.literal_eval conditionally:

from ast import literal_eval
from io import StringIO
import pandas as pd

# replicate csv file
x = StringIO("""A,B
,"('t1', 't2')"
"('t3', 't4')",""")

def literal_converter(val):
    # return empty cells unchanged; swap the first val for None or
    # some other null identifier if required
    return val if val == '' else literal_eval(val)

df = pd.read_csv(x, delimiter=',', converters=dict.fromkeys('AB', literal_converter))

print(df)

          A         B
0            (t1, t2)
1  (t3, t4)          

Alternatively, you can use try / except to catch the exception. This solution is more lenient, as it also handles other malformed input, i.e. a SyntaxError or ValueError raised for reasons other than empty values. Cells that cannot be parsed are returned unchanged as strings.

def literal_converter(val):
    try:
        return literal_eval(val)
    except (SyntaxError, ValueError):
        return val
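
The two approaches can also be combined. The following is a sketch rather than a definitive implementation: safe_literal and its empty parameter are hypothetical names, and treating whitespace-only cells as empty is an assumption that may not suit every file. It is shown on plain strings so it runs without a CSV file.

```python
from ast import literal_eval

def safe_literal(val, empty=None):
    """Hypothetical converter: parse a cell as a Python literal if possible."""
    text = val.strip()
    if text == "":
        return empty      # empty (or whitespace-only) cell: use the placeholder
    try:
        return literal_eval(text)
    except (SyntaxError, ValueError):
        return val        # malformed cell: keep the raw string

# Sample cell contents: empty, valid tuple, blank, unclosed tuple, list
cells = ["", "('t3', 't4')", "  ", "('t1', 't2'", "[1, 2]"]
parsed = [safe_literal(c) for c in cells]
print(parsed)
```

Passed to read_csv as converters={'A': safe_literal, 'B': safe_literal}, this would give None for empty cells instead of '', which makes them easier to pick out later with isna().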
jpp answered Dec 22 '22 17:12