
Parse lists in pandas columns

I am trying to figure out how to parse pandas columns containing lists: my problem is that these are recognised as strings, whereas I would like them to be treated as lists, to iterate through them.

This is an example of my cells: [('P105', 1), ('P31', 1), ('P225', 1), ('P70', 1)]

When I try to iterate through it, I only get the characters contained in the string one by one (i.e. [, (, ', P etc.). How do I make pandas 'understand' that these are lists?

Edit: I have found a way to do that: I apply ast.literal_eval to each line.

Example:

line = month_statement['properties_claims'][12]
for i in line:
    print(i)

[
(
'
P
7
6
'
...

If I use ast.literal_eval, instead:

line = ast.literal_eval(month_statement['properties_claims'][12])
line
Out[23]: 
[('P76', 1),
 ('P77', 1),
 ('P75', 1),
 ('P273', 1),
 ('P70', 1),
 ('P107', 1),
 ('P225', 1)]

My concern now is how efficient this approach will be when processing millions of lines.

Aliossandro asked Nov 16 '25 21:11


2 Answers

Pretty old question, but I guess this should work:

import ast

df['col'].apply(ast.literal_eval)
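Note that apply returns a new Series, so the result has to be assigned back to the column. A minimal sketch, assuming a small made-up DataFrame whose 'col' column holds the string form of the lists (the sample values are taken from the question):

```python
import ast

import pandas as pd

# Hypothetical sample data: each cell is a string that looks like a list of tuples.
df = pd.DataFrame({'col': ["[('P105', 1), ('P31', 1)]", "[('P225', 1)]"]})

# Parse every cell and store the result back in the same column.
df['col'] = df['col'].apply(ast.literal_eval)

# Each cell is now a real list of tuples, so iteration yields tuples, not characters.
first_ids = [prop for prop, count in df['col'][0]]
print(first_ids)
```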

Read the data in chunks if the file is too big, e.g. using pd.read_csv(..., chunksize=50000).
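The chunked approach could look roughly like this. It is only a sketch: the in-memory CSV and the tiny chunk size are stand-ins for a real file and a realistic value such as 50000, and the column name is borrowed from the question.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_text = """id,properties_claims
1,"[('P105', 1), ('P31', 1)]"
2,"[('P225', 1), ('P70', 1)]"
3,"[('P76', 1)]"
"""

parsed_chunks = []
# With chunksize set, read_csv yields one DataFrame per chunk instead of one big frame.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk['properties_claims'] = chunk['properties_claims'].apply(ast.literal_eval)
    parsed_chunks.append(chunk)

# Only concatenate if the parsed result fits in memory; otherwise process each chunk in the loop.
df = pd.concat(parsed_chunks, ignore_index=True)
```

If the whole file does fit in memory, passing converters={'properties_claims': ast.literal_eval} to pd.read_csv should do the parsing at read time instead.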

muon answered Nov 18 '25 11:11


I would personally split that into further columns and iterate over them:

   df['col'].apply(lambda x: pd.Series(x.split(',')))

or

   df['col'].apply(lambda x: pd.Series(x.replace('),', ')&&').split('&&')))
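
To see what the second variant produces, here is a small sketch on made-up data (values borrowed from the question). Keep in mind that the resulting pieces are still strings, brackets included, so they would need further cleaning or parsing before they behave like tuples.

```python
import pandas as pd

# Hypothetical sample data in the same string form as the question's cells.
df = pd.DataFrame({'col': ["[('P105', 1), ('P31', 1)]",
                           "[('P225', 1), ('P70', 1)]"]})

# Mark the boundary between tuples with '&&', then split on that marker.
wide = df['col'].apply(lambda x: pd.Series(x.replace('),', ')&&').split('&&')))

# One column per tuple; each cell is a string fragment, not a tuple.
print(wide)
```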

user2589273 answered Nov 18 '25 11:11


