I am trying to figure out how to parse pandas columns containing lists: my problem is that these are recognised as strings, whereas I would like them to be treated as lists, to iterate through them.
This is an example of my cells: [('P105', 1), ('P31', 1), ('P225', 1), ('P70', 1)]
When I try to iterate through it, I only get the characters contained in the string one by one (i.e. [, (, ', P etc.). How do I make pandas 'understand' that these are lists?
Edit: I have found a way to do that: I apply ast.literal_eval to each line.
Example:
line = month_statement['properties_claims'][12]
for i in line:
    print(i)
[
(
'
P
7
6
'
...
If I use ast.literal_eval, instead:
line = ast.literal_eval(month_statement['properties_claims'][12])
line
Out[23]:
[('P76', 1),
('P77', 1),
('P75', 1),
('P273', 1),
('P70', 1),
('P107', 1),
('P225', 1)]
My concern now is how efficient this approach will be when processing millions of rows.
This is a fairly old question, but I think this should work:
import ast
df['col'].apply(ast.literal_eval)
If the file is too big to load at once, read it in chunks, e.g. with pd.read_csv(..., chunksize=50000)
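To make the chunked approach concrete, here is a minimal sketch that parses the column while the file is being read, using the converters argument of pd.read_csv together with chunksize. The CSV content, the id column and the tiny chunk size are made up for illustration; only the properties_claims column name comes from the question.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = """id,properties_claims
1,"[('P105', 1), ('P31', 1)]"
2,"[('P225', 1)]"
3,"[('P70', 1), ('P76', 1), ('P77', 1)]"
"""

# converters applies ast.literal_eval to every cell of the column at read time;
# chunksize makes read_csv return an iterator of DataFrames instead of one frame.
reader = pd.read_csv(
    io.StringIO(csv_text),
    converters={"properties_claims": ast.literal_eval},
    chunksize=2,  # tiny value for the demo; something like 50000 for a real file
)

chunks = []
for chunk in reader:
    # Each chunk is a regular DataFrame whose cells are already Python lists.
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
first_cell = df["properties_claims"][0]
```

In a real run you would process each chunk inside the loop rather than concatenating everything, otherwise the memory benefit of chunking is lost.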
I would personally split that into further columns and iterate over them:
df['col'].apply(lambda x : pd.Series(x.split(',')))
or, to keep each tuple together instead of splitting on every comma:
df['col'].apply(lambda x : pd.Series( x.replace( '),' , ')&&' ).split('&&')))
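Note that both split variants leave you with string fragments rather than tuples. As an alternative sketch (not the method proposed above), you could parse the cells first and then use Series.explode, available in pandas 0.25 and later, to get one row per tuple. The sample data and the prop and count column names are assumptions chosen for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample data: each cell is the string form of a list of tuples.
df = pd.DataFrame({"col": ["[('P105', 1), ('P31', 1)]", "[('P225', 1)]"]})

# Turn each string into a real Python list of tuples.
parsed = df["col"].apply(ast.literal_eval)

# One row per tuple; the original row label is repeated in the index.
exploded = parsed.explode()

# Split each (property, count) tuple into two named columns.
pairs = pd.DataFrame(
    exploded.tolist(),
    index=exploded.index,
    columns=["prop", "count"],  # illustrative names, not from the original data
)
```

The repeated index lets you join pairs back to the original rows if needed.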