Here's some data from another question:
positive negative neutral
1 [marvel, moral, bold, destiny] [] [view, should]
2 [beautiful] [complicated, need] []
3 [celebrate] [crippling, addiction] [big]
What I would do first is to add quotes across all words, and then:
import ast
df = pd.read_clipboard(sep='\s{2,}')
df = df.applymap(ast.literal_eval)
Is there a smarter way to do this?
For basic structures you can use yaml without having to add quotes:
import yaml
df = pd.read_clipboard(sep='\s{2,}').applymap(yaml.load)
type(df.iloc[0, 0])
Out: list
Under certain conditions, you can read your lists as strings and the convert them using literal_eval (or pd.eval, if they are simple lists).
For example,
A B
0 [1, 2, 3] 11
1 [4, 5, 6] 12
First, ensure there are at least two spaces between the columns, then copy your data and run the following:
import ast
df = pd.read_clipboard(sep=r'\s{2,}', engine='python')
df['A'] = df['A'].map(ast.literal_eval)
df
A B
0 [1, 2, 3] 11
1 [4, 5, 6] 12
df.dtypes
A object
B int64
dtype: object
Notes
for multiple columns, use
applymapin the conversion step:df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(ast.literal_eval)if your columns can contain NaNs, define a function that can handle them appropriately:
parser = lambda x: x if pd.isna(x) else ast.literal_eval(x) df[['A', 'B', ...]] = df[['A', 'B', ...]].applymap(parser)if your columns contain lists of strings, you will need something like
yaml.load(requires installation) to parse them instead if you don't want to manually add quotes to the data. See above.
I did it this way:
df = pd.read_clipboard(sep='\s{2,}', engine='python')
df = df.apply(lambda x: x.str.replace(r'[\[\]]*', '').str.split(',\s*', expand=False))
PS i'm sure - there must be a better way to do that...
Another alternative is
In [43]: df.applymap(lambda x: x[1:-1].split(', '))
Out[43]:
positive negative neutral
1 [marvel, moral, bold, destiny] [] [view, should]
2 [beautiful] [complicated, need] []
3 [celebrate] [crippling, addiction] [big]
Note that this assumes the first and last character in each cell is [ and ].
It also assumes there is exactly one space after the commas.
Another version:
df.applymap(lambda x:
ast.literal_eval("[" + re.sub(r"[[\]]", "'",
re.sub("[,\s]+", "','", x)) + "]"))
Per help from @MaxU
df = pd.read_clipboard(sep='\s{2,}', engine='python')
Then:
>>> df.apply(lambda col: col.str[1:-1].str.split(', '))
positive negative neutral
1 [marvel, moral, bold, destiny] [] [view, should]
2 [beautiful] [complicated, need] []
3 [celebrate] [crippling, addiction] [big]
>>> df.apply(lambda col: col.str[1:-1].str.split()).loc[3, 'negative']
['crippling', 'addiction']
And per the notes from @unutbu who came up with a similar solution:
assumes the first and last character in each cell is [ and ]. It also assumes there is exactly one space after the commas.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With