I have a column in my CSV file which has values like this:
['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']
The problem is that when I load my data into a DataFrame, I get a string
instead of a list, so I can't iterate over it.
I have also tried json.loads(), but sometimes I have values like ["Type: TIME, Value: last night's"],
so I can't simply replace the single quotes (') with double quotes ("), and this stops json from parsing my string.
Any idea how to read my column as an array?
Use ast.literal_eval to convert the string representations of lists to lists:
import ast
import pandas as pd

a = "['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']"
df = pd.DataFrame({'col':[a, a]})
# Parse each string into a real Python list
df['col'] = df['col'].apply(ast.literal_eval)
print(df)
col
0 [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...
1 [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...
print (type(df.loc[0, 'col']))
<class 'list'>
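Since the data comes from a CSV file, the same parsing can probably be done while loading, via the converters argument of pd.read_csv. The following is only a sketch: the CSV text is built in memory as a stand-in for the real file, and the column names 'id' and 'col' are assumptions.

```python
import ast
import io

import pandas as pd

# Build a small in-memory CSV as a stand-in for the real file
# (the column names 'id' and 'col' are hypothetical).
source = pd.DataFrame({
    "id": [1, 2],
    "col": [
        ["Type: CARDINAL, Value: 50p", "Type: CARDINAL, Value: 10"],
        ["Type: TIME, Value: last night's"],
    ],
})
csv_text = source.to_csv(index=False)  # lists are written as their string form

# converters applies ast.literal_eval to every raw cell of 'col' while reading
df = pd.read_csv(io.StringIO(csv_text), converters={"col": ast.literal_eval})

print(type(df.loc[0, "col"]))
print(df.loc[1, "col"])
```

With a real file, io.StringIO(csv_text) would be replaced by the file path. Note that literal_eval raises on any cell that is not a valid Python literal, so this variant suits columns that are known to be clean.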
EDIT:
If you need to find all values which cannot be converted:
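import numpy as np

a = "['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10', 'Type: CARDINAL, Value: 10']"
df = pd.DataFrame({'col':[a, a, 'wrong "']})

def test(x):
    # Return the parsed list, or NaN when the string is not a valid Python literal
    try:
        return ast.literal_eval(x)
    except (ValueError, TypeError, SyntaxError):
        return np.nan
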
df['new'] = df['col'].apply(test)
print (df)
col \
0 ['Type: CARDINAL, Value: 50p', 'Type: CARDINAL...
1 ['Type: CARDINAL, Value: 50p', 'Type: CARDINAL...
2 wrong "
new
0 [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...
1 [Type: CARDINAL, Value: 50p, Type: CARDINAL, V...
2 NaN
print (df[df['new'].isna()])
col new
2 wrong " NaN
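Once the column holds real lists, it can be traversed like any other list column. As a rough sketch (assuming every element follows the 'Type: X, Value: Y' pattern shown in the question; the output column names 'type' and 'value' are made up for illustration), explode gives one row per element and str.extract splits each element into its parts:

```python
import ast

import pandas as pd

a = "['Type: CARDINAL, Value: 50p', 'Type: CARDINAL, Value: 10']"
b = "[\"Type: TIME, Value: last night's\"]"
df = pd.DataFrame({"col": [a, b]})
df["col"] = df["col"].apply(ast.literal_eval)

# One row per list element; the original row index is repeated
exploded = df.explode("col")

# Split each 'Type: X, Value: Y' string into two named columns
parts = exploded["col"].str.extract(r"Type: (?P<type>\w+), Value: (?P<value>.*)")
print(parts)
```

Elements that do not match the pattern would end up as NaN in both columns.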