I have a DataFrame...
_id doc_count doc_media_url image_tagging
0 327bcc224b8c7049 1.0 URL1 {'success': True, 'tags': [], 'custom_tags': []}
1 e466c4966666c69e 1.0 URL2 {'success': True, 'tags': [{'tag': 'Cartoon', ...
2 b4303830389cf8f9 1.0 URL3 {'success': True, 'tags': [{'tag': 'Poster', '...
3 00a424323220b68e 1.0 URL4 {'success': True, 'tags': [{'tag': 'Stage', 'c...
4 c66e3e2921a7c7cd 1.0 URL5 {'success': True, 'tags': [], 'custom_tags': []}
... and my issue is with the image_tagging column. Currently it is a column of dictionaries. I want to extract the dictionary keys into their own columns, but one single row of the data holds a list rather than a dictionary, which breaks any operation that expects a dictionary.
df.image_tagging.apply(lambda x: type(x)).value_counts()
<class 'dict'> 14067
<class 'list'> 1
Name: image_tagging, dtype: int64
This list item shouldn't be there, so I'd like to remove that row. However, I'm having trouble selecting rows by type, because pandas works mainly with dtypes, and a dict and a list both fall under the same object dtype (I think, anyway!).
Is there a way I can select the row with the list item in that column so that I can remove it from the DataFrame?
Thanks for any assistance!
Try this, which maps each cell to its Python type and keeps only the rows where that type is dict:
df = df[df.image_tagging.map(type)==dict]
Demo:
In [146]: df = pd.DataFrame({
...: 'A': [{'1':1, 'a':2}, [1,2,3], {'2':2}],
...: })
In [147]: df
Out[147]:
A
0 {'1': 1, 'a': 2}
1 [1, 2, 3]
2 {'2': 2}
In [148]: df = df[df.A.map(type) == dict]
In [149]: df
Out[149]:
A
0 {'1': 1, 'a': 2}
2 {'2': 2}
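If you want to look at the offending row before dropping it, and then carry on to the original goal of turning the dictionary keys into columns, something along these lines may help. This is only a sketch: the sample data is made up to resemble the question's layout, and the names is_dict, bad_rows, clean, expanded and result are illustrative. It uses isinstance instead of comparing types directly, which also accepts dict subclasses.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
df = pd.DataFrame({
    "_id": ["a1", "b2", "c3"],
    "image_tagging": [
        {"success": True, "tags": [], "custom_tags": []},
        [1, 2, 3],  # the stray list row
        {"success": True, "tags": [{"tag": "Poster"}], "custom_tags": []},
    ],
})

# Boolean mask: True where the cell holds a dict (or a dict subclass)
is_dict = df["image_tagging"].map(lambda x: isinstance(x, dict))

# Inspect the rows that are NOT dicts before removing them
bad_rows = df[~is_dict]

# Keep only the dict rows
clean = df[is_dict]

# Expand the dict keys into their own columns, keeping the original index
expanded = pd.DataFrame(clean["image_tagging"].tolist(), index=clean.index)
result = clean.drop(columns="image_tagging").join(expanded)
```

Passing index=clean.index matters here: after filtering, the index has gaps, and without it the join would line the expanded columns up against the wrong rows.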