Pandas

Question

I have a DataFrame...

   _id               doc_count  doc_media_url  image_tagging
0  327bcc224b8c7049        1.0           URL1  {'success': True, 'tags': [], 'custom_tags': []}
1  e466c4966666c69e        1.0           URL2  {'success': True, 'tags': [{'tag': 'Cartoon', ...
2  b4303830389cf8f9        1.0           URL3  {'success': True, 'tags': [{'tag': 'Poster', '...
3  00a424323220b68e        1.0           URL4  {'success': True, 'tags': [{'tag': 'Stage', 'c...
4  c66e3e2921a7c7cd        1.0           URL5  {'success': True, 'tags': [], 'custom_tags': []}

... and my issue is with the image_tagging column. Currently it is a column of dictionaries. I want to extract the dictionary keys into their own columns, but one row of the data holds a list rather than a dictionary, which breaks any operation that expects a dictionary.

df.image_tagging.apply(lambda x: type(x)).value_counts()

<class 'dict'>    14067
<class 'list'>        1
Name: image_tagging, dtype: int64

This list item shouldn't be there, so I'd like to remove that row. However, I'm having trouble selecting rows by type, because Pandas mainly works with dtypes, and a dict and a list are classified as the same dtype (I think anyway!).

Is there a way I can select the row with the list item in that column so that I can remove it from the DataFrame?

Thanks for any assistance!

MaxU - stop WAR against UA · Accepted Answer

Try this. It maps each value to its Python type and keeps only the rows where that type is dict:

df = df[df.image_tagging.map(type)==dict]

Demo:

In [146]: df = pd.DataFrame({
     ...:     'A': [{'1':1, 'a':2}, [1,2,3], {'2':2}],
     ...: })

In [147]: df
Out[147]:
                  A
0  {'1': 1, 'a': 2}
1         [1, 2, 3]
2          {'2': 2}

In [148]: df = df[df.A.map(type) == dict]

In [149]: df
Out[149]:
                  A
0  {'1': 1, 'a': 2}
2          {'2': 2}
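
The question also asks how to select the offending row itself. A minimal sketch of one way to do that, assuming the same toy column A as in the demo: build a boolean mask with isinstance, then use the mask to keep the dicts or its negation to inspect everything else. The names is_dict, bad_rows and clean are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data mirroring the demo above: two dicts and one stray list.
df = pd.DataFrame({
    'A': [{'1': 1, 'a': 2}, [1, 2, 3], {'2': 2}],
})

# Boolean mask: True where the cell holds a dict (dict subclasses also match).
is_dict = df['A'].map(lambda value: isinstance(value, dict))

# Rows whose value is NOT a dict, useful for inspecting them before dropping.
bad_rows = df[~is_dict]

# Rows whose value is a dict.
clean = df[is_dict]
```

Note the difference from map(type) == dict: that comparison matches only exact dict instances, while isinstance also accepts subclasses such as collections.OrderedDict. Which one is appropriate depends on the data.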

Pandas - Select Rows by 'type' (Not dtype)

Tags:

python

James Allen-Robertson
