Extract dictionary value from column in data frame

Tags:

pandas

I'm looking for a way to optimize my code.

I have input data in this form:

import pandas as pn

a=[{'Feature1': 'aa1','Feature2': 'bb1','Feature3': 'cc2' },
 {'Feature1': 'aa2','Feature2': 'bb2' },
 {'Feature1': 'aa1','Feature2': 'cc1' }
 ]
b=['num1','num2','num3']


df= pn.DataFrame({'num':b, 'dic':a })

I would like to extract the element 'Feature3' from the dictionaries in column 'dic' (if it exists) in the data frame above. So far I have been able to solve it, but I don't know if this is the fastest way, and it seems a little overcomplicated.

Feature3 = []
# Series.iteritems() was removed in pandas 2.0; .items() is the equivalent
for idx, row in df['dic'].items():
    l = row.keys()

    if 'Feature3' in l:
        Feature3.append(row['Feature3'])
    else:
        Feature3.append(None)

df['Feature3'] = Feature3
print(df)

Is there a better/faster/simpler way to extract Feature3 into a separate column in the dataframe?

Thank you in advance for your help.


asked Feb 29 '16 22:02

michalk

5 Answers

df['Feature3'] = df['dic'].apply(lambda x: x.get('Feature3'))

Agree with maxymoo. Consider changing the format of your dataframe.

(Sidenote: pandas is generally imported as pd)
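A self-contained version of the apply one-liner, rebuilt from the question's sample data and importing pandas as pd per the sidenote. dict.get returns None rather than raising KeyError when the key is absent, so rows without 'Feature3' simply end up empty:

```python
import pandas as pd

# Sample data from the question
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]
b = ['num1', 'num2', 'num3']
df = pd.DataFrame({'num': b, 'dic': a})

# dict.get returns None instead of raising KeyError when 'Feature3' is missing
df['Feature3'] = df['dic'].apply(lambda x: x.get('Feature3'))
print(df[['num', 'Feature3']])
```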

answered Oct 11 '22 01:10

as133

You can use a list comprehension to extract 'Feature3' from each row in your dataframe, returning a list.

feature3 = [d.get('Feature3') for d in df.dic]

If 'Feature3' is not in a row's dictionary, d.get returns None by default.

You don't even need pandas: you can use the same list comprehension to extract the feature directly from your original list of dictionaries, a.

feature3 = [d.get('Feature3') for d in a]
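dict.get also accepts a second argument that is returned instead of None when the key is missing. A minimal sketch using the question's list a; the placeholder 'n/a' is an arbitrary choice for illustration:

```python
# Sample data from the question
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]

# The second argument of dict.get is the fallback value for missing keys
feature3 = [d.get('Feature3', 'n/a') for d in a]
print(feature3)  # ['cc2', 'n/a', 'n/a']
```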


answered Oct 11 '22 00:10

Alexander

If you apply pn.Series to the column, you get quite a nice DataFrame:

>>> df.dic.apply(pn.Series)
  Feature1 Feature2 Feature3
0      aa1      bb1      cc2
1      aa2      bb2      NaN
2      aa1      cc1      NaN

From this point, you can just use regular pandas operations.
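For instance, a minimal sketch (using pd rather than pn) that expands the dictionaries and joins just the Feature3 column back onto the original frame. join aligns on the index, so the new values line up with the rows they came from:

```python
import pandas as pd

# Sample data from the question
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]
b = ['num1', 'num2', 'num3']
df = pd.DataFrame({'num': b, 'dic': a})

# Expand each dict into columns; keys missing from a row become NaN
expanded = df['dic'].apply(pd.Series)

# Attach only the wanted column; join matches rows by index label
df = df.join(expanded[['Feature3']])
print(df)
```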

answered Oct 11 '22 01:10

Ami Tavory

I think you can first create a new DataFrame from the dictionaries with a list comprehension and then assign the new column from it:

df1 = pd.DataFrame([x for x in df['dic']])
print(df1)
  Feature1 Feature2 Feature3
0      aa1      bb1      cc2
1      aa2      bb2      NaN
2      aa1      cc1      NaN

df['Feature3'] = df1['Feature3']
print(df)
                                                 dic   num Feature3
0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN

Or in one line:

df['Feature3'] = pd.DataFrame([x for x in df['dic']])['Feature3']
print(df)
                                                 dic   num Feature3
0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN
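One thing to check before relying on this: pd.DataFrame([x for x in df['dic']]) gets a fresh 0..n-1 index, and assigning one of its columns to df aligns on index labels. It works here because df also has the default index; with any other index (for example after filtering rows) the values can land in the wrong rows or become NaN. A sketch that passes index=df.index to keep them aligned, using a made-up index [10, 20, 30] purely to illustrate:

```python
import pandas as pd

# Sample data from the question
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]
b = ['num1', 'num2', 'num3']

# Hypothetical non-default index, e.g. what remains after filtering rows
df = pd.DataFrame({'num': b, 'dic': a}, index=[10, 20, 30])

# Reuse df's index so the new column lines up row for row
df1 = pd.DataFrame(df['dic'].tolist(), index=df.index)
df['Feature3'] = df1['Feature3']
print(df[['num', 'Feature3']])
```

Without index=df.index, df1 would be indexed 0, 1, 2, none of which exist in df, so every Feature3 value would come out as NaN.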

Timings:

len(df) = 3:

In [24]: %timeit pd.DataFrame([x for x in df['dic']])
The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 596 µs per loop

In [25]: %timeit df.dic.apply(pn.Series)
1000 loops, best of 3: 1.43 ms per loop

len(df) = 3000:

In [27]: %timeit pd.DataFrame([x for x in df['dic']])
100 loops, best of 3: 3.16 ms per loop

In [28]: %timeit df.dic.apply(pn.Series)
1 loops, best of 3: 748 ms per loop
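These numbers come from one machine and one pandas version. A sketch for rerunning the comparison locally with the standard-library timeit module; building the 3000-row frame by repeating the three sample dictionaries is an assumption about how the data above was produced:

```python
import timeit

import pandas as pd

# Sample dicts from the question, repeated to get 3000 rows
# (an assumption about how the 3000-row frame was built)
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]
df = pd.DataFrame({'dic': a * 1000})


def via_constructor():
    # Build one DataFrame from the whole list of dicts at once
    return pd.DataFrame([x for x in df['dic']])


def via_apply_series():
    # Build one Series per row, then stack them into a DataFrame
    return df['dic'].apply(pd.Series)


for func in (via_constructor, via_apply_series):
    # Best of 3 single calls, reported in milliseconds
    best = min(timeit.repeat(func, repeat=3, number=1))
    print(f"{func.__name__}: {best * 1000:.1f} ms")
```

The constructor builds the frame in one pass over the list, while apply(pd.Series) creates a separate Series object per row, which is the likely reason the gap grows with the number of rows.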

answered Oct 11 '22 00:10

jezrael

I think you're thinking about the data structures slightly wrong. It's better to create the data frame with the features as columns from the start; pandas is actually smart enough to do this by default:

In [240]: pd.DataFrame(a)
Out[240]:
  Feature1 Feature2 Feature3
0      aa1      bb1      cc2
1      aa2      bb2      NaN
2      aa1      cc1      NaN

You would then add your "num" column in a separate step, since the data is in a different orientation, either with

df['num'] = b

or

df = df.assign(num=b)

(I prefer the second option since it's got a more functional flavour).
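Put together, a minimal sketch of this approach with the question's data: each dictionary becomes a row, missing keys become NaN, and num is attached with assign:

```python
import pandas as pd

# Sample data from the question
a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
     {'Feature1': 'aa2', 'Feature2': 'bb2'},
     {'Feature1': 'aa1', 'Feature2': 'cc1'}]
b = ['num1', 'num2', 'num3']

# One column per key; assign returns a new frame with 'num' appended
df = pd.DataFrame(a).assign(num=b)
print(df)
```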

answered Oct 11 '22 02:10

maxymoo

