Python: Pandas dataframe from Series of dict

Tags: python, pandas, dataframe

I have a Pandas dataframe:

    type(original)
    pandas.core.frame.DataFrame

which includes the series object original['user']:

    type(original['user'])
    pandas.core.series.Series

original['user'] points to a number of dicts:

    type(original['user'].ix[0])
    dict

Each dict has the same keys:

    original['user'].ix[0].keys()
    [u'follow_request_sent',
     u'profile_use_background_image',
     u'profile_text_color',
     u'id',
     u'verified',
     u'profile_location',
     # ... keys removed for brevity
    ]

Above is (part of) one of the dicts of user fields in a tweet from the Twitter API. I want to build a data frame from these dicts.

When I try to make a data frame directly, I get only one column for each row and this column contains the whole dict:

    pd.DataFrame(original['user'][:2])
                                                    user
    0  {u'follow_request_sent': False, u'profile_use_...
    1  {u'follow_request_sent': False, u'profile_use_..

When I try to create a data frame using from_dict() I get the same result:

    pd.DataFrame.from_dict(original['user'][:2])
                                                    user
    0  {u'follow_request_sent': False, u'profile_use_...
    1  {u'follow_request_sent': False, u'profile_use_..

Next I tried a list comprehension which returned an error:

    item = [[k, v] for (k,v) in users]
    ValueError: too many values to unpack

When I create a data frame from a single row, it nearly works:

    df = pd.DataFrame.from_dict(original['user'].ix[0])
    df.reset_index()

       index  contributors_enabled  created_at  default_profile  default_profile_image  description  entities  favourites_count  follow_request_sent  followers_count  following  friends_count  geo_enabled  id  id_str  is_translation_enabled  is_translator  lang  listed_count  location  name  notifications  profile_background_color  profile_background_image_url  profile_background_image_url_https  profile_background_tile  profile_image_url  profile_image_url_https  profile_link_color  profile_location  profile_sidebar_border_color  profile_sidebar_fill_color  profile_text_color  profile_use_background_image  protected  screen_name  statuses_count  time_zone  url  utc_offset  verified
    0  description  False  Mon May 26 11:58:40 +0000 2014  True  False    {u'urls': []}  0  False  157

It works almost like I want it to, except it sets the description field as the default index.

Each of the dicts has 40 keys, but I only need about 10 of them, and the data frame has 28734 rows.

How can I filter out the keys which I do not need?


asked Apr 16 '15 17:04

makambi

2 Answers

What I would try is the following:

    new_df = pd.DataFrame(list(original['user']))

This converts the Series to a list of dicts and passes it to the DataFrame constructor, which builds one column per dict key.
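To connect this to the original question about keeping only some keys, here is a minimal sketch. The sample dicts and the `wanted` list are hypothetical stand-ins for the real Twitter data. It relies on the `columns` and `index` arguments of the `DataFrame` constructor: `columns` restricts the result to the listed keys, and `index` keeps the rows aligned with `original`.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a DataFrame whose 'user'
# column holds one dict per row, all with the same keys.
original = pd.DataFrame({
    "user": [
        {"id": 1, "screen_name": "alice", "verified": False, "followers_count": 157},
        {"id": 2, "screen_name": "bob", "verified": True, "followers_count": 42},
    ]
})

# Keys to keep (hypothetical selection; substitute the ones you need).
wanted = ["id", "screen_name", "followers_count"]

# A list of dicts becomes one row per dict and one column per key.
# `columns` limits the result to the listed keys, and `index` keeps
# the new frame aligned with the rows of the original frame.
users = pd.DataFrame(original["user"].tolist(), columns=wanted, index=original.index)

print(users)
```

Note that `.ix`, used in the question, was removed in pandas 1.0; `original['user'].iloc[0]` is the positional equivalent on current versions.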


answered Sep 17 '22 18:09

Eyad

    df = original['user'].apply(pd.Series)

works well.

credit
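As a rough illustration of this second approach, the sketch below expands a small, hypothetical `user` column with `apply(pd.Series)` and then selects a subset of the resulting columns. The sample data and the chosen keys are assumptions made for the example. Because `apply` builds one Series per row, this is generally expected to be slower than the list-based constructor on a frame with tens of thousands of rows, so it may be worth timing both on the real data.

```python
import pandas as pd

# Hypothetical sample data standing in for original['user'].
original = pd.DataFrame({
    "user": [
        {"id": 1, "screen_name": "alice", "verified": False},
        {"id": 2, "screen_name": "bob", "verified": True},
    ]
})

# Expand each dict into a row; the dict keys become the columns.
expanded = original["user"].apply(pd.Series)

# Keep only the needed keys (hypothetical selection).
subset = expanded[["id", "screen_name"]]

# The list-based constructor yields the same column names and values.
from_list = pd.DataFrame(original["user"].tolist(), index=original.index)

print(subset)
```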

answered Sep 19 '22 18:09

saynah
