 

Python: Pandas dataframe from Series of dict

I have a Pandas dataframe:

    type(original)
    pandas.core.frame.DataFrame

which includes the series object original['user']:

    type(original['user'])
    pandas.core.series.Series

Each element of original['user'] is a dict:

    type(original['user'].iloc[0])
    dict

Each dict has the same keys:

    original['user'].iloc[0].keys()

    [u'follow_request_sent',
     u'profile_use_background_image',
     u'profile_text_color',
     u'id',
     u'verified',
     u'profile_location',
     # ... keys removed for brevity
    ]

The above is part of one of the user dicts from a tweet returned by the Twitter API. I want to build a DataFrame from these dicts.

When I try to build a DataFrame directly, I get a single column in which each row holds the whole dict:

    pd.DataFrame(original['user'][:2])

                                                    user
    0   {u'follow_request_sent': False, u'profile_use_...
    1   {u'follow_request_sent': False, u'profile_use_..

When I try to create a DataFrame using from_dict(), I get the same result:

    pd.DataFrame.from_dict(original['user'][:2])

                                                    user
    0   {u'follow_request_sent': False, u'profile_use_...
    1   {u'follow_request_sent': False, u'profile_use_..
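Both calls return a single column because original['user'] is itself a Series: the DataFrame constructor treats a Series as one column named after the Series (here "user") and stores each dict as an opaque Python object in that column. A minimal sketch, using a hypothetical two-row stand-in for original['user']:

```python
import pandas as pd

# Hypothetical stand-in for original['user']: a Series named "user" whose elements are dicts.
users = pd.Series(
    [{"id": 1, "screen_name": "a"}, {"id": 2, "screen_name": "b"}],
    name="user",
)

# The constructor turns a Series into ONE column named after the Series;
# each cell holds a whole dict, as in the output above.
wrapped = pd.DataFrame(users)
print(wrapped.columns.tolist())  # ['user']
print(type(wrapped.loc[0, "user"]))  # <class 'dict'>
```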

Next I tried a list comprehension, which raised an error:

    item = [[k, v] for (k,v) in users]
    ValueError: too many values to unpack
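The ValueError comes from the unpacking: iterating over users yields whole dicts, and `for (k, v) in ...` then tries to unpack each dict, which iterates over its keys; a dict with 40 keys cannot be unpacked into two names. A minimal sketch of a comprehension that does work and also drops unneeded keys, assuming hypothetical sample data and a hypothetical `wanted` list of key names:

```python
import pandas as pd

# Hypothetical stand-in for the question's `users` (original['user']).
users = pd.Series([
    {"id": 1, "screen_name": "a", "verified": False, "lang": "en"},
    {"id": 2, "screen_name": "b", "verified": True, "lang": "fr"},
])

# Reproduce the error: each item is a dict, and unpacking a 4-key dict
# into (k, v) iterates over its 4 keys, which is too many values.
error = None
try:
    [[k, v] for (k, v) in users]
except ValueError as exc:
    error = exc

# Hypothetical subset of keys to keep.
wanted = ["id", "screen_name", "verified"]

# Loop over the dicts first, then pick the wanted keys from each one.
# d.get(k) returns None rather than raising KeyError if a dict lacks a key.
rows = [{k: d.get(k) for k in wanted} for d in users]
df = pd.DataFrame(rows)
print(df)
```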

When I create a DataFrame from a single dict, it nearly works:

    df = pd.DataFrame.from_dict(original['user'].iloc[0])
    df.reset_index()

             index  contributors_enabled                      created_at  default_profile  default_profile_image  description       entities  favourites_count  follow_request_sent  followers_count  ...
    0  description                 False  Mon May 26 11:58:40 +0000 2014             True                  False               {u'urls': []}                 0                False              157  ...

    Remaining columns: following, friends_count, geo_enabled, id, id_str, is_translation_enabled, is_translator, lang, listed_count, location, name, notifications, profile_background_color, profile_background_image_url, profile_background_image_url_https, profile_background_tile, profile_image_url, profile_image_url_https, profile_link_color, profile_location, profile_sidebar_border_color, profile_sidebar_fill_color, profile_text_color, profile_use_background_image, protected, screen_name, statuses_count, time_zone, url, utc_offset, verified

It is almost what I want, except that the row's index label is 'description' instead of 0.
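The 'description' label most likely comes from the nested entities value rather than from the description field: with the default orient, from_dict treats each top-level key as a column, and when a value is itself a dict (entities appears to be one, judging by the {u'urls': []} cell), pandas uses that dict's keys as the row index and broadcasts the scalar values across it. A minimal sketch with a hypothetical, trimmed-down user dict; wrapping the dict in a list instead yields one row with a 0-based index:

```python
import pandas as pd

# Hypothetical single user dict with one nested dict value, like 'entities'.
user = {"id": 1, "screen_name": "a", "entities": {"description": {"urls": []}}}

# from_dict treats each key as a column; the nested dict's keys become the index,
# and the scalar values (id, screen_name) are broadcast to that index.
odd = pd.DataFrame.from_dict(user)
print(odd.index.tolist())  # ['description']

# Wrapping the dict in a list makes it a single row with a default 0-based index.
row = pd.DataFrame([user])
print(row.index.tolist())  # [0]
```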

Each dict has 40 keys, but I only need about 10 of them, and the DataFrame has 28,734 rows.

How can I filter out the keys which I do not need?

asked Apr 16 '15 17:04 by makambi




2 Answers

What I would try is the following:

    new_df = pd.DataFrame(list(original['user']))

This converts the Series to a list of dicts and passes that list to the DataFrame constructor, which makes each dict a row and each key a column.
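Because the constructor receives a list of dicts here, its columns parameter can also do the key filtering the question asks about: with dict rows, columns selects (and orders) the keys to keep, any other key is dropped, and a listed key missing from a dict becomes NaN. A sketch assuming hypothetical sample data and a hypothetical `wanted` list:

```python
import pandas as pd

# Hypothetical stand-in for original['user'].
users = pd.Series([
    {"id": 1, "screen_name": "a", "verified": False, "lang": "en"},
    {"id": 2, "screen_name": "b", "verified": True, "lang": "fr"},
], name="user")

# Hypothetical list of the ~10 keys actually needed.
wanted = ["id", "screen_name", "verified"]

# With a list of dicts, `columns` keeps only these keys, in this order.
new_df = pd.DataFrame(list(users), columns=wanted)
print(new_df)
```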

answered Sep 17 '22 18:09 by Eyad


    df = original['user'].apply(pd.Series)

This works well: each dict is expanded into its own row, with the dict's keys as columns.
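The same column selection can be applied to the result of apply(pd.Series). Because apply builds one Series per row, it is typically slower than the list-of-dicts constructor on a frame of ~28,734 rows. If nested fields such as entities are needed, pd.json_normalize (a different function from either answer) flattens nested dicts into dotted column names. A sketch with hypothetical sample data and key list:

```python
import pandas as pd

# Hypothetical stand-in for original['user'], including one nested dict field.
users = pd.Series([
    {"id": 1, "screen_name": "a", "lang": "en", "entities": {"description": {"urls": []}}},
    {"id": 2, "screen_name": "b", "lang": "fr", "entities": {"description": {"urls": []}}},
])

# Hypothetical keys to keep.
wanted = ["id", "screen_name"]

# apply(pd.Series) expands each dict into a row; then select the needed columns.
df = users.apply(pd.Series)[wanted]
print(df)

# Alternative (not from either answer): json_normalize flattens nested dicts,
# so entities -> description -> urls becomes the column "entities.description.urls".
flat = pd.json_normalize(users.tolist())
print(flat.columns.tolist())
```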


answered Sep 19 '22 18:09 by saynah