I have a Pandas dataframe:

    type(original)
    pandas.core.frame.DataFrame

which includes the Series object original['user']:

    type(original['user'])
    pandas.core.series.Series

Each element of original['user'] is a dict:

    type(original['user'].iloc[0])
    dict

Each dict has the same keys:

    original['user'].iloc[0].keys()

    [u'follow_request_sent',
     u'profile_use_background_image',
     u'profile_text_color',
     u'id',
     u'verified',
     u'profile_location',
     # ... keys removed for brevity
    ]

Above is (part of) one of the dicts of user fields from a tweet returned by the Twitter API. I want to build a data frame from these dicts.
When I try to make a data frame directly, I get a single column, and each row of that column contains the whole dict:

    pd.DataFrame(original['user'][:2])

                                                    user
    0  {u'follow_request_sent': False, u'profile_use_...
    1  {u'follow_request_sent': False, u'profile_use_...

When I try to create a data frame using from_dict(), I get the same result:

    pd.DataFrame.from_dict(original['user'][:2])

                                                    user
    0  {u'follow_request_sent': False, u'profile_use_...
    1  {u'follow_request_sent': False, u'profile_use_...

Next I tried a list comprehension, which raised an error:

    item = [[k, v] for (k,v) in users]
    ValueError: too many values to unpack

When I create a data frame from a single row, it nearly works:

    df = pd.DataFrame.from_dict(original['user'].iloc[0])
    df.reset_index()

       index        contributors_enabled  created_at                      default_profile  default_profile_image  description  entities       favourites_count  follow_request_sent  followers_count  ...
    0  description  False                 Mon May 26 11:58:40 +0000 2014  True             False                               {u'urls': []}  0                 False                157              ...

(Output truncated; the remaining columns are following, friends_count, geo_enabled, id, id_str, is_translation_enabled, is_translator, lang, listed_count, location, name, notifications, profile_background_color, profile_background_image_url, profile_background_image_url_https, profile_background_tile, profile_image_url, profile_image_url_https, profile_link_color, profile_location, profile_sidebar_border_color, profile_sidebar_fill_color, profile_text_color, profile_use_background_image, protected, screen_name, statuses_count, time_zone, url, utc_offset, verified.)

It works almost like I want it to, except that it uses description as the index.
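The description index most likely comes from how the DataFrame constructor treats a dict of mostly scalar values. Judging by the output, the user dict appears to contain a nested dict (the entities field) keyed by description, and pandas builds the index from the keys of any dict-valued entry and repeats the scalars along it. A minimal sketch, using a hypothetical, trimmed-down user dict with made-up values:

```python
import pandas as pd

# Hypothetical, trimmed-down user dict: scalars plus one nested dict shaped
# like the "entities" field suggested by the output in the question.
user = {
    "id": 1,
    "screen_name": "example_user",
    "followers_count": 157,
    "entities": {"description": {"urls": []}},
}

# Default orient="columns": the index is built from the keys of the
# dict-valued entry ("entities"), and each scalar is repeated on every row.
df = pd.DataFrame.from_dict(user)
print(df.index.tolist())  # ['description']

# Wrapping the dict in a list treats it as a single record instead:
# one row, default integer index, nested dict kept as one cell.
row = pd.DataFrame([user])
print(row.index.tolist())  # [0]
```

Wrapping each dict in a list (or passing a list of many dicts) is what gives one row per user.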
Each of the dicts has 40 keys, but I only need about 10 of them, and I have 28,734 rows in the data frame.
How can I filter out the keys which I do not need?
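The ValueError in the list comprehension arises because iterating over a Series yields its values, so each step receives one whole user dict, and unpacking a dict into (k, v) iterates over its keys and finds more than two. A sketch of a comprehension that instead picks only the wanted keys out of each dict; it assumes users was original['user'], and the sample data and keep list are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for original['user'] (assumed to be what `users` was).
users = pd.Series([
    {"id": 1, "screen_name": "alice", "lang": "en", "verified": False},
    {"id": 2, "screen_name": "bob", "lang": "fr", "verified": True},
])

# Iterating over a Series yields its values, one whole dict per row, so
# unpacking each 4-key dict into (k, v) raises ValueError.
try:
    [[k, v] for (k, v) in users]
except ValueError as err:
    print("ValueError:", err)

# Hypothetical subset of keys to keep.
keep = ["id", "screen_name", "lang"]

# Build one reduced dict per row; d.get(k) yields None for a missing key.
records = [{k: d.get(k) for k in keep} for d in users]
df = pd.DataFrame(records, index=users.index)
print(df)
```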
You can create a pandas Series from a dictionary by passing the dictionary to pandas.Series(). The dictionary's keys become the index labels and its values become the data.
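A short sketch of pandas.Series() with a dict. The field names and numbers are hypothetical; passing index= also shows one way to keep only some of the keys:

```python
import pandas as pd

# Hypothetical dict: keys become index labels, values become the data.
counts = {"followers_count": 157, "friends_count": 42, "statuses_count": 10}
s = pd.Series(counts)
print(s)

# index= selects (and orders) a subset of the keys; a label that is not
# in the dict would come back as NaN.
subset = pd.Series(counts, index=["statuses_count", "followers_count"])
print(subset)
```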
Method 1: Create a DataFrame from a dictionary using the default constructor of the pandas.DataFrame class.
Method 2: Create a DataFrame from a dictionary with user-defined indexes.
Method 3: Create a DataFrame from a simple dictionary, i.e., a dictionary whose values are simple scalars such as integers or strings.
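A compact sketch of the three methods listed above, using hypothetical Twitter-style fields. Method 3 is shown two ways, because a dict of only scalars cannot be passed to the constructor on its own:

```python
import pandas as pd

# Method 1: dict of equal-length lists; keys become columns, default RangeIndex.
data = {"screen_name": ["alice", "bob"], "followers_count": [157, 42]}
df1 = pd.DataFrame(data)

# Method 2: the same dict, with user-defined row labels via index=.
df2 = pd.DataFrame(data, index=["u1", "u2"])

# Method 3: dict of scalars. pd.DataFrame(simple) alone raises
# "If using all scalar values, you must pass an index", so either pass
# an index or turn the items into rows.
simple = {"screen_name": "alice", "followers_count": 157}
df3a = pd.DataFrame(simple, index=[0])  # one row, keys as columns
df3b = pd.DataFrame(list(simple.items()), columns=["key", "value"])  # one row per key

print(df1, df2, df3a, df3b, sep="\n\n")
```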
Creating a DataFrame from a list of dicts (optionally with custom indexes): because all the dictionaries share the same keys, the keys become the column names, and for each key, the values associated with it across all the dictionaries become that column's values.
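A minimal sketch of that behaviour with two hypothetical records and custom row labels passed through index=:

```python
import pandas as pd

# Hypothetical records that share the same keys.
records = [
    {"id": 1, "screen_name": "alice", "lang": "en"},
    {"id": 2, "screen_name": "bob", "lang": "fr"},
]

# Shared keys become the column names; each key's values across the dicts
# become that column. index= supplies custom row labels.
df = pd.DataFrame(records, index=["first", "second"])
print(df)
```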
What I would try is the following:

    new_df = pd.DataFrame(list(original['user']))

This converts the Series into a list of dicts and passes it to the DataFrame constructor, which turns the dict keys into columns and gives one row per dict.
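Since the question also asks how to drop the keys that are not needed, one option (a sketch, not part of the answer above) is the constructor's columns= parameter: with a list of dicts as input, it selects only the listed keys. users and keep are hypothetical stand-ins for original['user'] and the roughly 10 wanted keys:

```python
import pandas as pd

# Hypothetical stand-in for original['user'] (a Series of dicts).
users = pd.Series([
    {"id": 1, "screen_name": "alice", "lang": "en", "verified": False},
    {"id": 2, "screen_name": "bob", "lang": "fr", "verified": True},
])
# Hypothetical subset of keys to keep.
keep = ["id", "screen_name", "verified"]

# list(users) is a plain list of dicts. With list-of-dicts input, columns=
# selects (and orders) the listed keys and drops the rest; a key missing
# from some dict would show up as NaN in that row. Reusing users.index
# keeps the rows aligned with the original DataFrame.
new_df = pd.DataFrame(list(users), columns=keep, index=users.index)
print(new_df)
```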
    df = original['user'].apply(pd.Series)

This also works well.
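The apply(pd.Series) approach can be combined with column selection to keep only the needed fields. A sketch with hypothetical data and a hypothetical keep list; the variant passes index=keep to pd.Series so the unwanted keys are never expanded:

```python
import pandas as pd

# Hypothetical stand-in for original['user'] (a Series of dicts).
users = pd.Series([
    {"id": 1, "screen_name": "alice", "lang": "en", "verified": False},
    {"id": 2, "screen_name": "bob", "lang": "fr", "verified": True},
])
# Hypothetical list of the keys to keep.
keep = ["id", "screen_name", "verified"]

# apply(pd.Series) turns each dict into a Series; pandas stacks those
# Series into a DataFrame whose columns are the dict keys.
expanded = users.apply(pd.Series)
df = expanded[keep]  # keep only the wanted columns

# Variant: pd.Series(d, index=keep) picks just the wanted keys per row.
df_direct = users.apply(lambda d: pd.Series(d, index=keep))
print(df_direct)
```

Because apply(pd.Series) builds an intermediate Series for every row, on 28,734 rows it is likely to be slower than the list-based constructor; if speed matters, it is worth timing both.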