I have a dataframe where each row contains various meta-data pertaining to a single Reddit comment (e.g. author, subreddit, comment text).
I want to do the following: for each author, grab the list of all subreddits they have comments in, and transform this into a pandas dataframe where each row corresponds to an author and contains the list of unique subreddits they comment in.
I am currently trying some combination of the following, but can't quite get it to work:
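For concreteness, the result I'm after would look something like this (the 'subreddits' column name is just illustrative):

import pandas as pd

# desired shape: one row per author, with the list of unique subreddits they post in
wanted = pd.DataFrame({
    'author': ['a', 'b'],
    'subreddits': [['sr1', 'sr2'], ['sr2']],
})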
Attempt 1:
group = df['subreddit'].groupby(df['author']).unique()
list(group)
Attempt 2:
from collections import defaultdict

subreddit_dict = defaultdict(list)
for index, row in df.iterrows():
    author = row['author']
    subreddit = row['subreddit']
    subreddit_dict[author].append(subreddit)

for key, value in subreddit_dict.items():
    subreddit_dict[key] = set(value)

subreddit_df = pd.DataFrame.from_dict(subreddit_dict, orient='index')
Here are two strategies to do it. No doubt, there are other ways.
Assuming your dataframe looks something like this (obviously with more columns):
df = pd.DataFrame({'author': ['a', 'a', 'b'], 'subreddit': ['sr1', 'sr2', 'sr2']})

>>> df
  author subreddit
0      a       sr1
1      a       sr2
2      b       sr2
...
Solution 1: groupby
More straightforward than solution 2, and similar to your first attempt:
group = df.groupby('author')
df2 = group.apply(lambda x: x['subreddit'].unique())

# Alternatively, the same thing as a one-liner:
# df2 = df.groupby('author').apply(lambda x: x['subreddit'].unique())
Result:
>>> df2
author
a    [sr1, sr2]
b         [sr2]
The author is the index, and the single column holds the list of all subreddits they are active in (this is how I interpreted the output you described).
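If you would rather have the author as a regular column (e.g. to merge on later), here is a minimal sketch, assuming df2 is the Series produced above; authors_df and 'subreddits' are just illustrative names, and df2 itself is left untouched so the next step still works:

# turn the author index into a column and name the values column
authors_df = df2.reset_index(name='subreddits')

# >>> authors_df
#   author  subreddits
# 0      a  [sr1, sr2]
# 1      b       [sr2]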
If you wanted each subreddit in a separate column instead, which might be more usable depending on what you plan to do with it, you could just do this afterwards:
df2 = df2.apply(pd.Series)
Result:
>>> df2
          0    1
author
a       sr1  sr2
b       sr2  NaN
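As a side note, apply(pd.Series) can be slow on large frames; an equivalent sketch that builds the wide frame directly from the underlying lists (assuming df2 is still the Series of arrays from the groupby step, i.e. this replaces the apply(pd.Series) line rather than following it; 'wide' is just an illustrative name):

# spread each author's subreddits across columns; shorter rows are padded with NaN
wide = pd.DataFrame([list(s) for s in df2], index=df2.index)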
Solution 2: Iterate through the dataframe
First, make a new dataframe with all the unique authors:
df2 = pd.DataFrame({'author':df.author.unique()})
And then just get the list of all unique subreddits they are active in, assigning it to a new column:
df2['subreddits'] = [
    list(set(df['subreddit'].loc[df['author'] == x['author']]))
    for _, x in df2.iterrows()
]
This gives you this:
>>> df2
  author  subreddits
0      a  [sr2, sr1]
1      b       [sr2]
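Either way, once you have an author column and a subreddits column, looking up a single author is straightforward (a usage sketch with the toy data above):

# subreddits for author 'a'
subs_a = df2.loc[df2['author'] == 'a', 'subreddits'].iloc[0]
print(subs_a)  # e.g. ['sr2', 'sr1'] -- order of a set is not guaranteed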