How to use pd.melt to unpivot a dataframe where columns share a prefix?

Question

I'm trying to unpivot my data using pd.melt but no success so far. Each row is a business, and the data contains information about the business and multiple reviews. I want my data to have every review as a row.

My first 150 columns are in groups of 15, each group column name shares the same pattern reviews/n/ for 0 < n < 9. (reviews/0/text, reviews/0/date, ... , reviews/9/date). The next 65 columns in the dataframe include more data about the business (e.g. business_id, address) that should remain as id_variables.

My current data looks like this:

business_id	address	reviews/0/date	reviews/0/text	reviews/1/date	reviews/1/text
12345	01 street	1/1/1990	"abc"	2/2/1995	"def"

and my new dataframe should have every review as a row instead of every business, and look like this:

business_id	address	review_number	review_date	review_text
12345	01 street	0	1/1/1990	"abc"
12345	01 street	1	2/2/1995	"def"

I tried using pd.melt but could not succeed in making code that produced something valuable to me.

buddemat · Accepted Answer

You can use pandas.wide_to_long() to do what you want.

However, you will need to rename your columns from the pattern reviews/N/COL to reviews/COL/N (or something similar) first, as wide_to_long() can only unpivot based on prefixes, whereas in your column names, you have a prefix and a suffix.

You could do this manually or e.g. using the re module and an appropriate regex:

df = df.rename(columns=lambda x: re.sub('reviews/(\d)/(.*)', r'review_\2\1', x))

After that, your data should look like this (note the changed colnames):

business_id	address	review_date0	review_text0	review_date1	review_text1
12345	01 street	1/1/1990	abc	2/2/1995	def

Now you can use pandas.wide_to_long() and use the stubnames parameter to specify the prefix of the columns that should be grouped when you unpivot.

df = pd.wide_to_long(df,
                     stubnames=['review_date','review_text'],
                     i=['business_id', 'address'], 
                     j='review_number')

Finally, call .reset_index() to achieve the result you asked for.

Full example:

import re
import pandas as pd

df = pd.DataFrame({'business_id': 12345, 
                   'address': '01 street', 
                   'reviews/0/date': '1/1/1990', 
                   'reviews/0/text': 'abc', 
                   'reviews/1/date': '2/2/1995', 
                   'reviews/1/text': 'def'}, index = [0])

df = df.rename(columns=lambda x: re.sub('reviews/(\d)/(.*)', r'review_\2\1', x))

df = pd.wide_to_long(df,
                     stubnames=['review_date','review_text'],
                     i=['business_id', 'address'], 
                     j='review_number').reset_index()

Result:

business_id	address	review_number	review_date	review_text
12345	01 street	0	1/1/1990	abc
12345	01 street	1	2/2/1995	def

jqurious · Answer

You can get the names of all the non-review columns.

columns = df.columns[~df.columns.str.match(r'reviews/\d+/')]

>>> columns
Index(['address', 'business_id'], dtype='object')

And use those to .melt()

df = df.melt(columns)

df['review_number'] = df['variable'].str.extract(r'reviews/(\d+)')
df['variable'] = df['variable'].str.replace(r'reviews/\d+/', 'review_')

>>> df
  address  business_id     variable                value review_number
0  street            1  review_date  1990-01-01 00:00:00             0
1  street            1  review_text                "abc"             0
2  street            1  review_date  1995-02-02 00:00:00             1
3  street            1  review_text                "def"             1

From there you can .pivot()

>>> df.pivot(index=columns.union(['review_number']).to_list(), columns='variable')
                                                 value            
variable                                   review_date review_text
address business_id review_number                                 
street  1           0              1990-01-01 00:00:00       "abc"
                    1              1995-02-02 00:00:00       "def"

How to use pd.melt to unpivot a dataframe where columns share a prefix?

Tags:

python

regex

pandas

unpivot

melt

raz

2 Answers

buddemat

jqurious

Recent Activity

Donate For Us

How to use pd.melt to unpivot a dataframe where columns share a prefix?

Tags:

python

regex

pandas

unpivot

melt

raz

2 Answers

buddemat

jqurious

Related questions

Recent Activity

Donate For Us