Python Pandas Wide to Long Format Change with Column Titles Splitting

I have a table with the following column titles and an example row:

  Subject  Test1-Result1  Test1-Result2  Test2-Result1  Test2-Result2
0    John             10            0.5             20            0.3

I would like to transform it to:

  Subject level_1  Result1  Result2
0    John   Test1       10      0.5
1    John   Test2       20      0.3

The subject list should be repeated once for Test1 and then again for Test2.

I think I can do this using for loops, but is there a more Pythonic way?

For extra complexity, I also need to add an extra column of information for each test. I suppose I could use a dictionary, but how can I insert the information about, say, Test1 into each corresponding row?

asked Nov 16 '16 at 22:11 by JDS



People also ask

How do I convert a pandas DataFrame to long format?

You can use the following basic syntax to convert a pandas DataFrame from wide format to long format: df = pd.melt(df, id_vars='col1', value_vars=['col2', 'col3', ...]). Here col1 is the identifier column, and col2, col3, etc. are the columns to unpivot.
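A minimal sketch of that syntax, using a small hypothetical DataFrame whose column names (col1, col2, col3) and values are made up for illustration:

```python
import pandas as pd

# Hypothetical wide DataFrame: col1 identifies each row, col2/col3 get unpivoted
df = pd.DataFrame({
    "col1": ["a", "b"],
    "col2": [1, 2],
    "col3": [3, 4],
})

# Old column titles go into "variable", their cell values into "value"
long_df = pd.melt(df, id_vars="col1", value_vars=["col2", "col3"])
print(long_df)
```

Each of the two id values appears once per unpivoted column, so the result has 2 x 2 = 4 rows.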

How do I reshape data from wide to long format in pandas?

Reshaping data from wide to long format in pandas is done with the melt() function, an efficient way to unpivot columns into rows.
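Applied to the question's table, melt() alone yields one row per (Subject, column title) pair, so a melt-based alternative to the stack-based answer also has to split the titles and pivot the Result part back into columns. A sketch, where the helper column names Test and Result are chosen for illustration:

```python
import pandas as pd

# The question's example row
df = pd.DataFrame({
    "Subject": ["John"],
    "Test1-Result1": [10],
    "Test1-Result2": [0.5],
    "Test2-Result1": [20],
    "Test2-Result2": [0.3],
})

# 1. Unpivot every test/result column into (Subject, variable, value) rows
melted = df.melt(id_vars="Subject")

# 2. Split "Test1-Result1" into a Test part and a Result part
parts = melted["variable"].str.split("-", expand=True)
melted["Test"] = parts[0]
melted["Result"] = parts[1]

# 3. Pivot the Result part back into columns, one row per (Subject, Test)
long_df = (
    melted.pivot_table(index=["Subject", "Test"], columns="Result", values="value")
    .reset_index()
)
long_df.columns.name = None  # drop the leftover "Result" label on the column axis
print(long_df)
```

pivot_table() aggregates with the mean by default; with exactly one value per (Subject, Test, Result) cell that is just the value, but duplicate entries would be averaged silently.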

Is the DataFrame now in long format?

The DataFrame is now in long format. We used the 'team' column as the identifier column and unpivoted the 'points', 'assists', and 'rebounds' columns. Note that the var_name and value_name arguments can also be used to name the variable and value columns in the new long DataFrame.
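A sketch of that team example with var_name and value_name set. The team names and numbers are hypothetical, and the names metric and amount are arbitrary choices:

```python
import pandas as pd

# Hypothetical team statistics in wide format (numbers are made up)
df = pd.DataFrame({
    "team": ["A", "B"],
    "points": [88, 91],
    "assists": [12, 17],
    "rebounds": [22, 28],
})

long_df = pd.melt(
    df,
    id_vars="team",
    value_vars=["points", "assists", "rebounds"],
    var_name="metric",    # column holding the old column titles
    value_name="amount",  # column holding the cell values
)
print(long_df)
```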

How do I identify wide-format variables in a DataFrame?

With pandas' wide_to_long(), each row of the wide variables is assumed to be uniquely identified by i (a single column name or a list of column names), and all remaining variables in the DataFrame are left intact. The wide-format variables are assumed to start with the stub names. Its main parameters are df (the wide-format DataFrame), stubnames (the stub name(s)), and i (the column(s) to use as id variable(s)).
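A sketch of wide_to_long() on the question's layout. Because the question's titles put the test first (Test1-Result1) while wide_to_long() expects the stub first, the columns are flipped to Result1-Test1 before the call. The second subject, Mary, and her scores are hypothetical:

```python
import pandas as pd

# The question's row plus a hypothetical second subject for illustration
df = pd.DataFrame({
    "Subject": ["John", "Mary"],
    "Test1-Result1": [10, 15],
    "Test1-Result2": [0.5, 0.4],
    "Test2-Result1": [20, 25],
    "Test2-Result2": [0.3, 0.2],
})

# wide_to_long matches "<stub><sep><suffix>", so flip "Test1-Result1" to "Result1-Test1"
df.columns = [
    "-".join(reversed(c.split("-"))) if "-" in c else c for c in df.columns
]

long_df = pd.wide_to_long(
    df,
    stubnames=["Result1", "Result2"],
    i="Subject",        # id column; must uniquely identify each row
    j="Test",           # name of the new index level built from the suffixes
    sep="-",
    suffix=r"Test\d+",  # the default suffix r"\d+" would not match "Test1"
)
print(long_df.sort_index())
```

The result is indexed by (Subject, Test) with one column per stub; call reset_index() to turn those levels back into ordinary columns.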


1 Answer

You can split your column titles into MultiIndex columns and then reshape the data frame:

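import pandas as pd  # df is the question's DataFrame
df.set_index('Subject', inplace=True)
# split "Test1-Result1" into ("Test1", "Result1"): two-level MultiIndex columns
df.columns = df.columns.str.split("-", expand=True)
# move the Test level from the columns into the rows, then flatten the index
df.stack(level=0).rename_axis(['Subject', 'Test']).reset_index()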
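A self-contained sketch of the answer's three steps on the question's example row, extended with the extra per-test column the question asks about. The dictionary test_info, its values, and the column name Info are hypothetical; Series.map fills each row by looking up its Test value in the dictionary:

```python
import pandas as pd

# The question's example row
df = pd.DataFrame({
    "Subject": ["John"],
    "Test1-Result1": [10],
    "Test1-Result2": [0.5],
    "Test2-Result1": [20],
    "Test2-Result2": [0.3],
})

# The answer's steps: index by Subject, split titles, stack the Test level
df = df.set_index("Subject")
df.columns = df.columns.str.split("-", expand=True)
long_df = df.stack(level=0).rename_axis(["Subject", "Test"]).reset_index()

# Hypothetical per-test information; keys must match the values in "Test"
test_info = {"Test1": "morning session", "Test2": "afternoon session"}
long_df["Info"] = long_df["Test"].map(test_info)
print(long_df)
```

Tests missing from test_info come out as NaN, which makes gaps easy to spot. On recent pandas versions, stack() on MultiIndex columns may emit a FutureWarning about its newer implementation; the reshaped result for this data is the same.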


answered Oct 26 '22 at 15:10 by Psidom
