
How to swap a group of column headings with their values in Pandas

I have the following data frame:

a1  | a2  | a3  | a4 
--------------------- 
Bob | Cat | Dov | Edd 
Cat | Dov | Bob | Edd
Edd | Cat | Dov | Bob

and I want to convert it to

Bob | Cat | Dov | Edd
---------------------
a1  | a2  | a3  | a4
a3  | a1  | a2  | a4
a4  | a2  | a3  | a1

Note that the number of columns equals the number of unique values, and that the number and order of rows are preserved.
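
For reproducibility, the two tables above can be written out as follows. This is only a sketch of the setup: the variable names df and expected are assumptions chosen for illustration, and a default integer index is assumed because the question does not show one.

```python
import pandas as pd

# Input frame from the question: every row is a permutation of the same four names.
df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

# Desired result: the names become the headings and the old headings become the values.
expected = pd.DataFrame(
    [["a1", "a2", "a3", "a4"],
     ["a3", "a1", "a2", "a4"],
     ["a4", "a2", "a3", "a1"]],
    columns=["Bob", "Cat", "Dov", "Edd"],
)
```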

edmondawad asked Jan 10 '17 15:01




2 Answers

1) Required approach:

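A faster implementation sorts the values in each row and uses the indices returned by np.argsort to pick the matching column names. It relies on every row containing the same set of values, as in the question. Indexing df.columns.values rather than df.columns itself keeps it working on recent pandas versions, which no longer accept a 2-D indexer on an Index.

pd.DataFrame(df.columns.values[np.argsort(df.values)], df.index, np.unique(df.values))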


Applying np.argsort to the values and indexing the column labels with the result gives us the data we are looking for:

df.columns.values[np.argsort(df.values)]
Out[156]:
array([['a1', 'a2', 'a3', 'a4'],
       ['a3', 'a1', 'a2', 'a4'],
       ['a4', 'a2', 'a3', 'a1']], dtype=object)
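
A self-contained sketch of the argsort approach is shown here for reference. The names values, order and swapped are assumptions introduced for illustration, and the sketch only applies when each row is a permutation of the same values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

values = df.to_numpy()
# For each row, the positions that would sort that row alphabetically.
order = np.argsort(values, axis=1)
# Index a plain NumPy array of labels (not the pandas Index) with the 2-D positions.
swapped = pd.DataFrame(
    df.columns.to_numpy()[order],
    index=df.index,
    columns=np.unique(values),
)
print(swapped)
```

Because np.unique returns its result sorted, the new headings line up with the per-row sort order produced by np.argsort.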

2) Slow generalized approach:

A more general approach, at the cost of some speed, is to use apply to build, for each row, a dict that maps the values in that row to their column names.

The resulting Series of dicts is then converted to a list and passed to the DataFrame constructor.

pd.DataFrame(df.apply(lambda s: dict(zip(s, s.index)), axis=1).tolist())

3) Faster generalized approach:

After obtaining a list of dictionaries from df.to_dict with orient='records', we swap the key and value of each pair while iterating through them. The orient is spelled out in full because recent pandas versions no longer accept the 'r' abbreviation.

pd.DataFrame([{val: key for key, val in d.items()} for d in df.to_dict('records')])

Sample test case:

df = df.assign(a5=['Foo', 'Bar', 'Baz'])

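Both generalized approaches then produce a frame with one column per distinct value (Bob, Cat, Dov, Edd, Foo, Bar, Baz), holding NaN wherever a value does not occur in a row.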
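
As a runnable sketch of the sample test case, the dict-swapping approach can be applied to the extended frame like this. The names records and swapped are assumptions for illustration, and the sketch assumes values are unique within each row (a repeated value would keep only the last column name).

```python
import pandas as pd

df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
).assign(a5=["Foo", "Bar", "Baz"])

# Turn each row's {column: value} mapping into {value: column}.
records = [
    {val: col for col, val in row.items()}
    for row in df.to_dict("records")
]

# Values missing from a row become NaN in that row.
swapped = pd.DataFrame(records, index=df.index)
print(swapped)
```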


@piRSquared EDIT 1

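Generalized solution:

def nic(df):
    v = df.values
    n, m = v.shape
    # u: sorted unique values; inv: position in u of every cell of v
    u, inv = np.unique(v, return_inverse=True)
    c = df.columns.values
    r = np.empty((n, len(u)), dtype=c.dtype)
    # Row positions 0..n-1, each repeated m times, so any index labels work;
    # ravel() keeps inv 1-D regardless of the NumPy version.
    r[np.arange(n).repeat(m), inv.ravel()] = np.tile(c, n)
    return pd.DataFrame(r, df.index, u)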

I would like to thank user @piRSquared for coming up with a really fast and generalized NumPy-based alternative solution.
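
The core of the NumPy solution is the return_inverse output of np.unique, which gives the destination column for every cell. The following is a minimal sketch of that mechanism on a made-up 2x2 array; the names cols, rows and out are assumptions for illustration and are not part of the original function.

```python
import numpy as np

v = np.array([["Bob", "Cat"],
              ["Cat", "Bob"]], dtype=object)
cols = np.array(["a1", "a2"], dtype=object)
n, m = v.shape

# u holds the sorted unique values; inv holds, for each cell of v,
# the position of that cell's value inside u.
u, inv = np.unique(v, return_inverse=True)
inv = inv.ravel()  # flatten, since the returned shape differs between NumPy versions

# Row position of every cell in the flattened array: [0, 0, 1, 1].
rows = np.arange(n).repeat(m)

# Scatter each cell's column name into (its row, the column of its value).
out = np.empty((n, len(u)), dtype=object)
out[rows, inv] = np.tile(cols, n)
print(out)
```

Cells of out that are never assigned stay as None, which is how values missing from a row show up in the generalized case.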

Nickil Maveli answered Sep 18 '22 23:09



You can reshape it with stack and unstack with a swapping of the values and index:

df_swap = (df.stack()                     # reshape the data frame to long format
             .reset_index(level = 1)      # set the index(column headers) as a new column
             .set_index(0, append=True)   # set the values as index
             .unstack(level=1))           # reshape the data frame to wide format

df_swap.columns = df_swap.columns.get_level_values(1)   # drop level 0 in the column index
df_swap
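
A variant of the same reshape with explicitly named levels is sketched here, which may make the intermediate steps easier to follow. The level names row, heading and name are assumptions introduced for illustration. Selecting the heading column before unstacking is a small departure from the code above that avoids the extra column level.

```python
import pandas as pd

df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

# Long format: one row per cell, indexed by (row, heading), holding the name.
long = (df.stack()
          .rename_axis(["row", "heading"])
          .rename("name")
          .reset_index(level="heading"))

# Move the names into the index, then pivot them out as the new headings.
wide = (long.set_index("name", append=True)["heading"]
            .unstack("name")
            .rename_axis(index=None, columns=None))
print(wide)
```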


Psidom answered Sep 21 '22 23:09
