I have the following data frame:
a1 | a2 | a3 | a4
---------------------
Bob | Cat | Dov | Edd
Cat | Dov | Bob | Edd
Edd | Cat | Dov | Bob
and I want to convert it to
Bob | Cat | Dov | Edd
---------------------
a1 | a2 | a3 | a4
a3 | a1 | a2 | a4
a4 | a2 | a3 | a1
Note that the number of columns equals the number of unique values, and the number and order of rows are preserved
You can change the column name of pandas DataFrame by using DataFrame. rename() method and DataFrame. columns() method.
Pandas replace multiple values in column replace. By using DataFrame. replace() method we will replace multiple values with multiple new strings or text for an individual DataFrame column. This method searches the entire Pandas DataFrame and replaces every specified value.
Change Columns Order Using DataFrame.Use df. reindex(columns=change_column) with a list of columns in the desired order as change_column to reorder the columns. Yields below output.
1) Required approach:
A faster implementation would be to sort the values of the dataframe and align the columns accordingly based on it's obtained indices after np.argsort
.
pd.DataFrame(df.columns[np.argsort(df.values)], df.index, np.unique(df.values))
Applying np.argsort
gives us the data we are looking for:
df.columns[np.argsort(df.values)]
Out[156]:
Index([['a1', 'a2', 'a3', 'a4'], ['a3', 'a1', 'a2', 'a4'],
['a4', 'a2', 'a3', 'a1']],
dtype='object')
2) Slow generalized approach:
More generalized approach while at the cost of some speed / efficiency would be to use apply
after creating a dict
mapping of the strings/values present in the dataframe with their corresponding column names.
Use a dataframe constructor later after converting the obtained series to their list
representation.
pd.DataFrame(df.apply(lambda s: dict(zip(pd.Series(s), pd.Series(s).index)), 1).tolist())
3) Faster generalized approach:
After obtaining a list of dictionaries from df.to_dict
+ orient='records'
, we need to swap it's respective key and value pairs while iterating through them in a loop.
pd.DataFrame([{val:key for key, val in d.items()} for d in df.to_dict('r')])
Sample test case:
df = df.assign(a5=['Foo', 'Bar', 'Baz'])
Both these approaches produce:
@piRSquared EDIT 1
generalized solution
def nic(df):
v = df.values
n, m = v.shape
u, inv = np.unique(v, return_inverse=1)
i = df.index.values
c = df.columns.values
r = np.empty((n, len(u)), dtype=c.dtype)
r[i.repeat(m), inv] = np.tile(c, n)
return pd.DataFrame(r, i, u)
1I would like to thank user @piRSquared for coming up with a really fast and generalized numpy based alternative soln.
You can reshape it with stack and unstack with a swapping of the values and index:
df_swap = (df.stack() # reshape the data frame to long format
.reset_index(level = 1) # set the index(column headers) as a new column
.set_index(0, append=True) # set the values as index
.unstack(level=1)) # reshape the data frame to wide format
df_swap.columns = df_swap.columns.get_level_values(1) # drop level 0 in the column index
df_swap
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With