Problem
"The df has two columns that sometimes contain the same values, in either order. We need to re-save them into two new columns, in alphabetical order."
Context
We have a pandas df like this:
df = pd.DataFrame([{"name_A": "john", "name_B": "mac"}, {"name_A": "mac", "name_B": "john"}])
Which looks like this (the table includes one more example row):
name_A | name_B
john | mac
mac | john
Trump | Clinton
Desired Output
name_A | name_B | restated_A | restated_B
john | mac | john | mac
mac | john | john | mac
Trump | Clinton | Clinton | Trump
In words, we want the values of name_A and name_B to be sorted alphabetically into restated_A and restated_B.
Tried so far
A bunch of lambdas, but couldn't get them to work.
Specifications
Python: 3.5.2
Pandas: 0.18.1
You can sort a pandas DataFrame by one or more columns with the sort_values() method, in ascending or descending order. The order is set with the boolean ascending parameter: True (the default) for ascending, False for descending.
To sort the DataFrame by the values in a single column, use .sort_values(). By default it returns a new DataFrame sorted in ascending order and does not modify the original. Note that sort_values() reorders whole rows; it does not sort the values within a row, so on its own it does not produce restated_A and restated_B.
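A quick check of what sort_values() does and does not do. This is a sketch; the three-row frame is built from the question's table (lowercased for simplicity). Whole rows move, but each row keeps its own name_A/name_B pairing, so the values inside a row are never swapped.

```python
import pandas as pd

# Three rows from the question's table, names in mixed order per row
df = pd.DataFrame({"name_A": ["mac", "john", "trump"],
                   "name_B": ["john", "mac", "clinton"]})

# sort_values() reorders whole rows by the chosen column...
by_name_a = df.sort_values("name_A")
print(by_name_a)

# ...and ascending=False reverses the row order, but in both cases
# "mac | john" stays "mac | john": nothing is sorted within a row.
descending = df.sort_values("name_A", ascending=False)
print(descending)
```

Because a new DataFrame is returned each time, df itself keeps its original row order.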
As an alternative vectorized solution, you can use numpy.minimum() and numpy.maximum():
import numpy as np
# Element-wise string comparison: the smaller name goes to restated_A, the larger to restated_B
df['restated_A'] = np.minimum(df['name_A'], df['name_B'])
df['restated_B'] = np.maximum(df['name_A'], df['name_B'])
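A self-contained version of the numpy.minimum()/numpy.maximum() approach, run on the three rows of the question's table. It assumes the name columns hold plain Python strings (object dtype). The explicit dtype=object is an addition made here, since newer pandas releases may infer a dedicated string dtype instead, and it is not certain that NumPy ufuncs treat that dtype the same way.

```python
import numpy as np
import pandas as pd

# dtype=object (an assumption added here) keeps plain Python strings,
# so np.minimum/np.maximum compare them with ordinary string ordering
df = pd.DataFrame({"name_A": ["john", "mac", "trump"],
                   "name_B": ["mac", "john", "clinton"]}, dtype=object)

# Row by row, the alphabetically smaller name goes to restated_A
df["restated_A"] = np.minimum(df["name_A"], df["name_B"])
# ...and the larger one to restated_B
df["restated_B"] = np.maximum(df["name_A"], df["name_B"])
print(df)
```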
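Or use the apply method, sorting only the two name columns row by row:
sorted_names = df[['name_A', 'name_B']].apply(lambda r: pd.Series(sorted(r), index=['restated_A', 'restated_B']), axis=1)
df = df.join(sorted_names)  # adds restated_A and restated_B as new columns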
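An apply-based solution sorts each row with a Python function, one row at a time. The same row-wise sort can also be done in a single call with np.sort(..., axis=1) on the underlying 2-D array. This variant is not from the original answers, so treat it as a sketch; it assumes object-dtype string columns, and it extends naturally to more than two name columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name_A": ["john", "mac", "trump"],
                   "name_B": ["mac", "john", "clinton"]}, dtype=object)

# Pull the two name columns out as a 2-D object array, one row per pair
pairs = df[["name_A", "name_B"]].values

# Sort within each row (axis=1), leaving the row order untouched
sorted_pairs = np.sort(pairs, axis=1)

# Column 0 now holds the smaller name of each pair, column 1 the larger
df["restated_A"] = sorted_pairs[:, 0]
df["restated_B"] = sorted_pairs[:, 1]
print(df)
```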
Just convert df.values to a list and sort each row's pair. Then assign the elements of the sorted pairs to the new columns.
>>> import pandas as pd
>>> df = pd.DataFrame([{"name_A": "john", "name_B": "mac"}, {"name_A": "mac", "name_B": "john"}])
>>> restated_values = [sorted(pair) for pair in df.values.tolist()]
>>> restated_values
[['john', 'mac'], ['john', 'mac']]
>>> df['restated_A'] = [pair[0] for pair in restated_values]
>>> df
  name_A name_B restated_A
0   john    mac       john
1    mac   john       john
>>> df['restated_B'] = [pair[1] for pair in restated_values]
>>> df
  name_A name_B restated_A restated_B
0   john    mac       john        mac
1    mac   john       john        mac
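The two list comprehensions that pick out pair[0] and pair[1] can be folded into a single zip(*...) transpose. A sketch on the question's three rows; selecting df[['name_A', 'name_B']] first, rather than all of df.values, is a small addition that keeps any columns added earlier (such as restated_A) out of the pairs.

```python
import pandas as pd

df = pd.DataFrame({"name_A": ["john", "mac", "trump"],
                   "name_B": ["mac", "john", "clinton"]})

# Sort each (name_A, name_B) pair; only the two name columns are used
restated_values = [sorted(pair) for pair in df[["name_A", "name_B"]].values.tolist()]

# zip(*...) transposes the list of pairs into two tuples:
# all first (smaller) names, then all second (larger) names
restated_A, restated_B = zip(*restated_values)
df["restated_A"] = list(restated_A)
df["restated_B"] = list(restated_B)
print(df)
```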
Or, you could do this using a dict and a new pandas.DataFrame object:
>>> import pandas as pd
>>> df = pd.DataFrame([{"name_A": "john", "name_B": "mac"}, {"name_A": "mac", "name_B": "john"}])
>>> restated_values = [sorted(pair) for pair in df.values.tolist()]
>>> restated_values
[['john', 'mac'], ['john', 'mac']]
>>> new_col_rows = {'restated_A': [pair[0] for pair in restated_values], 'restated_B': [pair[1] for pair in restated_values]}
>>> new_col_rows
{'restated_A': ['john', 'john'], 'restated_B': ['mac', 'mac']}
>>> new_df = pd.DataFrame(new_col_rows)
>>> new_df
  restated_A restated_B
0       john        mac
1       john        mac
>>> df = df.join(new_df)
>>> df
  name_A name_B restated_A restated_B
0   john    mac       john        mac
1    mac   john       john        mac
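The approaches above compare strings with Python's default ordering, which is case-sensitive: every uppercase letter sorts before every lowercase one, so "Mac" comes before "john". If "alphabetical" should ignore case (the data mixes capitalised names such as Trump and Clinton with lowercase ones), sorted() accepts a key function. A sketch using a hypothetical helper, restate(), which is not part of pandas or of the answers above:

```python
import pandas as pd

# Mixed capitalisation across rows
df = pd.DataFrame({"name_A": ["john", "Mac", "Trump"],
                   "name_B": ["Mac", "john", "Clinton"]})

def restate(frame, key=None):
    """Hypothetical helper: return a copy of frame with restated_A/restated_B.

    `key` is passed straight to sorted(); key=str.lower gives a
    case-insensitive alphabetical order, key=None the default one.
    """
    out = frame.copy()
    pairs = [sorted(pair, key=key) for pair in frame[["name_A", "name_B"]].values.tolist()]
    out["restated_A"] = [p[0] for p in pairs]
    out["restated_B"] = [p[1] for p in pairs]
    return out

# Default ordering: "Mac" sorts before "john" (uppercase first)
print(restate(df))
# Case-insensitive ordering: "john" sorts before "Mac"
print(restate(df, key=str.lower))
```

With key=str.lower the comparison ignores case, but the original spelling of each name is kept in the new columns.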