I have a dataframe like so:
phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
                 NaN                   NaN               8546987
             8316589               8751369                   NaN
             4569874                   NaN               2645981
I would like phone_number_1_clean to be as populated as possible. This requires shifting values from phone_number_2_clean or phone_number_3_clean into phone_number_1_clean, and likewise getting phone_number_2_clean as populated as possible once phone_number_1_clean is filled, and so on.
The output should look something like:
phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
             8546987                   NaN                   NaN
             8316589               8751369                   NaN
             4569874               2645981                   NaN
I might be able to do it with np.where statements, but it could get messy.
The approach should preferably be vectorised, as it will be applied to large-ish DataFrames.
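For reference, the frame can be reproduced like this (a sketch; it assumes the numbers are plain integers, which the NaNs then coerce to float):
import numpy as np
import pandas as pd

#sample data matching the frame above
df = pd.DataFrame({
    'phone_number_1_clean': [np.nan, 8316589, 4569874],
    'phone_number_2_clean': [np.nan, 8751369, np.nan],
    'phone_number_3_clean': [8546987, np.nan, 2645981],
})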
Use:
#for each row drop NaNs and build a new Series - one row of the final df
df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
#the result can have fewer columns than the original df, so reindex to restore them
df1 = df1.reindex(columns=range(len(df.columns)))
#assign the original column names
df1.columns = df.columns
print (df1)
print (df1)
   phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
0               8546987                   NaN                   NaN
1               8316589               8751369                   NaN
2               4569874               2645981                   NaN
Or:
#stack drops NaNs and returns a MultiIndex Series keyed by (row, original column)
s = df.stack()
#replace the column level with a per-row counter, so surviving values are numbered 0, 1, ...
s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]
#unstack left-aligns the values; reindex restores any columns that ended up empty
df1 = s.unstack().reindex(columns=range(len(df.columns)))
df1.columns = df.columns
print (df1)
   phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
0               8546987                   NaN                   NaN
1               8316589               8751369                   NaN
2               4569874               2645981                   NaN
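To see what the stack trick does, here is the intermediate Series for the sample frame (the exact repr assumes the float columns from the reproduction above):
s = df.stack()
print (s)
#0  phone_number_3_clean    8546987.0
#1  phone_number_1_clean    8316589.0
#   phone_number_2_clean    8751369.0
#2  phone_number_1_clean    4569874.0
#   phone_number_3_clean    2645981.0
#dtype: float64
Replacing the column level with the per-row cumcount renumbers these surviving values 0, 1, 2, ..., so unstack writes them back left-aligned.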
Or a slightly modified justify function:
def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as invalid (here np.nan)
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    if invalid_val is np.nan:
        mask = pd.notnull(a) #changed to pandas notnull
    else:
        mask = a != invalid_val
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
df = pd.DataFrame(justify(df.values, invalid_val=np.nan),
index=df.index, columns=df.columns)
print (df)
   phone_number_1_clean  phone_number_2_clean  phone_number_3_clean
0               8546987                   NaN                   NaN
1               8316589               8751369                   NaN
2               4569874               2645981                   NaN
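The tweak flagged in the comment is the pd.notnull mask: NaN != NaN, so the default a != invalid_val test cannot detect missing values. The output array is also allocated with dtype=object so it can hold both the numbers and NaN.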
Performance:
#3k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [442]: %%timeit
...: df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
...: #the result can have fewer columns than the original df, so reindex to restore them
...: df1 = df1.reindex(columns=range(len(df.columns)))
...: #assign the original column names
...: df1.columns = df.columns
...:
1.17 s ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [443]: %%timeit
...: s = df.stack()
...: s.index = [s.index.get_level_values(0), s.groupby(level=0).cumcount()]
...:
...: df1 = s.unstack().reindex(columns=range(len(df.columns)))
...: df1.columns = df.columns
...:
...:
5.88 ms ± 74.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [444]: %%timeit
...: pd.DataFrame(justify(df.values, invalid_val=np.nan),
...:              index=df.index, columns=df.columns)
...:
941 µs ± 131 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
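The NumPy-based justify is the fastest because it works on the underlying array in one vectorised pass, while apply is the slowest because it builds a new Series for every row in a Python-level loop.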