Say we have a dataframe set up as follows:
x = pd.DataFrame(np.random.randint(1, 10, 30).reshape(5,6),
columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan
col0 col1 col2 col3 col4 col5 col6 col7
0 6 5 1 5 2 4 NaN NaN
1 8 8 9 6 7 2 NaN NaN
2 8 3 9 6 6 6 NaN NaN
3 8 4 4 4 8 9 NaN NaN
4 5 3 4 3 8 7 NaN NaN
When calling x.shift(2, axis=1)
, col2 -> col5
shifts correctly, but col6
and col7
stays as NaN
?
How can I overwrite the NaN
in col6
and col7
values with col4
and col5
's values? Is this a bug or intended?
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 NaN NaN
1 NaN NaN 8.0 8.0 9.0 6.0 NaN NaN
2 NaN NaN 8.0 3.0 9.0 6.0 NaN NaN
3 NaN NaN 8.0 4.0 4.0 4.0 NaN NaN
4 NaN NaN 5.0 3.0 4.0 3.0 NaN NaN
shift() If you want to shift your column or subtract the column value with the previous row value from the DataFrame, you can do it by using the shift() function. It consists of a scalar parameter called period, which is responsible for showing the number of shifts to be made over the desired axis.
shift() function Shift index by desired number of periods with an optional time freq. This function takes a scalar parameter called the period, which represents the number of shifts to be made over the desired axis. This function is very helpful when dealing with time-series data.
Use dropna() function to drop rows with NaN / None values in pandas DataFrame. Python doesn't support Null hence any missing data is represented as None or NaN. NaN stands for Not A Number and is one of the common ways to represent the missing value in the data.
Here are 4 ways to select all rows with NaN values in Pandas DataFrame: (1) Using isna () to select all rows with NaN under a single DataFrame column:
When we apply the .shift () method to a dataframe itself, then all records in that dataframe are shifted. One of the Pandas .shift () arguments is the periods= argument, which allows us to pass in an integer. The integer determines how many periods to shift the data by.
When you receive a dataset, there may be some NaN values. Pandas Dropna is a useful method that allows you to drop NaN values of the dataframe.In this entire article, I will show you various examples of dealing with NaN values using drona () method. axis : Requires two values 0 and 1. how: any or all value.
One of the Pandas .shift () arguments is the periods= argument, which allows us to pass in an integer. The integer determines how many periods to shift the data by.
It's possible this is a bug, you can use np.roll
to achieve this:
In[11]:
x.apply(lambda x: np.roll(x, 2), axis=1)
Out[11]:
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 2.0 4.0
1 NaN NaN 8.0 8.0 9.0 6.0 7.0 2.0
2 NaN NaN 8.0 3.0 9.0 6.0 6.0 6.0
3 NaN NaN 8.0 4.0 4.0 4.0 8.0 9.0
4 NaN NaN 5.0 3.0 4.0 3.0 8.0 7.0
Speedwise, it's probably quicker to construct a df and reuse the existing columns and pass the result of np.roll
as the data arg to the constructor to DataFrame
:
In[12]:
x = pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
x
Out[12]:
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 2.0 4.0
1 NaN NaN 8.0 8.0 9.0 6.0 7.0 2.0
2 NaN NaN 8.0 3.0 9.0 6.0 6.0 6.0
3 NaN NaN 8.0 4.0 4.0 4.0 8.0 9.0
4 NaN NaN 5.0 3.0 4.0 3.0 8.0 7.0
timings
In[13]:
%timeit pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
%timeit x.fillna(0).astype(int).shift(2, axis=1)
10000 loops, best of 3: 117 µs per loop
1000 loops, best of 3: 418 µs per loop
So constructing a new df with the result of np.roll
is quicker than first filling the NaN
values, cast to int
, and then shift
ing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With