Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas shift rows NaNs

Tags:

python

pandas

Say we have a dataframe set up as follows:

x = pd.DataFrame(np.random.randint(1, 10, 30).reshape(5,6),
                 columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan

    col0    col1    col2    col3    col4    col5    col6    col7
 0   6       5        1       5       2       4      NaN    NaN
 1   8       8        9       6       7       2      NaN    NaN
 2   8       3        9       6       6       6      NaN    NaN
 3   8       4        4       4       8       9      NaN    NaN
 4   5       3        4       3       8       7      NaN    NaN     

When calling x.shift(2, axis=1), col2 -> col5 shifts correctly, but col6 and col7 stays as NaN? How can I overwrite the NaN in col6 and col7 values with col4 and col5's values? Is this a bug or intended?

    col0    col1    col2    col3    col4    col5    col6    col7
0   NaN      NaN    6.0     5.0     1.0      5.0    NaN     NaN
1   NaN      NaN    8.0     8.0     9.0      6.0    NaN     NaN
2   NaN      NaN    8.0     3.0     9.0      6.0    NaN     NaN
3   NaN      NaN    8.0     4.0     4.0      4.0    NaN     NaN
4   NaN      NaN    5.0     3.0     4.0      3.0    NaN     NaN
like image 944
A H Avatar asked Feb 09 '18 09:02

A H


People also ask

How do I move rows in pandas DataFrame?

shift() If you want to shift your column or subtract the column value with the previous row value from the DataFrame, you can do it by using the shift() function. It consists of a scalar parameter called period, which is responsible for showing the number of shifts to be made over the desired axis.

What does .shift do in pandas?

shift() function Shift index by desired number of periods with an optional time freq. This function takes a scalar parameter called the period, which represents the number of shifts to be made over the desired axis. This function is very helpful when dealing with time-series data.

How do I remove NaN from pandas?

Use dropna() function to drop rows with NaN / None values in pandas DataFrame. Python doesn't support Null hence any missing data is represented as None or NaN. NaN stands for Not A Number and is one of the common ways to represent the missing value in the data.

How to select all rows with NaN values in pandas Dataframe?

Here are 4 ways to select all rows with NaN values in Pandas DataFrame: (1) Using isna () to select all rows with NaN under a single DataFrame column:

How do you shift data in pandas Dataframe?

When we apply the .shift () method to a dataframe itself, then all records in that dataframe are shifted. One of the Pandas .shift () arguments is the periods= argument, which allows us to pass in an integer. The integer determines how many periods to shift the data by.

How to deal with NaN values in pandas dropna?

When you receive a dataset, there may be some NaN values. Pandas Dropna is a useful method that allows you to drop NaN values of the dataframe.In this entire article, I will show you various examples of dealing with NaN values using drona () method. axis : Requires two values 0 and 1. how: any or all value.

What is periods= in pandas shift?

One of the Pandas .shift () arguments is the periods= argument, which allows us to pass in an integer. The integer determines how many periods to shift the data by.


1 Answers

It's possible this is a bug, you can use np.roll to achieve this:

In[11]:
x.apply(lambda x: np.roll(x, 2), axis=1)

Out[11]: 
   col0  col1  col2  col3  col4  col5  col6  col7
0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0

Speedwise, it's probably quicker to construct a df and reuse the existing columns and pass the result of np.roll as the data arg to the constructor to DataFrame:

In[12]:
x = pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
x

Out[12]: 
   col0  col1  col2  col3  col4  col5  col6  col7
0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0

timings

In[13]:

%timeit pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
%timeit x.fillna(0).astype(int).shift(2, axis=1)

10000 loops, best of 3: 117 µs per loop
1000 loops, best of 3: 418 µs per loop

So constructing a new df with the result of np.roll is quicker than first filling the NaN values, cast to int, and then shifting.

like image 133
EdChum Avatar answered Oct 22 '22 15:10

EdChum