dtypes muck things up when shifting on axis one (columns)

Tags: python, pandas

Consider the dataframe df

import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))

df

   A  B
0  1  X
1  2  Y

If I shift along axis=0 (the default)

df.shift()

     A    B
0  NaN  NaN
1  1.0    X

It pushes all rows downwards one row as expected.

But when I shift along axis=1

df.shift(axis=1)

    A    B
0 NaN  NaN
1 NaN  NaN

Everything is null when I expected

     A  B
0  NaN  1
1  NaN  2

I understand why this happened. For axis=0, Pandas operates column by column, where each column has a single dtype, and there is a clear protocol for handling the NaN value introduced at the beginning or end of the shift. But when shifting along axis=1 we introduce potential ambiguity of dtype from one column to the next. In this case, I'm trying to force int64 values into an object column, and Pandas decides to just null the values.

This becomes more problematic when the dtypes are int64 and float64

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))

df

   A    B
0  1  1.0
1  2  2.0

And the same thing happens

df.shift(axis=1)

    A   B
0 NaN NaN
1 NaN NaN

My Question

What are good options for creating a dataframe that is shifted along axis=1 in which the result has shifted values and dtypes?

For the int64/float64 case the result would look like:

df_shifted

     A  B
0  NaN  1
1  NaN  2

and

df_shifted.dtypes

A    object
B     int64
dtype: object

A more comprehensive example

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))

df

   A    B  C    D  E
0  1  1.0  X  4.0  4
1  2  2.0  Y  5.0  5

Should look like this

df_shifted

     A  B    C  D    E
0  NaN  1  1.0  X  4.0
1  NaN  2  2.0  Y  5.0

df_shifted.dtypes

A     object
B      int64
C    float64
D     object
E    float64
dtype: object

asked Nov 05 '19 16:11

piRSquared

2 Answers

It turns out that Pandas shifts within blocks of like dtypes rather than across all columns.

Define df as

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

df

#  i    f  o    f  i  o
#  n    l  b    l  n  b
#  t    t  j    t  t  j
#
   A    B  C    D  E  F
0  1  3.0  X  5.0  7  W
1  2  4.0  Y  6.0  8  Z

It shifts the integers to the next integer column, the floats to the next float column, and the objects to the next object column:

df.shift(axis=1)

    A   B    C    D    E  F
0 NaN NaN  NaN  3.0  1.0  X
1 NaN NaN  NaN  4.0  2.0  Y

I don't know if that's a good idea, but that is what is happening.
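To make the block-wise behavior concrete, here is a minimal sketch that emulates it by hand. It groups the columns by dtype and moves each column's values to the next column of the same dtype. The helper name `blockwise_shift` is made up for illustration and is not a pandas API. It only reproduces which values land where, not the exact dtypes pandas would report, and what `df.shift(axis=1)` itself returns may differ between pandas versions.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Group column labels by dtype, keeping column order within each group.
groups = defaultdict(list)
for col, dtype in df.dtypes.items():
    groups[str(dtype)].append(col)


def blockwise_shift(frame, groups):
    """Hypothetical helper: shift each dtype group one step to the right."""
    # Start with every column empty (all NaN).
    out = {col: pd.Series(np.nan, index=frame.index, dtype=object)
           for col in frame.columns}
    for cols in groups.values():
        # Within a group, each column receives the previous column's values.
        for prev, cur in zip(cols, cols[1:]):
            out[cur] = frame[prev]
    return pd.DataFrame(out)


emulated = blockwise_shift(df, groups)
print(dict(groups))
print(emulated)
```

In this emulation the first column of each dtype group ends up empty, D receives B's floats, E receives A's integers, and F receives C's strings, which matches the pattern in the output above.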

Approaches

`astype(object)` first

dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.astype(object).shift(1, axis=1).astype(dtypes)

df_shifted

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8

`transpose`

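Transposing a mixed-dtype frame makes every column object, so the shift behaves and the dtypes can be restored afterwards: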

dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.T.shift().T.astype(dtypes)

df_shifted

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8
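A quick check of the claim that transposing makes everything object, as a small sketch using the same six-column frame. The variable names `transposed` and `round_trip` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Each transposed column holds one original row: an int, a float and a
# string side by side, so pandas falls back to the object dtype.
transposed = df.T
print(transposed.dtypes)

# Transposing back does not recover the original dtypes by itself,
# which is why the approach above finishes with .astype(dtypes).
round_trip = transposed.T
print(round_trip.dtypes)
```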

`itertuples`

pd.DataFrame([(np.nan, *t[1:-1]) for t in df.itertuples()], columns=[*df])

     A  B    C  D    E  F
0  NaN  1  3.0  X  5.0  7
1  NaN  2  4.0  Y  6.0  8

Though I'd probably do this

pd.DataFrame([
    (np.nan, *t[:-1]) for t in
    df.itertuples(index=False, name=None)
], columns=[*df])
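One thing to watch with the tuple-based rebuild is that the constructor infers dtypes on its own, so the all-NaN first column would normally come out as float64 rather than object. As an alternative (a different technique from the ones above, not something from the original answer), the frame can be rebuilt column by column so that each Series carries its dtype along. This is a minimal sketch, and `shift_columns_right` is a hypothetical helper name. It assumes at least one column and a shift of exactly one position.

```python
import numpy as np
import pandas as pd


def shift_columns_right(frame):
    """Hypothetical helper: move each column's data one label to the right."""
    cols = list(frame.columns)
    # The first column has nothing to receive, so fill it with NaN as object.
    data = {cols[0]: pd.Series(np.nan, index=frame.index, dtype=object)}
    for prev, cur in zip(cols, cols[1:]):
        # Reusing the Series itself keeps its original dtype.
        data[cur] = frame[prev]
    return pd.DataFrame(data)


df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

df_shifted = shift_columns_right(df)
print(df_shifted)
print(df_shifted.dtypes)
```

Because no values pass through an object array, no `astype` round trip is needed here.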

answered Sep 26 '22 02:09

piRSquared

I tried using a numpy method. The method works as long as you keep your data in a numpy array:

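def shift_df(data, n):
    # Roll along the column axis, then blank the n columns that wrapped around.
    shifted = np.roll(data, n, axis=1)
    shifted[:, :n] = np.nan  # np.NaN was removed in NumPy 2.0

    return shifted

shift_df(df, 1)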

array([[nan, 1, 1.0, 'X', 4.0],
       [nan, 2, 2.0, 'Y', 5.0]], dtype=object)

But when you call the DataFrame constructor, all columns are converted to object, even though the values in the array are float, int, and object:

def shift_df(data, n):
    shifted = np.roll(data, n, axis=1)
    shifted[:, :n] = np.nan
    # The rolled array has dtype=object, so every column stays object here.
    shifted = pd.DataFrame(shifted)

    return shifted

print(shift_df(df, 1),'\n')
print(shift_df(df, 1).dtypes)

     0  1  2  3  4
0  NaN  1  1  X  4
1  NaN  2  2  Y  5 

0    object
1    object
2    object
3    object
4    object
dtype: object
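One way to get the dtypes back after the numpy roll might be to borrow the `df.dtypes.shift(fill_value=object)` trick from the other answer and cast once the frame is rebuilt. This is a minimal sketch, not a tested general solution. The name `shift_df_keep_dtypes` is hypothetical, and it assumes `n` is a positive integer smaller than the number of columns.

```python
import numpy as np
import pandas as pd


def shift_df_keep_dtypes(frame, n=1):
    """Hypothetical helper: roll values with numpy, then restore dtypes."""
    # Work on an object array so mixed values survive the roll unchanged.
    values = np.roll(frame.to_numpy(dtype=object), n, axis=1)
    values[:, :n] = np.nan
    out = pd.DataFrame(values, index=frame.index, columns=frame.columns)
    # Shift the dtypes the same way; vacated columns become object.
    dtypes = frame.dtypes.shift(n, fill_value=object)
    return out.astype(dtypes)


df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))

result = shift_df_keep_dtypes(df, 1)
print(result)
print(result.dtypes)
```

The cast can fail if a column cannot hold the values that land in it, so this only works because the dtypes are shifted together with the data.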

answered Sep 22 '22 02:09

Erfan
