Consider the dataframe df
df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))
df
A B
0 1 X
1 2 Y
If I shift along axis=0 (the default)
df.shift()
A B
0 NaN NaN
1 1.0 X
It pushes all rows downwards one row as expected.
But when I shift along axis=1
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
Everything is null when I expected
A B
0 NaN 1
1 NaN 2
I understand why this happened. For axis=0, Pandas operates column by column, where each column has a single dtype, and there is a clear protocol for handling the NaN values introduced at the beginning or end when shifting. But when shifting along axis=1 we introduce potential ambiguity of dtype from one column to the next. In this case, I'm trying to force int64 values into an object column and Pandas decides to just null the values.
This becomes more problematic when the dtypes are int64 and float64
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))
df
A B
0 1 1.0
1 2 2.0
And the same thing happens
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
What are good options for creating a dataframe that is shifted along axis=1 in which the result has shifted values and dtypes?
For the int64/float64 case the result would look like:
df_shifted
A B
0 NaN 1
1 NaN 2
and
df_shifted.dtypes
A object
B int64
dtype: object
A more comprehensive example
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))
df
A B C D E
0 1 1.0 X 4.0 4
1 2 2.0 Y 5.0 5
Should look like this
df_shifted
A B C D E
0 NaN 1 1.0 X 4.0
1 NaN 2 2.0 Y 5.0
df_shifted.dtypes
A object
B int64
C float64
D object
E float64
dtype: object
To check the data types in a pandas DataFrame, use the dtypes attribute. It returns a Series whose index holds the DataFrame's column names and whose values are the corresponding data types.
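As a minimal sketch of the attribute described above, the snippet below inspects `dtypes` on the two-column frame from the question. The variable name `col_types` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))

# dtypes is a Series: the index holds the column names,
# the values hold each column's dtype.
col_types = df.dtypes
print(col_types)
```

The index of `col_types` lines up with `df.columns`, so it can be looked up by column label like any other Series.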
astype casts a pandas object to a specified dtype. Pass a numpy.dtype or Python type to cast the entire pandas object to the same type. Alternatively, pass {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame's columns to column-specific types.
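Both calling styles of `astype` can be sketched on the int64/float64 frame from the question. The names `as_obj` and `mixed` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))

# A single type casts every column to that type.
as_obj = df.astype(object)

# A {column: dtype} mapping casts only the listed columns.
mixed = df.astype({'A': 'float64'})

print(as_obj.dtypes)
print(mixed.dtypes)
```

The mapping form is the one that matters for this question, because it allows a different target dtype per column after the values have been moved.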
The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition, these dtypes have item sizes, e.g. int64 and int32. By default, integer types are int64 and float types are float64, regardless of platform (32-bit or 64-bit).
It turns out that Pandas is shifting over blocks of similar dtypes
Define df as
df = pd.DataFrame(dict(
A=[1, 2], B=[3., 4.], C=['X', 'Y'],
D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
df
#    i    f  o    f  i  o
#    n    l  b    l  n  b
#    t    t  j    t  t  j
#
   A    B  C    D  E  F
0  1  3.0  X  5.0  7  W
1  2  4.0  Y  6.0  8  Z
It will shift the integers to the next integer column, the floats to the next float column and the objects to the next object column
df.shift(axis=1)
A B C D E F
0 NaN NaN NaN 3.0 1.0 X
1 NaN NaN NaN 4.0 2.0 Y
I don't know if that's a good idea, but that is what is happening.
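To see which columns would travel together under the block-wise behavior described above, one rough sketch is to group the column labels by dtype. This only inspects `df.dtypes`; it does not look at pandas internals, and the exact result of `df.shift(axis=1)` may differ across pandas versions. The name `groups` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))

# Collect column labels per dtype name.
groups = {}
for col, dtype in df.dtypes.items():
    groups.setdefault(str(dtype), []).append(col)

print(groups)
```

Each list in `groups` corresponds to one of the int, float, and object runs annotated above the frame.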
astype(object) first
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.astype(object).shift(1, axis=1).astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
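The astype(object) idea above can be wrapped in a small helper. This is only a sketch: `shift_columns_right` is a hypothetical name, it handles a shift of exactly one column, and it builds the dtype mapping with `zip` instead of `df.dtypes.shift(...)`, which is a substitution made here for illustration.

```python
import pandas as pd

def shift_columns_right(df):
    """Shift values one column to the right, carrying each dtype along."""
    # Cast to object first so the frame has a single dtype and no value is lost.
    shifted = df.astype(object).shift(1, axis=1)
    # Column k receives the dtype of column k-1; the first column stays object.
    new_dtypes = dict(zip(df.columns[1:], df.dtypes.iloc[:-1]))
    return shifted.astype(new_dtypes)

df = pd.DataFrame(dict(A=[1, 2], B=[3., 4.], C=['X', 'Y']))
result = shift_columns_right(df)
print(result)
print(result.dtypes)
```

Leaving the first column out of the mapping keeps it as object, which matches the dtypes requested in the question.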
transpose
Will make it object
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.T.shift().T.astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
itertuples
pd.DataFrame([(np.nan, *t[1:-1]) for t in df.itertuples()], columns=[*df])
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
Though I'd probably do this
pd.DataFrame([
(np.nan, *t[:-1]) for t in
df.itertuples(index=False, name=None)
], columns=[*df])
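One thing worth checking with the itertuples approach is which dtypes the DataFrame constructor infers for the rebuilt columns. The sketch below uses a smaller frame with the same kinds of columns; the name `rebuilt` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=[3., 4.], C=['X', 'Y'], D=[5., 6.]))

# Prepend NaN to each row and drop the last value, then rebuild the frame.
rebuilt = pd.DataFrame(
    [(np.nan, *t[:-1]) for t in df.itertuples(index=False, name=None)],
    columns=[*df],
)

print(rebuilt)
print(rebuilt.dtypes)
```

Because the constructor infers a dtype per column from the tuples, the shifted integers and floats keep numeric dtypes, while the all-NaN first column comes out as float64 rather than object.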
I tried using a numpy method. The method works as long as you keep your data in a numpy array:
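def shift_df(data, n):
    # np.roll converts the mixed-dtype frame to an object array; roll along the columns
    shifted = np.roll(data, n, axis=1)
    # Blank out the values that wrapped around into the first n columns
    shifted[:, :n] = np.nan
    return shifted
shift_df(df, 1)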
array([[nan, 1, 1.0, 'X', 4.0],
[nan, 2, 2.0, 'Y', 5.0]], dtype=object)
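But when you call the DataFrame constructor, all columns are converted to object, even though the values in the array are float, int and object:
def shift_df(data, n):
    # Roll along the columns, then blank out the values that wrapped around
    shifted = np.roll(data, n, axis=1)
    shifted[:, :n] = np.nan
    # Building a frame from an object array keeps every column as object
    shifted = pd.DataFrame(shifted)
    return shifted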
print(shift_df(df, 1),'\n')
print(shift_df(df, 1).dtypes)
0 1 2 3 4
0 NaN 1 1 X 4
1 NaN 2 2 Y 5
0 object
1 object
2 object
3 object
4 object
dtype: object
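One possible way around the all-object result is to let pandas re-infer the column dtypes with `infer_objects()` after building the frame. This is a sketch, not part of the answer above: `shift_df_infer` is a hypothetical name, and the first column ends up as float64 (all NaN) rather than object.

```python
import numpy as np
import pandas as pd

def shift_df_infer(data, n):
    # Hypothetical variant of shift_df; the name is illustrative only.
    values = data.to_numpy(dtype=object)
    shifted = np.roll(values, n, axis=1)
    shifted[:, :n] = np.nan
    out = pd.DataFrame(shifted, index=data.index, columns=data.columns)
    # infer_objects() attempts to find a better dtype for each object column.
    return out.infer_objects()

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))
out = shift_df_infer(df, 1)
print(out)
print(out.dtypes)
```

Passing `index` and `columns` also keeps the original labels instead of the default 0 to 4 column names.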