Consider the dataframe df:
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=['X', 'Y']))
df
A B
0 1 X
1 2 Y
If I shift along axis=0 (the default):
df.shift()
A B
0 NaN NaN
1 1.0 X
It pushes all rows downwards one row as expected.
But when I shift along axis=1:
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
everything is null, whereas I expected:
A B
0 NaN 1
1 NaN 2
I understand why this happened. For axis=0, Pandas operates column by column, where each column has a single dtype, and when shifting there is a clear protocol for dealing with the NaN introduced at the beginning or end. But shifting along axis=1 introduces potential dtype ambiguity from one column to the next. In this case, I'm trying to force int64 into an object column, and Pandas decides to just null the values.
This becomes more problematic when the dtypes are int64 and float64:
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.]))
df
A B
0 1 1.0
1 2 2.0
And the same thing happens:
df.shift(axis=1)
A B
0 NaN NaN
1 NaN NaN
What are good options for creating a dataframe that is shifted along axis=1, in which the result has shifted values and dtypes?
For the int64/float64 case the result would look like:
df_shifted
A B
0 NaN 1
1 NaN 2
and
df_shifted.dtypes
A object
B int64
dtype: object
A more comprehensive example
df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))
df
A B C D E
0 1 1.0 X 4.0 4
1 2 2.0 Y 5.0 5
should look like this:
df_shifted
A B C D E
0 NaN 1 1.0 X 4.0
1 NaN 2 2.0 Y 5.0
df_shifted.dtypes
A object
B int64
C float64
D object
E float64
dtype: object
It turns out that Pandas is shifting over blocks of similar dtypes.
Define df as:
df = pd.DataFrame(dict(
A=[1, 2], B=[3., 4.], C=['X', 'Y'],
D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
df
# dtypes: int, float, object, float, int, object
A B C D E F
0 1 3.0 X 5.0 7 W
1 2 4.0 Y 6.0 8 Z
It will shift the integers to the next integer column, the floats to the next float column and the objects to the next object column:
df.shift(axis=1)
A B C D E F
0 NaN NaN NaN 3.0 1.0 X
1 NaN NaN NaN 4.0 2.0 Y
I don't know if that's a good idea, but that is what is happening.
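To make the block behaviour concrete, here is a minimal sketch that reproduces it explicitly. It groups the columns by dtype and shifts each group along axis=1 on its own. The helper name `blockwise_shift` is hypothetical. Newer pandas releases may treat mixed-dtype frames differently, so read this as an illustration of the output above rather than a description of pandas internals.

```python
import pandas as pd

def blockwise_shift(df, n=1):
    # Hypothetical helper: shift each same-dtype group of columns separately
    parts = []
    for dtype in df.dtypes.unique():
        cols = df.columns[(df.dtypes == dtype).to_numpy()]
        # Within a single dtype, shifting along axis=1 is unambiguous
        parts.append(df[cols].shift(n, axis=1))
    # Reassemble the groups in the original column order
    return pd.concat(parts, axis=1)[df.columns]

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
print(blockwise_shift(df))
```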
astype(object) first
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.astype(object).shift(1, axis=1).astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
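The same astype(object) idea can be wrapped in a small function that takes the number of periods. This is a sketch, and the name `shift_columns` is made up for illustration. It assumes unique column labels. The n columns filled at the edge stay object dtype, which matches the target in the question. A negative n shifts to the left.

```python
import pandas as pd

def shift_columns(df, n=1):
    # Hypothetical helper: each column inherits the dtype of the column it came from;
    # the n NaN-filled columns at the edge are typed as object
    dtypes = df.dtypes.shift(n, fill_value="object")
    # A single object block shifts across all columns instead of within dtype blocks
    return df.astype(object).shift(n, axis=1).astype(dtypes)

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
print(shift_columns(df, 2))
print(shift_columns(df, 2).dtypes)
```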
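transpose
Transposing a frame with mixed dtypes produces a single object block, so shifting its rows and transposing back avoids the block-wise behavior: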
dtypes = df.dtypes.shift(fill_value=object)
df_shifted = df.T.shift().T.astype(dtypes)
df_shifted
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
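Both routes end up shifting a single object block, so on this frame they should agree. The sketch below checks that with pandas' testing helper `pd.testing.assert_frame_equal`. It is only a sanity check on the example data, not a guarantee for every frame.

```python
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
dtypes = df.dtypes.shift(fill_value="object")

# Route 1: cast everything to object, then shift along the columns
via_astype = df.astype(object).shift(1, axis=1).astype(dtypes)
# Route 2: a mixed-dtype transpose is already object, so shift its rows instead
via_transpose = df.T.shift().T.astype(dtypes)

# Raises AssertionError if values, dtypes or labels differ
pd.testing.assert_frame_equal(via_astype, via_transpose)
print((df.T.dtypes == object).all())
```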
itertuples
pd.DataFrame([(np.nan, *t[1:-1]) for t in df.itertuples()], columns=[*df])
A B C D E F
0 NaN 1 3.0 X 5.0 7
1 NaN 2 4.0 Y 6.0 8
Though I'd probably do this:
pd.DataFrame([
(np.nan, *t[:-1]) for t in
df.itertuples(index=False, name=None)
], columns=[*df])
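The itertuples version extends to shifts of more than one column: pad each row with n NaNs and drop its last n values. The helper name `shift_by_tuples` is hypothetical. The DataFrame constructor infers one dtype per column, so the all-NaN leading columns come out as float64 rather than the object dtype shown in the question's target.

```python
import numpy as np
import pandas as pd

def shift_by_tuples(df, n=1):
    # Hypothetical helper: assumes 0 <= n <= number of columns
    width = df.shape[1]
    rows = [
        (*([np.nan] * n), *t[:width - n])  # pad left with NaN, drop the last n values
        for t in df.itertuples(index=False, name=None)
    ]
    # The constructor infers one dtype per column from the shifted values
    return pd.DataFrame(rows, columns=df.columns, index=df.index)

df = pd.DataFrame(dict(
    A=[1, 2], B=[3., 4.], C=['X', 'Y'],
    D=[5., 6.], E=[7, 8], F=['W', 'Z']
))
print(shift_by_tuples(df, 2).dtypes)
```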
I tried using a numpy method. The method works as long as you keep your data in a numpy array:
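import numpy as np

def shift_df(data, n):
    # roll each row right by n columns; an object array keeps ints, floats and strings intact
    shifted = np.roll(np.asarray(data, dtype=object), n, axis=1)
    # the first n columns wrapped around from the end of each row, so blank them out
    shifted[:, :n] = np.nan
    return shifted

shift_df(df, 1)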
array([[nan, 1, 1.0, 'X', 4.0],
[nan, 2, 2.0, 'Y', 5.0]], dtype=object)
But when you pass the array to the DataFrame constructor, all columns are converted to object, even though the values in the array are float, int and object:
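def shift_df(data, n):
    shifted = np.roll(np.asarray(data, dtype=object), n, axis=1)
    shifted[:, :n] = np.nan
    # wrapping the object array in a DataFrame gives every column object dtype
    shifted = pd.DataFrame(shifted)
    return shifted

print(shift_df(df, 1), '\n')
print(shift_df(df, 1).dtypes)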
0 1 2 3 4
0 NaN 1 1 X 4
1 NaN 2 2 Y 5
0 object
1 object
2 object
3 object
4 object
dtype: object
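To stay with np.roll, one option (not part of the original answer) is to let pandas re-infer the dtypes afterwards with `DataFrame.infer_objects()`. That method soft-converts object columns whose values share a numeric type. The sketch below uses the question's five-column frame and restores its column labels. The all-NaN first column will typically be inferred as float64 rather than object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=[1, 2], B=[1., 2.], C=['X', 'Y'], D=[4., 5.], E=[4, 5]))

n = 1
shifted = np.roll(df.to_numpy(dtype=object), n, axis=1)
shifted[:, :n] = np.nan  # blank the columns that wrapped around
# Keep the original labels, then ask pandas to re-infer a dtype per column
result = pd.DataFrame(shifted, columns=df.columns, index=df.index).infer_objects()
print(result.dtypes)
```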