I'm attempting to convert a DataFrame into a Series using code which, simplified, looks like this:
dates = ['2016-1-{}'.format(i)for i in range(1,21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
ts = pd.Series(df['Value'], index=df['Date'])
print(ts)
However, the printed output looks like this:
Date
2016-01-01 NaN
2016-01-02 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 NaN
2016-01-09 NaN
2016-01-10 NaN
2016-01-11 NaN
2016-01-12 NaN
2016-01-13 NaN
2016-01-14 NaN
2016-01-15 NaN
2016-01-16 NaN
2016-01-17 NaN
2016-01-18 NaN
2016-01-19 NaN
2016-01-20 NaN
Name: Value, dtype: float64
Where does NaN come from? Is a view on a DataFrame object not a valid input for the Series class?
I have found the to_series function for pd.Index objects. Is there something similar for DataFrames?
NaN marks missing data in pandas. Note that np.nan is not the same thing as Python's None, and np.nan does not compare equal to anything, including itself.
To count the NaN values in a DataFrame column, call isna() on the column and then sum() the result.
The dropna() method drops rows that contain NaN or None values. By default it returns a new DataFrame with those rows removed; to modify the existing DataFrame instead, pass inplace=True.
The Series.notna() method detects non-missing values and returns a boolean Series.
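The missing-data helpers mentioned above can be tried on a small example. This is a minimal sketch; the toy DataFrame and the variable names `n_missing`, `mask`, and `cleaned` are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing entries (one np.nan, one None).
df = pd.DataFrame({'Value': [1.0, np.nan, 3.0, None]})

n_missing = df['Value'].isna().sum()   # number of missing values in the column
mask = df['Value'].notna()             # boolean Series, True where a value is present
cleaned = df.dropna()                  # new DataFrame; df itself is left unchanged

print(n_missing)
print(mask.tolist())
print(len(cleaned), len(df))
```

Because dropna() returns a copy by default, `df` still has all four rows afterwards.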
I think you can use values, which converts the Value column to an array:
ts = pd.Series(df['Value'].values, index=df['Date'])
import pandas as pd
import numpy as np
import io
dates = ['2016-1-{}'.format(i)for i in range(1,21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df['Value'].values)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
ts = pd.Series(df['Value'].values, index=df['Date'])
print(ts)
Date
2016-01-01 0
2016-01-02 1
2016-01-03 2
2016-01-04 3
2016-01-05 4
2016-01-06 5
2016-01-07 6
2016-01-08 7
2016-01-09 8
2016-01-10 9
2016-01-11 10
2016-01-12 11
2016-01-13 12
2016-01-14 13
2016-01-15 14
2016-01-16 15
2016-01-17 16
2016-01-18 17
2016-01-19 18
2016-01-20 19
dtype: int64
Or you can use:
ts1 = pd.Series(data=values, index=pd.to_datetime(dates))
print(ts1)
2016-01-01 0
2016-01-02 1
2016-01-03 2
2016-01-04 3
2016-01-05 4
2016-01-06 5
2016-01-07 6
2016-01-08 7
2016-01-09 8
2016-01-10 9
2016-01-11 10
2016-01-12 11
2016-01-13 12
2016-01-14 13
2016-01-15 14
2016-01-16 15
2016-01-17 16
2016-01-18 17
2016-01-19 18
2016-01-20 19
dtype: int64
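Another option, not part of the answer above, is to move the Date column into the index with set_index and then select the Value column. This is a sketch under the assumption that df is built as in the question; the name `ts2` is made up for illustration.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
df = pd.DataFrame({'Date': dates, 'Value': values})
df['Date'] = pd.to_datetime(df['Date'])

# Move 'Date' into the index, then select 'Value' as a Series.
# The values keep their row pairing, so no NaN is introduced.
ts2 = df.set_index('Date')['Value']
print(ts2.head())
```

This keeps the Series name (Value) and the index name (Date), which the values-based approach drops from the Series name.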
Thank you @ajcr for the better explanation of why you get NaN:
When you give a Series or DataFrame column to pd.Series, it will reindex it using the index you specify. Since your DataFrame column has an integer index (not a date index), you get lots of missing values.
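The reindexing behaviour can be seen on a smaller example. This is a minimal sketch; the three-element Series and the names `aligned`, `positional`, and `reindexed` are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30])  # default integer index 0, 1, 2
new_index = pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03'])

# Passing a Series as data aligns on labels. None of the dates
# exist in s.index, so every entry comes out as NaN.
aligned = pd.Series(s, index=new_index)

# Passing the underlying array skips alignment; values are
# assigned to the new index by position.
positional = pd.Series(s.values, index=new_index)

# An explicit reindex gives the same all-NaN result as the first case.
reindexed = s.reindex(new_index)

print(aligned)
print(positional)
```

The all-NaN result also explains why the dtype in the question changed from int64 to float64: NaN is a float, so the integer column is upcast.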