pandas.Series() Creation using DataFrame Columns returns NaN Data entries

I'm attempting to convert a DataFrame into a Series using code which, simplified, looks like this:

import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
ts = pd.Series(df['Value'], index=df['Date'])
print(ts)

However, the printed output looks like this:

Date
2016-01-01   NaN
2016-01-02   NaN
2016-01-03   NaN
2016-01-04   NaN
2016-01-05   NaN
2016-01-06   NaN
2016-01-07   NaN
2016-01-08   NaN
2016-01-09   NaN
2016-01-10   NaN
2016-01-11   NaN
2016-01-12   NaN
2016-01-13   NaN
2016-01-14   NaN
2016-01-15   NaN
2016-01-16   NaN
2016-01-17   NaN
2016-01-18   NaN
2016-01-19   NaN
2016-01-20   NaN
Name: Value, dtype: float64

Where does the NaN come from? Is a view on a DataFrame object not a valid input for the Series class?

I have found the to_series function for pd.Index objects. Is there something similar for DataFrames?

deepbrook asked Mar 05 '16 19:03


1 Answer

I think you can use .values, which converts the Value column to a NumPy array, so there is no existing index for pandas to align against:

ts = pd.Series(df['Value'].values, index=df['Date'])

Full example:

import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = [i for i in range(20)]
data = {'Date': dates, 'Value': values}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df['Value'].values)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

ts = pd.Series(df['Value'].values, index=df['Date'])
print(ts)
Date
2016-01-01     0
2016-01-02     1
2016-01-03     2
2016-01-04     3
2016-01-05     4
2016-01-06     5
2016-01-07     6
2016-01-08     7
2016-01-09     8
2016-01-10     9
2016-01-11    10
2016-01-12    11
2016-01-13    12
2016-01-14    13
2016-01-15    14
2016-01-16    15
2016-01-17    16
2016-01-18    17
2016-01-19    18
2016-01-20    19
dtype: int64

Or you can build the Series directly from the original lists:

ts1 = pd.Series(data=values, index=pd.to_datetime(dates))
print(ts1)
2016-01-01     0
2016-01-02     1
2016-01-03     2
2016-01-04     3
2016-01-05     4
2016-01-06     5
2016-01-07     6
2016-01-08     7
2016-01-09     8
2016-01-10     9
2016-01-11    10
2016-01-12    11
2016-01-13    12
2016-01-14    13
2016-01-15    14
2016-01-16    15
2016-01-17    16
2016-01-18    17
2016-01-19    18
2016-01-20    19
dtype: int64
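
As for the question about a to_series equivalent for DataFrames: one common option is to move the Date column into the index and then select the Value column. This is only a sketch built on the question's own data; the name ts2 is introduced here for illustration and is not part of the original code.

```python
import pandas as pd

dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
df = pd.DataFrame({'Date': pd.to_datetime(dates), 'Value': values})

# Make 'Date' the index of the frame, then pick the 'Value' column.
# The values travel with their rows, so nothing needs to be realigned.
ts2 = df.set_index('Date')['Value']
print(ts2.head())
```

Unlike the .values approach, the resulting Series keeps the column name Value as its name.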

Thank you to @ajcr for a better explanation of why you get NaN:

When you give a Series or DataFrame column to pd.Series, it will reindex it using the index you specify. Since your DataFrame column has an integer index (0 to 19, not a date index), none of the dates you pass match an existing label, so every entry comes back as a missing value.
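
The reindexing described above can be seen on a small example. This is a minimal sketch using a three-row frame rather than the question's data; the variable names misaligned, explicit and partial are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']),
    'Value': [0, 1, 2],
})

# df['Value'] carries the integer index 0, 1, 2. Passing index= makes
# pandas look the date labels up in that index; none of them exist there.
misaligned = pd.Series(df['Value'], index=df['Date'])

# The same lookup written explicitly with reindex.
explicit = df['Value'].reindex(df['Date'])

# Labels that do exist in the old index keep their values;
# the unknown label 5 becomes NaN.
partial = pd.Series(df['Value'], index=[2, 0, 5])

print(misaligned.isna().all())
print(explicit.isna().all())
print(partial)
```

The first two checks should both report that every value is missing, while partial shows that the constructor is matching labels rather than positions.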

jezrael answered Oct 14 '22 07:10