
Pandas type error trying to plot

I'm trying to create a basic scatter plot based on a Pandas dataframe. But when I call the scatter routine I get an error "TypeError: invalid type promotion". Sample code to reproduce the problem is shown below:

import pandas as pd
import matplotlib.pyplot as plt

# Figure size in inches (assumed values; not given in the original post)
x_size, y_size = 8, 6

t1 = pd.to_datetime('2015-11-01 00:00:00')
t2 = pd.to_datetime('2015-11-02 00:00:00')

Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])

df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)

print(type(df.Time))
print(type(df.Time[0]))

fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(df.Time, y=df.Value, marker='o')

The resulting output is

        Time  Value
0 2015-11-01     -1
1 2015-11-02      1
<class 'pandas.core.series.Series'>
<class 'pandas.tslib.Timestamp'>

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-285-f4ed0443bf4d> in <module>()
     15 fig = plt.figure(figsize=(x_size,y_size))
     16 ax = fig.add_subplot(111)
---> 17 ax.scatter(df.Time, y=df.Value, marker='o')

C:\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x,    y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
   3635             edgecolors = 'face'
   3636 
-> 3637         offsets = np.dstack((x, y))
   3638 
   3639         collection = mcoll.PathCollection(

C:\Anaconda3\lib\site-packages\numpy\lib\shape_base.py in dstack(tup)
    365 
    366     """
--> 367     return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
    368 
    369 def _replace_zero_by_x_arrays(sub_arys):

TypeError: invalid type promotion

Searching around, I found a similar post, "Pandas Series TypeError and ValueError when using datetime", which suggests that the error is caused by having multiple data types in the series. That does not appear to be the issue in my example, as the type information I'm printing shows.

Note that if I stop using pandas datetime objects and make the 'Time' a float instead this works fine, e.g.

t1 = 1.1
t2 = 1.2

Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])

df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)

print(type(df.Time))
print(type(df.Time[0]))

fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(df.Time, y=df.Value, marker='o')

with output

   Time  Value
0   1.1     -1
1   1.2      1
<class 'pandas.core.series.Series'>
<class 'numpy.float64'>

and the graph looks just fine. I'm at a loss as to why using a datetime causes the invalid type promotion error. I'm using Python 3.4.3 and pandas 0.16.2.

asked Nov 12 '15 16:11 by Tom Johnson



2 Answers

Thanks, @martinvseticka. I think your assessment is correct, based on the numpy code you pointed me to. I was able to simplify your tweaks a bit more (and added a third sample point) to get:

t1 = pd.to_datetime('2015-11-01 00:00:00')
t2 = pd.to_datetime('2015-11-02 00:00:00')
t3 = pd.to_datetime('2015-11-03 00:00:00')

Time = pd.Series([t1, t2, t3])
r = pd.Series([-1, 1, 0.5])

df = pd.DataFrame({'Time': Time, 'Value': r})

fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.plot_date(x=df.Time, y=df.Value, marker='o')

The key seems to be calling 'plot_date' rather than 'scatter'. This seems to tell matplotlib not to try to concatenate the arrays.
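
To illustrate the concatenation problem in isolation, here is a minimal numpy-only sketch. It assumes that scatter receives the x values as a datetime64 array and the y values as integers, as the traceback in the question suggests. The names x, y, stack_failed and x_as_float are just illustrative stand-ins, not anything from matplotlib itself.

```python
import numpy as np

# Stand-ins for df.Time and df.Value from the question (illustrative names).
x = np.array(['2015-11-01', '2015-11-02'], dtype='datetime64[ns]')
y = np.array([-1, 1])

# The failing line in the traceback stacks x and y into one array,
# which requires numpy to find a common dtype for both inputs.
try:
    np.dstack((x, y))
    stack_failed = False
except TypeError as exc:
    stack_failed = True
    print(type(exc).__name__, exc)

# Once the dates are plain floats, a common dtype (float64) exists
# and the same stacking call goes through.
x_as_float = x.astype('int64').astype('float64')  # nanoseconds since 1970-01-01
offsets = np.dstack((x_as_float, y))
print(offsets.shape, offsets.dtype)
```

The exact error text depends on the numpy version, but it should be a TypeError (or a subclass of it) in each case. More recent matplotlib releases may convert datetime values before stacking, in which case the original scatter call would not hit this path.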

answered Oct 19 '22 20:10 by Tom Johnson


There is another way: stop passing the Series objects to scatter directly. Just convert the time (and value) data to plain lists first.

t1 = pd.to_datetime('2015-11-01 00:00:00')
t2 = pd.to_datetime('2015-11-02 00:00:00')

Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])

df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)

print(type(df.Time))
print(type(df.Time[0]))
x_size = 8  # figsize is given in inches, not pixels
y_size = 6
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(list(df.Time.values), list(df.Value.values), marker='o')
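
In the same spirit as converting to lists, the dates can also be turned into plain floats before they reach scatter. The sketch below is only one possible variant and is not taken from either answer: it assumes that matplotlib.dates.date2num in the installed matplotlib accepts a datetime64 array, and the 8x6 inch figure size and the Agg backend are arbitrary choices made so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2015-11-01', '2015-11-02']),
    'Value': [-1, 1],
})

# Convert the timestamps to matplotlib's float date numbers (days since its epoch).
x_num = mdates.date2num(df['Time'].to_numpy())

fig, ax = plt.subplots(figsize=(8, 6))  # size in inches; 8x6 is an arbitrary choice
ax.scatter(x_num, df['Value'].to_numpy(), marker='o')
ax.xaxis_date()      # tell the x axis to label the float values as dates
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Because both arrays are numeric by the time scatter sees them, no datetime/integer promotion is needed; xaxis_date only affects how the ticks are labelled.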

answered Oct 19 '22 19:10 by Jeff