I'm trying to create a basic scatter plot based on a Pandas dataframe. But when I call the scatter routine I get an error "TypeError: invalid type promotion". Sample code to reproduce the problem is shown below:
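import pandas as pd
import matplotlib.pyplot as plt
x_size, y_size = 8, 6  # figure size in inches; values assumed
t1 = pd.to_datetime('2015-11-01 00:00:00')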
t2 = pd.to_datetime('2015-11-02 00:00:00')
Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])
df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)
print(type(df.Time))
print(type(df.Time[0]))
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(df.Time, y=df.Value, marker='o')
The resulting output is
        Time  Value
0 2015-11-01     -1
1 2015-11-02      1
<class 'pandas.core.series.Series'>
<class 'pandas.tslib.Timestamp'>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-285-f4ed0443bf4d> in <module>()
15 fig = plt.figure(figsize=(x_size,y_size))
16 ax = fig.add_subplot(111)
---> 17 ax.scatter(df.Time, y=df.Value, marker='o')
C:\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
3635 edgecolors = 'face'
3636
-> 3637 offsets = np.dstack((x, y))
3638
3639 collection = mcoll.PathCollection(
C:\Anaconda3\lib\site-packages\numpy\lib\shape_base.py in dstack(tup)
365
366 """
--> 367 return _nx.concatenate([atleast_3d(_m) for _m in tup], 2)
368
369 def _replace_zero_by_x_arrays(sub_arys):
TypeError: invalid type promotion
Searching around, I found a similar post, "Pandas Series TypeError and ValueError when using datetime", which suggests that the error is caused by having multiple data types in the series. But that does not appear to be the issue in my example, as evidenced by the type information I'm printing.
Note that if I stop using pandas datetime objects and make the 'Time' a float instead this works fine, e.g.
t1 = 1.1
t2 = 1.2
Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])
df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)
print(type(df.Time))
print(type(df.Time[0]))
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(df.Time, y=df.Value, marker='o')
with output
   Time  Value
0   1.1     -1
1   1.2      1
<class 'pandas.core.series.Series'>
<class 'numpy.float64'>
and the graph looks just fine. I'm at a loss as to why using a datetime causes the invalid type promotion error. I'm using Python 3.4.3 and pandas 0.16.2.
Thanks @martinvseticka. I think your assessment is correct based on the numpy code you pointed me to. I was able to simplify your tweaks a bit more (and added a third sample point) to get
t1 = pd.to_datetime('2015-11-01 00:00:00')
t2 = pd.to_datetime('2015-11-02 00:00:00')
t3 = pd.to_datetime('2015-11-03 00:00:00')
Time = pd.Series([t1, t2, t3])
r = pd.Series([-1, 1, 0.5])
df = pd.DataFrame({'Time': Time, 'Value': r})
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.plot_date(x=df.Time, y=df.Value, marker='o')
The key seems to be calling 'plot_date' rather than 'plot'. This seems to tell matplotlib not to try to concatenate the arrays.
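The concatenation in question is the np.dstack call shown in the traceback. Here is a minimal sketch of why it fails, using only NumPy and two small arrays standing in for df.Time and df.Value (the names times, values and seconds are mine, not from the post). It assumes NumPy refuses to find a common dtype for datetime64 and integer data, which is the behaviour I know of and matches the error above.

```python
import numpy as np

# Small stand-ins for df.Time and df.Value (hypothetical names)
times = np.array(['2015-11-01', '2015-11-02'], dtype='datetime64[ns]')
values = np.array([-1, 1])

# Stacking needs one common dtype; datetime64 and int64 do not have one
try:
    np.dstack((times, values))
    stack_failed = False
except TypeError as exc:
    stack_failed = True
    print('dstack failed:', exc)

# Converting the dates to plain integers (seconds since 1970-01-01)
# gives NumPy two numeric arrays, which it can stack
seconds = times.astype('datetime64[s]').astype('int64')
offsets = np.dstack((seconds, values))
print(offsets.shape)
```

This is also why the float version of the sample works: both columns are already numeric, so there is nothing to promote.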
There is another way: stop passing Series objects to scatter and pass plain lists instead.
t1 = pd.to_datetime('2015-11-01 00:00:00')
t2 = pd.to_datetime('2015-11-02 00:00:00')
Time = pd.Series([t1, t2])
r = pd.Series([-1, 1])
df = pd.DataFrame({'Time': Time, 'Value': r})
print(df)
print(type(df.Time))
print(type(df.Time[0]))
x_size = 8  # figsize is given in inches, not pixels
y_size = 6
fig = plt.figure(figsize=(x_size,y_size))
ax = fig.add_subplot(111)
ax.scatter(list(df.Time.values), list(df.Value.values), marker='o')
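A further option, sketched here assuming a reasonably recent matplotlib, is to convert the dates to numbers yourself with matplotlib.dates.date2num and then tell the axis to treat them as dates. The DataFrame mirrors the three-point sample from the earlier answer. The Agg backend line is my addition so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2015-11-01', '2015-11-02', '2015-11-03']),
    'Value': [-1, 1, 0.5],
})

# date2num turns the timestamps into floats (days since matplotlib's epoch)
x = mdates.date2num(df['Time'].values)

fig, ax = plt.subplots(figsize=(8, 6))  # figsize is in inches
points = ax.scatter(x, df['Value'].values, marker='o')

# Mark the x axis as holding dates so the tick labels are readable
ax.xaxis_date()
fig.autofmt_xdate()
```

Because scatter only ever sees floats here, the stacking step has nothing to promote. The cost is the extra conversion and the need to set the date formatting on the axis by hand.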