Take the following code:
import MySQLdb as mdb
import pandas as pd
con = mdb.connect(db_host, db_user, db_pass, db_name)
query = """SELECT `TIME`.`BID-CLOSE`
FROM `EUR-USD`.`tbl_EUR-USD_1-Day`
WHERE TIME >= '2006-12-15 22:00:00' AND TIME <= '2007-01-03 22:00:00'
ORDER BY TIME ASC;"""
# Create a pandas dataframe from the SQL query
eurusd = pd.read_sql_query(query, con=con, index_col='TIME')
idx = pd.date_range('2006-12-17 22:00:00', '2007-01-03 22:00:00')
eurusd.reindex(idx, fill_value=None)
This gives an output of
BID-CLOSE
2006-12-17 22:00:00 1.30971
2006-12-18 22:00:00 1.31971
2006-12-19 22:00:00 1.31721
2006-12-20 22:00:00 1.31771
2006-12-21 22:00:00 1.31411
2006-12-22 22:00:00 NaN
2006-12-23 22:00:00 NaN
2006-12-24 22:00:00 NaN
2006-12-25 22:00:00 1.30971
2006-12-26 22:00:00 1.31131
2006-12-27 22:00:00 1.31491
2006-12-28 22:00:00 1.32021
2006-12-29 22:00:00 NaN
2006-12-30 22:00:00 NaN
2006-12-31 22:00:00 1.32731
2007-01-01 22:00:00 1.32731
2007-01-02 22:00:00 1.31701
2007-01-03 22:00:00 1.30831
Re-Index the data
eurusd = eurusd.reindex(idx, fill_value=None)
List of interpolate types
methods = ['linear', 'quadratic', 'cubic']
Next line throws an Exception...
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
ValueError: If using all scalar values, you must pass an index
Following the Interpolation section of this guide http://pandas.pydata.org/pandas-docs/stable/missing_data.html How do I correctly 'pass an index' in this situation?
Update 1
The output of eurusd.interpolate('linear')
BID-CLOSE
2006-12-17 22:00:00 1.309710
2006-12-18 22:00:00 1.319710
2006-12-19 22:00:00 1.317210
2006-12-20 22:00:00 1.317710
2006-12-21 22:00:00 1.314110
2006-12-22 22:00:00 1.313010
2006-12-23 22:00:00 1.311910
2006-12-24 22:00:00 1.310810
2006-12-25 22:00:00 1.309710
2006-12-26 22:00:00 1.311310
2006-12-27 22:00:00 1.314910
2006-12-28 22:00:00 1.320210
2006-12-29 22:00:00 1.322577
2006-12-30 22:00:00 1.324943
2006-12-31 22:00:00 1.327310
2007-01-01 22:00:00 1.327310
2007-01-02 22:00:00 1.317010
2007-01-03 22:00:00 1.308310
Update 2
In[9]: pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Out[9]:
cubic linear quadratic
2006-12-17 22:00:00 1.309710 1.309710 1.309710
2006-12-18 22:00:00 1.319710 1.319710 1.319710
2006-12-19 22:00:00 1.317210 1.317210 1.317210
2006-12-20 22:00:00 1.317710 1.317710 1.317710
2006-12-21 22:00:00 1.314110 1.314110 1.314110
2006-12-22 22:00:00 1.310762 1.313010 1.307947
2006-12-23 22:00:00 1.309191 1.311910 1.305159
2006-12-24 22:00:00 1.308980 1.310810 1.305747
2006-12-25 22:00:00 1.309710 1.309710 1.309710
2006-12-26 22:00:00 1.311310 1.311310 1.311310
2006-12-27 22:00:00 1.314910 1.314910 1.314910
2006-12-28 22:00:00 1.320210 1.320210 1.320210
2006-12-29 22:00:00 1.323674 1.322577 1.321632
2006-12-30 22:00:00 1.325553 1.324943 1.323998
2006-12-31 22:00:00 1.327310 1.327310 1.327310
2007-01-01 22:00:00 1.327310 1.327310 1.327310
2007-01-02 22:00:00 1.317010 1.317010 1.317010
2007-01-03 22:00:00 1.308310 1.308310 1.308310
If you attempt to create a pandas DataFrame with all scalar values, but you do not pass an index, you will raise the ValueError: if using all scalar values, you must pass an index. You can solve this error by passing an index when creating the DataFrame.
append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value. Parameters: other : DataFrame or Series/dict-like object, or list of these.
The problem is that when you use the DataFrame
constructor:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
the value for each m
is a DataFrame
, which will be interpreted as a scalar value, which is admittedly confusing. This constructer expects some sort of sequence or Series
. The following should solve the problem:
pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
Since subsetting on a column returns a Series
. So, for example instead of:
In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5229 # figure out the index, if necessary
5230 if index is None:
-> 5231 index = extract_index(arrays)
5232 else:
5233 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5268
5269 if not indexes and not raw_lengths:
-> 5270 raise ValueError('If using all scalar values, you must pass'
5271 ' an index')
5272
ValueError: If using all scalar values, you must pass an index
Use this instead:
In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]:
linear
timestamp
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.319710
2016-10-10 22:00:00 1.317210
2016-10-10 22:00:00 1.317710
2016-10-10 22:00:00 1.314110
2016-10-10 22:00:00 1.313010
2016-10-10 22:00:00 1.311910
2016-10-10 22:00:00 1.310810
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.311310
2016-10-10 22:00:00 1.314910
2016-10-10 22:00:00 1.320210
2016-10-10 22:00:00 1.322577
2016-10-10 22:00:00 1.324943
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.317010
2016-10-10 22:00:00 1.308310
Fair warning, though, I am getting a LinAlgError: singular matrix
error when I try 'quadratic'
and 'cubic'
interpolation on your data. Not sure why though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With