The schema of my Google BigQuery table looks like this:
price_datetime : DATETIME,
symbol : STRING,
bid_open : FLOAT,
bid_high : FLOAT,
bid_low : FLOAT,
bid_close : FLOAT,
ask_open : FLOAT,
ask_high : FLOAT,
ask_low : FLOAT,
ask_close : FLOAT
After I do a pandas.read_gbq, I get a DataFrame with column dtypes like this:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
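For reference, the read_gbq call looks roughly like this (the query, dataset, table, and project names below are placeholders, not my real ones):

import pandas

# Placeholder query and project id; substitute your own values.
df = pandas.read_gbq('SELECT * FROM my_dataset.my_table',
                     project_id='my-project')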
Now I want to use to_gbq, so I convert my local DataFrame (which I just made) from these dtypes:
price_datetime datetime64[ns]
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
to these dtypes:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
by doing:
df['price_datetime'] = df['price_datetime'].astype(object)
Now I (think) I am ready to use to_gbq, so I do:
import pandas
pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
but I get the error:
---------------------------------------------------------------------------
InvalidSchema Traceback (most recent call last)
<ipython-input-15-d5a3f86ad382> in <module>()
1 a = time.time()
----> 2 pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
3 b = time.time()
4
5 print(b-a)
C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
825 elif if_exists == 'append':
826 if not connector.verify_schema(dataset_id, table_id, table_schema):
--> 827 raise InvalidSchema("Please verify that the structure and "
828 "data types in the DataFrame match the "
829 "schema of the destination table.")
InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
I had to do two things that solved the issue for me. First, I deleted my table and re-uploaded it with the columns as TIMESTAMP types rather than DATETIME types. This made sure that the schema matched when the pandas.DataFrame with column type datetime64[ns] was uploaded using to_gbq, which converts datetime64[ns] to the TIMESTAMP type and not to the DATETIME type (for now).
The second thing I did was upgrade from pandas 0.19 to pandas 0.20. These two things solved my problem of a schema mismatch.
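Putting the two fixes together, a sketch of the upload that then works (the table and project names are placeholders, and the destination column is assumed to now be TIMESTAMP):

import pandas

# Keep the column as datetime64[ns]; do NOT cast it to object.
df['price_datetime'] = pandas.to_datetime(df['price_datetime'])

# With the destination column declared as TIMESTAMP (not DATETIME),
# the generated schema matches on append. Names are placeholders.
pandas.io.gbq.to_gbq(df, 'my_dataset.my_table', 'my-project',
                     if_exists='append')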
This is probably an issue related to pandas. If you check the code for to_gbq, you'll see that it runs this code:
table_schema = _generate_bq_schema(dataframe)
Where _generate_bq_schema is given by:
def _generate_bq_schema(df, default_type='STRING'):
    """ Given a passed df, generate the associated Google BigQuery schema.

    Parameters
    ----------
    df : DataFrame
    default_type : string
        The default big query type in case the type of the column
        does not exist in the schema.
    """

    type_mapping = {
        'i': 'INTEGER',
        'b': 'BOOLEAN',
        'f': 'FLOAT',
        'O': 'STRING',
        'S': 'STRING',
        'U': 'STRING',
        'M': 'TIMESTAMP'
    }

    fields = []
    for column_name, dtype in df.dtypes.iteritems():
        fields.append({'name': column_name,
                       'type': type_mapping.get(dtype.kind, default_type)})

    return {'fields': fields}
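You can see the two mappings concretely with a small sketch that replays the same type_mapping lookup on the relevant dtype kinds (using numpy directly rather than calling pandas internals):

import numpy as np

type_mapping = {
    'i': 'INTEGER',
    'b': 'BOOLEAN',
    'f': 'FLOAT',
    'O': 'STRING',
    'S': 'STRING',
    'U': 'STRING',
    'M': 'TIMESTAMP'
}

# A datetime64[ns] column has dtype.kind 'M', so it becomes TIMESTAMP.
print(type_mapping[np.dtype('datetime64[ns]').kind])  # TIMESTAMP

# A column cast to object has dtype.kind 'O', so it becomes STRING.
print(type_mapping[np.dtype('object').kind])          # STRING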
As you can see, there's no type mapping to DATETIME. The object-cast datetime column inevitably gets mapped to type STRING (since its dtype.kind is "O"), and then the conflict occurs.
The only workaround for now that I'm aware of would be to change your table schema from DATETIME to either TIMESTAMP or STRING.
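If you go the STRING route, one possible sketch (assuming price_datetime is still datetime64[ns], not yet cast to object) is to serialize the values yourself, so the column's dtype.kind becomes "O" and it maps to STRING:

# Hypothetical format; pick whatever string representation your table expects.
df['price_datetime'] = df['price_datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')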
It probably would be a good idea to start a new issue on the pandas-gbq repository asking to update this code to accept DATETIME as well.
[EDIT]: I've opened this issue in their repository.