I'm trying to access a CSV file in my Watson Data Platform catalog. I used the code generation functionality in my DSX notebook: Insert to code > Insert StreamingBody object.
The generated code was:
import os
import types
import pandas as pd
import boto3
def __iter__(self): return 0
# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.
os.environ['AWS_ACCESS_KEY_ID'] = '******'
os.environ['AWS_SECRET_ACCESS_KEY'] = '******'
endpoint = 's3-api.us-geo.objectstorage.softlayer.net'
bucket = 'catalog-test'
cos_12345 = boto3.resource('s3', endpoint_url=endpoint)
body = cos_12345.Object(bucket,'my.csv').get()['Body']
# add missing __iter__ method so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)
df_data_2 = pd.read_csv(body)
df_data_2.head()
When I try to run this code, I get:
/usr/local/src/conda3_runtime.v27/4.1.1/lib/python3.5/site-packages/botocore/endpoint.py in create_endpoint(self, service_model, region_name, endpoint_url, verify, response_parser_factory, timeout, max_pool_connections)
270 if not is_valid_endpoint_url(endpoint_url):
271
--> 272 raise ValueError("Invalid endpoint: %s" % endpoint_url)
273 return Endpoint(
274 endpoint_url,
ValueError: Invalid endpoint: s3-api.us-geo.objectstorage.service.networklayer.com
What is strange is that if I generate the code for SparkSession setup instead, the same endpoint is used but the Spark code runs fine.
How can I fix this issue?
I'm presuming the same issue will be encountered for the other Softlayer endpoints so I'm listing them here as well to ensure this question is also applicable for the other softlayer locations:
The solution was to prefix the endpoint with https://, changing from this:
endpoint = 's3-api.us-geo.objectstorage.softlayer.net'
to
endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
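As a guard against the same mistake with other endpoints, the prefixing step could be wrapped in a small helper. This is only a sketch: `ensure_scheme` is a hypothetical name, not part of boto3, and it assumes the endpoint should default to https when no scheme is given. The traceback shows botocore rejecting the value in `is_valid_endpoint_url`, which appears to expect a full URL rather than a bare hostname.

```python
from urllib.parse import urlparse


def ensure_scheme(endpoint, default_scheme="https"):
    """Return endpoint unchanged if it already starts with http:// or https://,
    otherwise prepend default_scheme. (Hypothetical helper, not a boto3 API.)"""
    if urlparse(endpoint).scheme in ("http", "https"):
        return endpoint
    return "{}://{}".format(default_scheme, endpoint)


# Bare hostname, as produced by the generated notebook code
endpoint = ensure_scheme('s3-api.us-geo.objectstorage.softlayer.net')
print(endpoint)
```

The resulting value can then be passed as endpoint_url to boto3.resource('s3', ...) exactly as in the generated code.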
For IBM Cloud Object Storage, it should be import ibm_boto3 rather than import boto3. The original boto3 is for accessing AWS, which uses different authentication. Maybe those two have a different interpretation of the endpoint value.