How to list all databases and tables in AWS Glue Catalog?

I created a Development Endpoint in the AWS Glue console, and now I have access to SparkContext and SQLContext in the gluepyspark console.

How can I access the catalog and list all databases and tables? The usual sqlContext.sql("show tables").show() does not work.

What might help is the CatalogConnection class, but I have no idea which package it is in. I tried importing it from awsglue.context without success.

Jiří Mauritz asked Sep 06 '17 16:09


1 Answer

Glue returns one page of results per response. If you have more than 100 tables, make sure you pass NextToken back on each call to retrieve all of them.

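import boto3

glue_client = boto3.client("glue")


def get_glue_tables(database):
    """Print the name of every table in a Glue Data Catalog database."""
    # DatabaseName is required by get_tables, so there is no None default
    kwargs = {"DatabaseName": database}

    while True:
        response = glue_client.get_tables(**kwargs)

        for table in response.get("TableList", []):
            print(table.get("Name"))

        # NextToken is only present when more pages remain
        next_token = response.get("NextToken")
        if not next_token:
            break
        kwargs["NextToken"] = next_token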
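
A sketch, not part of the original answer: boto3 also exposes paginators for the Glue `get_databases` and `get_tables` operations, which follow NextToken internally, so the same idea can cover both halves of the question (all databases, then all tables in each). It assumes boto3 is installed and AWS credentials and a region are configured; `list_catalog` is a hypothetical helper name, and its optional `glue_client` argument lets you pass a preconfigured client.

```python
def list_catalog(glue_client=None):
    """Return {database_name: [table_name, ...]} for the whole Glue Data Catalog."""
    if glue_client is None:
        # Imported here so the helper can also be used with a preconfigured client
        import boto3
        glue_client = boto3.client("glue")

    catalog = {}
    # Each paginator yields one response page at a time, handling NextToken for us
    for db_page in glue_client.get_paginator("get_databases").paginate():
        for db in db_page["DatabaseList"]:
            name = db["Name"]
            table_pages = glue_client.get_paginator("get_tables").paginate(DatabaseName=name)
            catalog[name] = [t["Name"] for page in table_pages for t in page["TableList"]]
    return catalog
```

Calling `list_catalog()` and iterating over `.items()` gives each database name with its list of table names; unlike the loop above, it collects the names instead of printing them.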

Bao Pham answered Sep 30 '22 22:09