
Google Composer - How do I install Microsoft SQL Server ODBC drivers on environments?

I am new to GCP and Airflow and am trying to run my Python pipelines via a simple pyodbc connection using Python 3. I believe I have found what I need to install on the machines (see the Microsoft doc: https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017), but I am not sure where in GCP to run these commands. I have gone down several deep holes looking for answers, but don't know how to solve the problem.

Here is the error I keep seeing when I upload the DAG:

[Airflow error screenshot]

Here is the PYODBC connection:

pyodbc.connect('DRIVER={Microsoft SQL Server};SERVER=servername;DATABASE=dbname;UID=username;PWD=password')

When I open the Cloud Shell for my environment and run the Microsoft install commands, it just aborts; when I installed the SDK and connected to the project from my local machine, it also aborts or doesn't recognize the Microsoft commands. Can anyone give some simple instructions on where to start and what I am doing wrong?

asked Mar 02 '23 by capt-mac

1 Answer

It's simple! There is no need for a Dockerfile, KubernetesPodOperator, LD_LIBRARY_PATH, etc.; a basic PythonOperator will do.

Points to consider

  • GCP Composer's worker pod image is Ubuntu 16.04 (just run a basic PythonOperator with os.system('cat /etc/os-release') to check; a minimal sketch of that check follows the bucket layout below)
  • It already has unixodbc-dev installed on the worker pod image
  • Composer creates a GCS bucket and mounts it into Airflow
  • So why not just install pyodbc from PyPI packages and provide the MSSQL ODBC driver as a parameter in the pyodbc connect method

Here, 'gs://bucket_created_by_composer' is mounted at '/home/airflow/gcs':

gcs bucket created by composer ->
          -> data/
          -> dags/
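
A minimal sketch of that OS check (just the callable; wire it into a PythonOperator the same way as in the example DAG further below):

import os

def print_os_release(**kwargs):
    """Drop this into a PythonOperator to print the worker pod's OS release."""
    # Expected output on Composer workers at the time: Ubuntu 16.04 (Xenial)
    os.system('cat /etc/os-release')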

Step By Step Approach

Step 1: Install pyodbc and the MSSQL ODBC driver on any Ubuntu instance to get the driver files

For example, let's do it on a GCP VM instance with an Ubuntu 18.04 image:

# update the packages
sudo apt-get update -y

# add the Microsoft package repository
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/18.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
sudo apt-get update -y

echo "Installing msodbcsql17, mssql-tools and the unixODBC developer package..."
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17 mssql-tools unixodbc-dev
# mssql-tools brings sqlcmd and bcp (we don't need those here)

sudo apt install -y python3-pip  # installing pip3
pip3 install pyodbc
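
To confirm the driver registered correctly on the VM, a quick check with pyodbc helps (a sketch; pyodbc.drivers() lists the driver names from /etc/odbcinst.ini):

import pyodbc

# After installing msodbcsql17 the list should contain an entry such as
# 'ODBC Driver 17 for SQL Server'.
print(pyodbc.drivers())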

Step 2: Get the driver files and upload them to the data folder of the GCS bucket created by Composer

cd /opt/microsoft
# now you can see there is one directory, 'msodbcsql17' (the version may change)
# we need to upload this directory to the data folder of the gcs_bucket

# for this you may choose whichever approach suits you
# copying the directory to /home/<user> for proper zipping/uploading to gcs
cp -r msodbcsql17 /home/<user> # you may need to use sudo
# upload this /home/<user>/msodbcsql17 to any gcs_bucket
gsutil cp -r /home/<user>/msodbcsql17 gs://<your-gcs-bucket>

Download this folder from the GCS bucket to local storage and then upload it to the data folder of the GCS bucket created by Composer.

Choose whichever approach/method suits you; the main aim is to get the msodbcsql17 folder into the data folder of the GCS bucket created by Composer (one option is sketched below).
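
One way to do the bucket-to-bucket move is a server-side copy with the google-cloud-storage Python client (a sketch; both bucket names are placeholders for your own buckets):

from google.cloud import storage

client = storage.Client()
src_bucket = client.bucket('your-gcs-bucket')               # bucket you uploaded msodbcsql17 to
dst_bucket = client.bucket('bucket_created_by_composer')    # bucket created by Composer

# copy every object under msodbcsql17/ into the data/ folder of the Composer bucket
for blob in client.list_blobs(src_bucket, prefix='msodbcsql17/'):
    src_bucket.copy_blob(blob, dst_bucket, new_name='data/' + blob.name)

A recursive gsutil cp between the two buckets works just as well.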

Final structure:

gcs bucket created by composer ->
          -> data/msodbcsql17/
          -> dags/<your_dags.py>

Step 3: Use the msodbcsql17 driver for the pyodbc connection

EXAMPLE DAG:

import airflow
from airflow import DAG
from airflow.operators import python_operator


default_dag_args = {
    'start_date': airflow.utils.dates.days_ago(0),
    'provide_context': True
}



dag = DAG(
        'pyodbc_test',
        schedule_interval=None, #change for composer
        default_args=default_dag_args
        )


def check_connection(**kwargs):
    print('hello')
    driver = '/home/airflow/gcs/data/msodbcsql17/lib64/libmsodbcsql-17.5.so.2.1'
    # this is the main driver file; the exact location can be found under the
    # gcs_bucket/data folder, or check the /etc/odbcinst.ini file of the Ubuntu
    # instance on which you installed the driver earlier

    def tconnection(ServerIp, LoginName, Password, mssql_portno):
        """A method which returns a connection object"""
        import pyodbc
        pyodbc.pooling = False
        try:
            sql_conn = pyodbc.connect(
                "DRIVER={0};SERVER={1},{2};UID={3};PWD={4}".format(
                    driver, ServerIp, mssql_portno, LoginName, Password))
        except pyodbc.Error as ex:
            sqlstate = ex.args[0]  # SQLSTATE reported by the driver
            print(sqlstate)
            raise

        return sql_conn

    con = tconnection('<your-server-ip>', '<your-login-name>', '<your-password>', '1433')
    # recommendation: take the login and password from Airflow connections
    # (see the sketch after this DAG)
    import pandas as pd
    q = 'select * from <your-db-name>.<your-schema-name>.<your-table-name>'
    df = pd.read_sql(q, con)
    print(df)

Tcheck_connection = python_operator.PythonOperator(
    task_id='Tcheck_connection',
    python_callable=check_connection,
    dag=dag)


# calling the task sequence
Tcheck_connection
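
The DAG above hard-codes the server and credentials. As recommended in the comment, a better option is to read them from an Airflow connection; a rough sketch (the connection id 'mssql_default' is a placeholder you would create in the Airflow UI, and the import path is the Airflow 1.10-style one used by Composer at the time):

from airflow.hooks.base_hook import BaseHook

def get_mssql_credentials(conn_id='mssql_default'):
    """Fetch host/login/password/port from an Airflow connection instead of hard-coding them."""
    conn = BaseHook.get_connection(conn_id)
    return conn.host, conn.login, conn.password, conn.port or 1433

Inside check_connection the call then becomes con = tconnection(*get_mssql_credentials()).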

PyPI packages (install these on the Composer environment):

pyodbc
pandas

Tested on Composer recently.

answered Apr 08 '23 by D.C