
My task is to migrate data from a remote Microsoft SQL Server over to a Google Cloud BigQuery table. The data in question comes from joining two tables on a common key and filtering with a WHERE clause. An example of the query as a Python f-string is given below:

query = f'''
    SELECT dbo.SalesHeaders.* 
    FROM dbo.Nodes WITH (NOLOCK) 
    INNER JOIN dbo.SalesHeaders WITH (NOLOCK) 
    ON dbo.Nodes.node = dbo.SalesHeaders.node 
    AND CONVERT(DATE , dbo.SalesHeaders.DateTime) BETWEEN '{start.strftime('%Y-%m-%d')}' AND '{end.strftime('%Y-%m-%d')}'
    WHERE dbo.Nodes.companytype = 1
    '''
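As an aside, the same query can be written with `?` placeholders so that pyodbc passes the dates as parameters rather than interpolating them into the SQL text (the DB-API `execute(sql, params)` form). A minimal sketch, assuming hypothetical `start`/`end` dates in place of the originals:

```python
from datetime import date

# Hypothetical date range; in the original script these come from `start` and `end`.
start, end = date(2023, 1, 1), date(2023, 1, 31)

# Same query with `?` placeholders instead of f-string interpolation.
query = '''
    SELECT dbo.SalesHeaders.*
    FROM dbo.Nodes WITH (NOLOCK)
    INNER JOIN dbo.SalesHeaders WITH (NOLOCK)
    ON dbo.Nodes.node = dbo.SalesHeaders.node
    AND CONVERT(DATE, dbo.SalesHeaders.DateTime) BETWEEN ? AND ?
    WHERE dbo.Nodes.companytype = 1
    '''
params = (start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d'))

# Later, against an open pyodbc connection:
#   cursor.execute(query, params)
```

This keeps the query text constant and avoids quoting/injection issues with the interpolated dates.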

The current approach to achieving the above task is a Python script that runs on my local machine. A critical part of its success is the presence of an installed SQL driver, which is required by the pyodbc library in Python to establish a connection. An example is given below:

import pyodbc
import json

CREDENTIALS = 'blah_blah.json'

def establish_connector():
    with open(CREDENTIALS,'r') as file:
        credentials = json.load(file)
    connector = pyodbc.connect(
        'DRIVER={SQL Server};'  # uploads.gaap.com,5143
        f'SERVER={credentials["server"]};'
        f'DATABASE={credentials["database"]};'
        f'UID={credentials["username"]};'
        f'PWD={credentials["password"]}'
        )
    return connector
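For context, once the connector exists the script iterates the result set and pushes rows onward to BigQuery. A chunked-fetch helper is a common pattern for large result sets; the sketch below relies only on the DB-API `fetchmany` method (which pyodbc cursors provide), and the name `iter_chunks` is my own:

```python
def iter_chunks(cursor, size=50_000):
    """Yield lists of rows from a DB-API cursor until the result set is exhausted."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows

# Hypothetical usage once a connection exists (BigQuery side requires the
# google-cloud-bigquery package; `client` and `table` are assumed):
#
#   cursor = establish_connector().cursor()
#   cursor.execute(query)
#   for chunk in iter_chunks(cursor):
#       client.insert_rows(table, chunk)  # streaming inserts; load jobs also work
```

Chunking keeps memory bounded regardless of how many rows the query returns.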

In a previous question of mine I attempted to replicate this approach such that my script runs either as a Google Cloud Function or as a Dataflow pipeline using the Apache Beam Python SDK. Due to the lack of installed SQL drivers in serverless environments, I have not had much success.

My previous question constrained itself to using either a Cloud Function or a Dataflow approach. In this question, I am open to a broader scope of solutions. In other words, please suggest and outline any approach that would manage to migrate the data from the given query over to the Google Cloud Platform such that it would ultimately be accessible by BigQuery.

I have already started to consider the following which could be expanded upon:

  1. Schedule an instance of a virtual machine (VM) to run the Python script. This would work because one can install the SQL driver on a VM.
  2. Create a SQL Server instance and use the Database Migration Service. I think that this would migrate a lot more data than is needed. One could then build a BigQuery table by replicating the above query against this new SQL Server instance. I am unsure about the costs involved and am hesitant as such.
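Regarding option 1, a sketch of preparing an Ubuntu 22.04 VM follows Microsoft's published package repository. The package names (`msodbcsql18`, `unixodbc-dev`) and repository URL come from Microsoft's install docs; check the current instructions for your distribution, and note the driver name in the connection string then becomes `ODBC Driver 18 for SQL Server`:

```shell
# Sketch: install Microsoft's ODBC driver for SQL Server on Ubuntu 22.04.
# Verify against Microsoft's current install docs before relying on this.
curl -sSL https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl -sSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list \
    | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc-dev
pip install pyodbc
```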
  • Google Cloud Functions would be the wrong tool for the job, at least as far as Python and pyodbc are concerned, since the Ubuntu 18.04/22.04 base images used for Cloud Functions do not include the unixodbc-dev package that's required by pyodbc, nor do they include any ODBC drivers suitable for use with SQL Server. Ref: System Packages Included in Cloud Functions. Creating a VM to run your Python script would probably be a better way to go. Commented Oct 30, 2023 at 9:48
  • @AlwaysLearning Thank you for confirming this to be the better approach. Knowing this makes it easier to commit to the project. Thank you. Commented Oct 30, 2023 at 9:53

1 Answer


Use the Create Job from Template feature in Dataflow to transfer data from SQL Server to BigQuery. Supply the necessary parameters, such as the JDBC URL, username, and password, then execute the job.
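The template referred to here appears to be Dataflow's JDBC-to-BigQuery template. A hypothetical invocation might look like the following; all `<bracketed>` values are placeholders, parameter names should be checked against the current template documentation, and a SQL Server JDBC driver jar staged in Cloud Storage is additionally required via `driverJars`/`driverClassName`:

```shell
# Hypothetical run of the JDBC-to-BigQuery Dataflow template for SQL Server.
# gcloud's ^~^ prefix changes the --parameters delimiter to ~ so the JDBC URL
# may contain commas/semicolons. Verify names against current template docs.
gcloud dataflow jobs run sqlserver-to-bq \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates/latest/Jdbc_to_BigQuery \
  --parameters=^~^connectionURL='jdbc:sqlserver://<host>:<port>;databaseName=<db>'~username=<user>~password=<password>~driverClassName=com.microsoft.sqlserver.jdbc.SQLServerDriver~driverJars=gs://<bucket>/mssql-jdbc.jar~query='SELECT ...'~outputTable=<project>:<dataset>.<table>~bigQueryLoadingTemporaryDirectory=gs://<bucket>/tmp
```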
