Side Projects

Anyone (everyone?) who has ever tried to learn a programming language knows that to really learn, you need a project.

I’m a DBA by trade and as such…haven’t really spent that much time learning programming languages so over the last year I’ve been playing around with a couple of projects using Python and Go. AI is everywhere at the moment so I thought I’d throw that in there because, well, why not?

So here I thought I’d briefly show two projects that when I have some free time (ha ha ha) I bounce from error to error with.

I’ve linked the gihub repos here as well…so you can have a look at the code (be kind). One thing I will admit to from the start…ChatGPT has 100% been used. And why not?

It’s a tool that’s available, and allows someone like me to get something working that I can then improve on. Although I will say…it doesn’t half spit out some rubbish sometimes…I think I’ve spent most of my time working out why the code it generates doesn’t work!

But, eventually, I got a both projects into a working demo state…so here they are!

Prusk-sql

The first one is a command line tool to connect to a SQL instance in a container, display the SQL version, and have an optional flag to list the databases.

OK, so there are other (better) tools out there to do this BUT this came around as I’ve spent quite a bit of time working with Helm and Kubernetes, both of which are written in Go…so I thought I’d learn a little bit!

When I’m testing SQL Server in Kubernetes/containers the first thing I do is confirm the instance is up and running by retrieving the SQL version and then list the databases present. So this helps by combining those checks into an easy one-liner…

https://github.com/dbafromthecold/prusk-sql

Burrito-Bot

If I didn’t know what I was doing with prusk-sql, I really don’t know what I’m doing here!

The idea is to have a chat bot that I can ask for burrito recommendations (I know, I know…niche or what?). The basic plan is to load data from Yelp from all restaurants in a city (starting with Dublin…but soon…THE WORLD…or maybe Cork) and then ping an LLM to respond to any input from the user.

And it works! Kinda…there’s a lot of work to be done here. I want to strip out pretty much all of the existing code and replace with something like python ChatterBot. I also need to work out how I can get it to give me recommendations for restaurants in a specific area of the city.

But for now, I’m just happy that it doesn’t spam errors!

https://github.com/dbafromthecold/burrito-bot

Anyhoo, those are the two projects I’ve been dabbling with…I hope that you are having fun with your side projects. May your errors be clear and actually point to the problem 🙂

Thanks for reading!

SQL Server Distributed Availability Groups and Kubernetes

A while back I wrote about how to use a Cross Platform (or Clusterless) Availability Group to seed a database from a Windows SQL instance into a pod in Kubernetes.

I was talking with a colleague last week and they asked, “What if the existing Windows instance is already in an Availability Group?”

This is a fair question, as it’s fairly rare (in my experience) to run a standalone SQL instance in production…most instances are in some form of HA setup, be it a Failover Cluster Instance or an Availability Group.

Failover Cluster Instances will work with a clusterless Availability Group but it’s a different story when it comes to existing Availability Groups.

A Linux node cannot be added to an existing Windows Availability Group (trust me, I tried for longer than I’m going to admit) so the only way to do it is to use a Distributed Availability Group.

So let’s run through the process!

Here is the existing Windows Availability Group: –

Just a standard, 2 node AG with one database already synchronized across the nodes. It’s that database we are going to seed over to the pod running on the Kubernetes cluster using a Distributed Availability Group.

So here’s the Kubernetes cluster: –

kubectl get nodes

Four nodes, one control plane node and three worker nodes.

OK, so first thing to do is deploy a statefulset running one SQL Server pod (using a file called sqlserver-statefulset.yaml): –

kubectl apply -f .\sqlserver-statefulset.yaml

Here’s the manifest of the statefulset: –

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql-statefulset
spec:
  serviceName: "mssql"
  replicas: 1
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      name: mssql-pod
  template:
    metadata:
      labels:
        name: mssql-pod
      annotations:
        stork.libopenstorage.org/disableHyperconvergence: "true"
    spec:
      securityContext:
        fsGroup: 10001
      hostAliases:
      - ip: "10.225.115.129"
        hostnames:
        - "z-ap-sql-10"
      containers:
        - name: mssql-container
          image: mcr.microsoft.com/mssql/server:2022-CU15-ubuntu-20.04
          ports:
            - containerPort: 1433
              name: mssql-port
          env:
            - name: MSSQL_PID
              value: "Developer"
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_AGENT_ENABLED
              value: "1"
            - name: MSSQL_ENABLE_HADR
              value: "1"
            - name: MSSQL_SA_PASSWORD
              value: "Testing1122"
          volumeMounts:
            - name: sqlsystem
              mountPath: /var/opt/mssql
            - name: sqldata
              mountPath: /var/opt/sqlserver/data
  volumeClaimTemplates:
    - metadata:
        name: sqlsystem
      spec:
        accessModes:
         - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: mssql-sc
    - metadata:
        name: sqldata
      spec:
        accessModes:
         - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
        storageClassName: mssql-sc

Like my last post, this is pretty stripped down. No resources limits, tolerations etc. It has two persistent volumes, one for the system databases and one for the user databases from a storage class already configured in the cluster.

One thing to note: –

hostAliases:
- ip: "10.225.115.129"
  hostnames:
  - "z-ap-sql-10"

Here an entry in the pod’s hosts file is being created for the listener of the Windows Availability Group.

Next thing to do is deploy two services, one so that we can connect to the SQL instance (on port 1433) and one for the AG (port 5022): –

kubectl apply -f .\sqlserver-services.yaml

Here’s the manifest for the services: –

apiVersion: v1
kind: Service
metadata:
  name: mssql-service
spec:
  ports:
  - name: mssql-ports
    port: 1433
    targetPort: 1433
  selector:
    name: mssql-pod
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-ha-service
spec:
  ports:
  - name: mssql-ha-ports
    port: 5022
    targetPort: 5022
  selector:
    name: mssql-pod
  type: LoadBalancer

NOTE – We could use just one service with multiple ports configured but I’m keeping them separate here to try and keep things as clear as possible.

Check that everything looks OK: –

kubectl get all

Now we need to create master key, login, and user in all instances: –

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>';
CREATE LOGIN [dbm_login] WITH PASSWORD = '<C0m9L3xP@55w0rd!>';
CREATE USER dbm_user FOR LOGIN dbm_login;

Then create a certificate in the SQL instance in the pod: –

CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'Mirroring_certificate', EXPIRY_DATE = '20301031'

Backup that certificate: –

BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        ENCRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>'
    );

Copy the certificate locally: –

kubectl cp mssql-statefulset-0:var/opt/mssql/data/dbm_certificate.cer ./dbm_certificate.cer
kubectl cp mssql-statefulset-0:var/opt/mssql/data/dbm_certificate.pvk ./dbm_certificate.pvk

And then copy the files to the Windows boxes: –

Copy-Item dbm_certificate.cer \\z-ap-sql-02\E$\SQLBackup1\ -Force
Copy-Item dbm_certificate.pvk \\z-ap-sql-02\E$\SQLBackup1\ -Force
Copy-Item dbm_certificate.cer \\z-ap-sql-03\E$\SQLBackup1\ -Force
Copy-Item dbm_certificate.pvk \\z-ap-sql-03\E$\SQLBackup1\ -Force

Once the files are on the Windows boxes, we can create the certificate in each Windows SQL instance: –

CREATE CERTIFICATE dbm_certificate
    AUTHORIZATION dbm_user
    FROM FILE = 'E:\SQLBackup1\dbm_certificate.cer'
    WITH PRIVATE KEY (
    FILE = 'E:\SQLBackup1\dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>'
)

OK, great! Now we need to create a mirroring endpoint in the SQL instance in the pod: –

CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (   
        LISTENER_PORT = 5022,
        LISTENER_IP = ALL)
    FOR DATA_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = WINDOWS CERTIFICATE [dbm_certificate],
        ENCRYPTION = REQUIRED ALGORITHM AES
        );
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [dbm_login];

There are already endpoints in the Windows instances, but we need to update them to use the certificate for authentication: –

ALTER ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (   
        LISTENER_PORT = 5022,
        LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (
    AUTHENTICATION = WINDOWS CERTIFICATE [dbm_certificate]
    , ENCRYPTION = REQUIRED ALGORITHM AES
    );
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [dbm_login];

Now we can create a one node Clusterless Availability Group in the SQL instance in the pod: –

CREATE AVAILABILITY GROUP [AG2]
   WITH (CLUSTER_TYPE=NONE)
   FOR
   REPLICA ON   
      'mssql-statefulset-0' WITH   
         (  
         ENDPOINT_URL = 'TCP://mssql-statefulset-0.com:5022',   
		 FAILOVER_MODE = MANUAL
		,AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
		,BACKUP_PRIORITY = 50
		,SEEDING_MODE = AUTOMATIC
		,SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)
	)    

No listener here, we are going to use the mssql-ha-service as the endpoint for the Distributed Availability Group.

OK, so on the primary node of the Windows Availability Group, we can create the Distributed Availability Group: –

CREATE AVAILABILITY GROUP [DistributedAG]  
   WITH (DISTRIBUTED)   
   AVAILABILITY GROUP ON  
      'AG1' WITH    
      (   
         LISTENER_URL = 'tcp://Z-AP-SQL-10:5022',    
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      ),   
      'AG2' WITH    
      (   
         LISTENER_URL = 'tcp://10.225.115.131:5022',   
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      );    

We could use a host file entry for the URL in AG2 (I did that in the previous post) but here we’ll just use the IP address of the mssql-ha-service.

OK, nearly there! We now have to join the Availability Group in the SQL instance in the pod: –

ALTER AVAILABILITY GROUP [DistributedAG]  
   JOIN   
   AVAILABILITY GROUP ON  
      'AG1' WITH    
      (   
         LISTENER_URL = 'tcp://Z-AP-SQL-10:5022',    
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      ),   
      'AG2' WITH    
      (   
         LISTENER_URL = 'tcp://10.225.115.131:5022',   
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,   
         FAILOVER_MODE = MANUAL,   
         SEEDING_MODE = AUTOMATIC   
      );    

And that should be it! If we now connect to the SQL instance in the pod…the database is there!

There it is! OK, one thing I haven’t gone through here is how to get auto-seeding working from Windows into a Linux SQL instance. I went through how that works in my previous post (here) but the gist is as long as the database data and log files are located under the Windows SQL instance’s default data and log path…they will auto-seed to the Linux SQL instance’s default data and log paths.

So that’s how to seed a database from a SQL instance that is in a Windows Availability Group into a SQL instance running in a pod in a Kubernetes cluster using a Distributed Availability Group!

Phew! Thanks for reading!

SQL Server Cross Platform Availability Groups and Kubernetes

Say we have a database that we want to migrate a copy of into Kubernetes for test/dev purposes, and we don’t want to backup/restore.

How can it be done?

Well, with cross platform availability groups! We can deploy a pod to our Kubernetes cluster, create the availability group, and then auto-seed our database!

We’ll run through how to do this but first, I just want to point out that everything here firmly belongs in this category…

Ok, now that’s out the way…let’s see how this can be done.

We’ll need a “normal” instance of SQL Server running on Windows and a Kubernetes cluster.

Here I have one instance of SQL Server running SQL 2022 CU15 (HA enabled): –

And a Kubernetes cluster running v1.29.4: –

OK, now we need to deploy a SQL instance to the Kubernetes cluster. Going to use a statefulset for this: –

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql-statefulset
spec:
  serviceName: "mssql"
  replicas: 1
  selector:
    matchLabels:
      name: mssql-pod
  template:
    metadata:
      labels:
        name: mssql-pod
    spec:
      securityContext:
        fsGroup: 10001
      hostAliases:
      - ip: "10.225.115.136"
        hostnames:
        - "z-ap-sql-01"
      containers:
        - name: mssql-container
          image: mcr.microsoft.com/mssql/server:2022-CU15-ubuntu-20.04
          ports:
            - containerPort: 1433
              name: mssql-port
          env:
            - name: MSSQL_PID
              value: "Developer"
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_AGENT_ENABLED
              value: "1"
            - name: MSSQL_ENABLE_HADR
              value: "1"
            - name: MSSQL_DATA_DIR
              value: /var/opt/sqlserver/data
            - name: MSSQL_LOG_DIR
              value: /var/opt/sqlserver/data
            - name: MSSQL_SA_PASSWORD
              value: "Testing1122"
          volumeMounts:
            - name: sqlsystem
              mountPath: /var/opt/mssql
            - name: sqldata
              mountPath: /var/opt/sqlserver/data
  volumeClaimTemplates:
    - metadata:
        name: sqlsystem
      spec:
        accessModes:
         - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: mssql-sc
    - metadata:
        name: sqldata
      spec:
        accessModes:
         - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
        storageClassName: mssql-sc

This is fairly stripped down for this example. No tolerations, resource limits etc. Two persistent volumes will be created, one for the system databases and one for the user database data and log from a storage class already configured in the cluster.

A couple of things to note…

- name: MSSQL_ENABLE_HADR
              value: "1"

Here, HA is being enabled.

      hostAliases:
      - ip: "10.225.115.136"
        hostnames:
        - "z-ap-sql-01"

And here, an entry in the pod’s hosts file is being added for the SQL instance running on Windows.

OK, let’s deploy that (file is sqlserver-statefulset.yaml): –

kubectl apply -f sqlserver-statefulset.yaml

Next thing to do is deploy two services, one so that we can connect to the SQL instance (on port 1433) and one for the AG (port 5022): –

apiVersion: v1
kind: Service
metadata:
  name: mssql-service
spec:
  ports:
  - name: mssql-ports
    port: 1433
    targetPort: 1433
  selector:
    name: mssql-pod
  type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-ha-service
spec:
  ports:
  - name: mssql-ha-ports
    port: 5022
    targetPort: 5022
  selector:
    name: mssql-pod
  type: LoadBalancer

And let’s deploy that (file is sqlserver-service.yaml): –

kubectl apply -f sqlserver-service.yaml

Check that everything looks OK: –

kubectl get all

Great! Ok, now an entry in the SQL on Windows hosts file needs to be created for the external IP address of the service listening on port 5022.

In this example: –

10.225.115.132 mssql-statefulset-0

Confirm that we can connect to the SQL instance in Kubernetes in SSMS: –

Let’s start building the availability group!

Following the Microsoft guide here: –
https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cross-platform

Create a login and user on the Windows SQL instance: –

CREATE LOGIN dbm_login WITH PASSWORD = '<C0m9L3xP@55w0rd!>';
CREATE USER dbm_user FOR LOGIN dbm_login;
GO

And then a master key and certificate: –

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>';
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
BACKUP CERTIFICATE dbm_certificate
TO FILE = 'C:\Temp\dbm_certificate.cer'
WITH PRIVATE KEY (
        FILE = 'C:\Temp\dbm_certificate.pvk',
        ENCRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>'
    );
GO

Grab the dbm_certificate.cer and dbm_certificate.pvk files, copy to your local machine, and then copy them into the Kubernetes pod: –

kubectl cp dbm_certificate.cer mssql-statefulset-0:/var/opt/mssql/data
kubectl cp dbm_certificate.pvk mssql-statefulset-0:/var/opt/mssql/data

One copied, run the following on the SQL instance in the Kubernetes pod to create a login/user, master key, and the certificate: –

CREATE LOGIN dbm_login WITH PASSWORD = '<C0m9L3xP@55w0rd!>';
CREATE USER dbm_user FOR LOGIN dbm_login;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<M@st3rKeyP@55w0rD!>'
GO

CREATE CERTIFICATE dbm_certificate
    AUTHORIZATION dbm_user
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '<C0m9L3xP@55w0rd!>'
)
GO

And then create the endpoint for the availability group on both the Windows SQL instance and the instance in the Kubernetes pod: –

CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_IP = (0.0.0.0), LISTENER_PORT = 5022)
    FOR DATA_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
        );
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [dbm_login];
GO

Fantastic! Now we can create the availability group!

Run on the Windows SQL instance: –

CREATE AVAILABILITY
GROUP [ag1]
WITH (CLUSTER_TYPE = NONE)
FOR REPLICA
    ON N'z-ap-sql-01'
WITH (
    ENDPOINT_URL = N'tcp://z-ap-sql-01:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    SEEDING_MODE = AUTOMATIC,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
    ),
    N'mssql-statefulset-0'
WITH (
    ENDPOINT_URL = N'tcp://mssql-statefulset-0:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    SEEDING_MODE = AUTOMATIC,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
    )
GO

And then join the SQL instance in the Kubernetes pod: –

ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
GO

Once that has completed, we have a cross platform availability group!

Right, final thing to do is get our database into the AG! For this, we are going to auto-seed the database to the Linux instance.

However, Windows and Linux have different file paths! But auto-seeding does work across platform, details are here: –
https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/automatic-seeding-secondary-replicas

As long as the database’s data and log files are located under the Windows SQL instance’s default data and log file paths, the database will be seeded to the Linux instance’s default data and log file paths.

Let’s see this in action! The default file paths for my Windows instance are: –

And the database’s files reside on those file paths: –

And the default file paths for the SQL instance in the pod are: –

So now let’s add the database: –

ALTER AVAILABILITY GROUP [AG1] ADD DATABASE [testdatabase1];
GO

And boom! We have the database auto-seeded into the SQL instance running in the Kubernetes pod: –

Interestingly, if we look at the database’s file paths in SSMS for the instance in the pod: –

SQL in the pod thinks that the files are located at the Windows locations…but that can’t be true…it’s running on Linux!

If we go and check the actual location of the files…they are located at the default file paths for the SQL instance in the pod: –

OK, I fully admit this pretty out there…but it’s just a bit of fun to see what we can do with SQL Server Availability Groups and Kubernetes.

Thanks for reading!

Automating Container Image Builds with Docker Build Cloud and Github Actions

In a previous post we went through how to use Docker Build Cloud to remotely build a Docker container image from a Github repository.

In that example, we kicked off a build and pushed the image to a container registry using the syntax: –

docker buildx build https://github.com/dbafromthecold/sqlserver2022.git `
--builder cloud-dbafromthecold-default `
--tag dbafromthecold/sqlserver2022:latest `
--push

This all seems a bit manual, doesn’t it?

What if we could automate the building of the image and the push to the container registry when we commit to the Github repository?

Thankfully, with Github Actions…we can do just that!

I don’t really have much experience with Github Actions if I’m honest…but all the syntax needed is provided in the Docker Build Cloud: –

N.B. – this option is under the builder that you use to build the image (using the default one here)

This gives the following code: –

name: ci

on:
  push:
    branches:
      - "main"

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          version: "lab:latest"
          driver: cloud
          endpoint: "dbafromthecold/default"
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          tags: "dbafromthecold/sqlserver2022"
          outputs: ${{ github.event_name == 'pull_request' && 'type=cacheonly' || 'type=registry,push=true' }}

The only update I’ve had to make to the code provided is line 29 where I’ve changed the tag to “dbafromthecold/sqlserver2022”

OK, so let’s go and create this Github Action.

The first thing to do is create secrets in the Github repository to allow the action to connect to Docker Build Cloud.

For this example we’ll use the repo from the previous blog: –
https://github.com/dbafromthecold/sqlserver2022

In the repo, go to /settings/secrets/actions and create two repository secrets: –

These are your Docker Hub username and an access token created under your account (https://hub.docker.com/settings/security): –

Once the secrets are created, we can then create the workflow in Github.

In the repo, go to Actions and then select new workflow: –

Choose simple workflow: –

Remove the template code and replace with the code from Docker Build Cloud (remember to update the tag name on line 29!): –

Commit the changes and we’re good to go!

I have the repo locally so I’ll pull down the changes made in the web gui…

git pull

And now let’s test committing a change to the repo and pushing.

There’s a dockerfile in the repo (to build the image from)…I’ll make a simple change (adding a LABEL for example) and push: –

git add .
git commit -m "testing github action"
git push

And if we go back to Actions in the repo in Github we should see it working!

But did it build and push the image? Let’s check the Docker Hub: –

Awesome stuff, the image has been built and pushed to the Hub!

So by using Docker Build Cloud and Github Actions, we can push changes to a Github repository, remotely build a container image, and then push to a container registry without having to use any local resources!

Thanks for reading!

Building a Docker image with Docker Build Cloud

In a previous blog post we went through how to build a Docker container image from a remote (Github) repository.

Here we’re going to expand on that by actually building the image itself remotely, using Docker Build Cloud.

What we can do with Docker Build Cloud is instead of building the image locally and then having to push to a remote container registry (for example the Docker Hub), we can build remotely and then immediately push that image to the registry so that it is available for immediate use by say, our team members or deployment/testing pipelines.

This has advantages as we no longer are reliant on our local system. Build Cloud uses isolated Amazon EC2 instances so we’ll get a consistent build speed and if our images are large (I mainly work with SQL Server images, which are about 1.5GB in size), and we have a poor internet connection, we don’t have to wait for ages to push the image to the registry.

I have a remote repository that we’re going to use to build a SQL Server 2022 container image. Very simple repo, with just a dockerfile in it to build the image.

Now, to build an image from that remote repository locally, we would run: –

docker build -t dbafromthecold/sqlserver2022:latest https://github.com/dbafromthecold/sqlserver2022.git

But how do we use Build Cloud?

First thing to do is create a Docker account and sign up for Build Cloud.
Full directions are here: –
https://docs.docker.com/build/cloud/
N.B. – Docker Personal accounts are free and will give you 50 “build minutes” per month.

Once we have the account signed up we are good to go!

To build an image using the cloud builder we first need to log into Docker on the command line: –

docker login

Then we use the docker buildx build command to use Build Cloud: –

docker buildx build https://github.com/dbafromthecold/sqlserver2022.git `
--builder cloud-dbafromthecold-default `
--tag dbafromthecold/sqlserver2022:latest `
--load

Let’s break down what this is doing…

docker buildx build https://github.com/dbafromthecold/sqlserver2022.git
This is saying to use Build Cloud and pull the dockerfile from that repository on Github

–builder cloud-dbafromthecold-default
Use the default builder that is available in Build Cloud

–tag dbafromthecold/sqlserver2022:latest
Tag the build image with a name

–load
Once the image is built, pull it to the local machine (encrypted)

When executed, we can see the image being built and pulled to the local machine: –


N.B. – I built the image a few times to test before taking this screenshot, hence why there are cached layers in the build.

Once complete, we can check our local images: –

docker image ls

And there it is! But hang on, didn’t I say earlier we don’t want to have the image locally? We want to push it to a remote repository!

To to that, we need a remote repository in a registry. I’ve set up one in the Docker Hub for this example here but any other registry will work (just make sure you’re authenticated).

To use Build Cloud and push to a remote registry: –

docker buildx build https://github.com/dbafromthecold/sqlserver2022.git `
--builder cloud-dbafromthecold-default `
--tag dbafromthecold/sqlserver2022:latest `
--push

Only difference here is that we’re using –push instead of –load. And make sure that the tag used for the image matches the repository that you’re pushing to!

When executed, we’ll see something similar to: –

And if we check the repository in the registry: –

There’s the image! Available for our wider team to use or be utilised in a pipeline!

We can even use something like Github actions to trigger the build when we push to the GitHub repo, pretty cool…huh?

Thanks for reading!