0

Overcoming a storage failover issue with SQL Server on Kubernetes using Portworx

In a previous post I talked about an issue with storage failover when a Kubernetes node goes offline.

In a nutshell, the issue is that the attachdetach-controller in Kubernetes won’t detach storage from an offline node until that node is either brought back online or is removed from the cluster. What this means is that a pod spinning up on a new node that requires that storage can’t come online.


UPDATE – This issue has been resolved in Kubernetes version 1.26. Details are on this github issue: –
https://github.com/kubernetes/kubernetes/issues/65392

And there’s more on the official Kubernetes blog (when a feature called non-graceful node shutdown when into beta): –
https://kubernetes.io/blog/2022/12/16/kubernetes-1-26-non-graceful-node-shutdown-beta/


Aka, if you’re running SQL Server in Kubernetes and a node fails, SQL won’t be able to come back online until someone manually brings the node online or deletes the node.

Not great tbh, and it’s been a blocker for my PoC testing.

However, there are ways around this…one of them is by a product called Portworx which I’m going to demo here.


DISCLAIMER – I now work for Pure Storage who own Portworx…there are other options to get around this issue (Azure Arc Data Services being one) but I’ve found Portworx to be the simplest solution and everything I’m demoing here is completely free so you can go and test it for yourself


So let’s run through using Portworx to first create storage in Kubernetes for SQL Server, and then test a node failure.

First thing to do is log into Azure using the azure-cli: –

az login

Then create a role for portworx in Azure: –

az role definition create --role-definition '{
    "Name": "portworx",
    "Description": "",
    "AssignableScopes": [
        "/subscriptions/<subscription-id>"
    ],
    "Actions": [
        "Microsoft.ContainerService/managedClusters/agentPools/read",
        "Microsoft.Compute/disks/delete",
        "Microsoft.Compute/disks/write",
        "Microsoft.Compute/disks/read",
        "Microsoft.Compute/virtualMachines/write",
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/write",
        "Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read"
    ],
    "NotActions": [],
    "DataActions": [],
    "NotDataActions": []
}'

Then grab the resource group that the Kubernetes cluster resources are in: –

az aks show --name <cluster-name> --resource-group <cluster-resource-group> | jq -r '.nodeResourceGroup'

Then create a service principal in Azure dropping the subscription ID and Kubernetes resource group in: –

az ad sp create-for-rbac --role="portworx" --scopes="/subscriptions/<subscription-id>/resourceGroups/<resource-group>"

Now create secret for portworx to access Azure APIs using the service principal’s details: –

kubectl create secret generic -n kube-system px-azure --from-literal=AZURE_TENANT_ID=<tenant> \
                                                      --from-literal=AZURE_CLIENT_ID=<appId> \
                                                      --from-literal=AZURE_CLIENT_SECRET=<password>

Now we need to go to the Portworx Portal to generate a yaml file to deploy to the AKS cluster: –
https://central.portworx.com

Create an account and once in, select Portworx Enterprise (don’t worry there’s a free option coming): –

Now select Portworx Essentials and click Continue: –

On the next page, leave everything as the default and click Next: –

Then on the Storage page, select the Cloud option, then Azure, and pick a size for the disks (I went with 50GB): –

Leave everything as the default settings on the next page (the Network page) and click Next.

And on the final, Customise page, select AKS and hit Finish: –

N.B. – you can also go into the Advanced Settings section and choose what options to deploy for Portworx here. I disabled pretty much everything for this demo but you can leave the defaults.

Ok, on the next page we have some instructions. The first thing to do is deploy the Portworx operator: –

kubectl apply -f 'https://install.portworx.com/2.10?comp=pxoperator'

Once that’s deployed we can download the spec generated: –

Great! We are ready to deploy Portworx to the cluster: –

kubectl apply -f portworx_essentials_sql_demo.yaml

Let’s have a look at some of the resource that deployed. View the Portworx pods: –

kubectl get pods -n=kube-system -l name=portworx -o wide

N.B. – These may take around 5 minutes to come online

To confirm Portworx’s status: –

PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

Now we can create a Portworx storage class. Create the following px_storageclass.yaml file: –

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
    name: px-mssql-sc
provisioner: kubernetes.io/portworx-volume
parameters:
   repl: "3"
   io_profile: "db_remote"
   priority_io: "high"
allowVolumeExpansion: true

Then deploy to the cluster: –

kubectl apply -f .\px_storageclass.yaml

Confirm the storage class is now available: –

kubectl get storageclass

Now create a px_pvc.yaml file referencing the Portworx storage class: –

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
   name: mssql-system
   annotations:
     volume.beta.kubernetes.io/storage-class: px-mssql-sc
spec:
   accessModes:
     - ReadWriteOnce
   resources:
     requests:
       storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
   name: mssql-data
   annotations:
     volume.beta.kubernetes.io/storage-class: px-mssql-sc
spec:
   accessModes:
     - ReadWriteOnce
   resources:
     requests:
       storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
   name: mssql-log
   annotations:
     volume.beta.kubernetes.io/storage-class: px-mssql-sc
spec:
   accessModes:
     - ReadWriteOnce
   resources:
     requests:
       storage: 1Gi

Create the PVCs: –

kubectl apply -f .\px_pvc.yaml.yaml

Confirm the PVCs: –

kubectl get pvc

And the corresponding PVs: –

kubectl get pv

Now we can deploy SQL Server using the PVCs. Create the following px_sqlserver.yaml file: –

apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: sqlserver
  name: sqlserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sqlserver
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: sqlserver
    spec:
      securityContext:
        fsGroup: 10001
      containers:
      - image: mcr.microsoft.com/mssql/server:2019-CU11-ubuntu-18.04
        name: sqlserver
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: MSSQL_SA_PASSWORD
          value: "Testing1122"
        resources:
          requests:
            memory: "1024Mi"
            cpu: "1000m"
          limits:
            memory: "2048Mi"
            cpu: "2000m"
        volumeMounts:
        - name: system
          mountPath: /var/opt/mssql
        - name: user
          mountPath: /var/opt/sqlserver/data
        - name: log
          mountPath: /var/opt/sqlserver/log
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 10
      volumes:
      - name: system
        persistentVolumeClaim:
          claimName: mssql-system
      - name: user
        persistentVolumeClaim:
          claimName: mssql-data
      - name: log
        persistentVolumeClaim:
          claimName: mssql-log
  strategy:
    type: Recreate

And deploy: –

kubectl apply -f .\px_sqlserver.yaml

Confirm the deployment: –

kubectl get deployments

View pod created: –

kubectl get pods

View pod events: –

kubectl describe pods

N.B. – Note that we have the PVCs attached to the pod but there are no events for the attachdetach-controller in the pod events

Ok, now that we have a pod running SQL Server in the cluster…let’s see what happens when we shut down the node the pod is running on.

Confirm the node that the pod is running on: –

kubectl get pods -o wide

And then shut that node down in the portal: –

Confirm the node is down: –

kubectl get nodes

And then watch the pods: –

kubectl get pods --watch

A new pod is spun up and goes into the running status on another node: –

kubectl get pods -o wide

Ok, the old pod will stay in the terminated status until the offline node is either brought back online or removed from the cluster…but we have SQL Server up and running!

So Portworx fixes the issue that we saw in my previous post…which was a show stopper when I was running a proof of concept for SQL Server in Azure Kubernetes Service.

Thanks for reading!

0

Running SQL Server in Windows Subsystem for Linux (WSL)

I previously wrote a post on how to convert a SQL Server Docker image to a Windows Subsystem for Linux distribution.

I did this because if you tried to run SQL Server in WSL before now, you’ll be presented with this error: –

This happens because up until now, WSL did not support systemd. However recently Microsoft announced systemd support for WSL here: –
https://devblogs.microsoft.com/commandline/systemd-support-is-now-available-in-wsl/

This is pretty cool and gives us another option for running SQL Server locally on linux (great for testing and getting to grips with the Linux platform).

So how can we get this up and running?


Before going any further, the minimum version required for to get this running is OS version 10.0.22000.0 (a recent version of Windows 11).

I tried getting this to work on Windows 10, but no joy I’m afraid

UPDATE – 2022-11-23 – Microsoft have now made this available on Windows 10 (but I have not tested it I’m afraid) – the announcement is here: –
https://devblogs.microsoft.com/commandline/the-windows-subsystem-for-linux-in-the-microsoft-store-is-now-generally-available-on-windows-10-and-11/


First thing to do is get WSL up to the version that supports systemd. It’s only currently available through the store to Windows Insiders but you can download the installer from here: –
https://github.com/microsoft/WSL/releases

Run the installer once downloaded and then confirm the version of WSL: –

Now install a distro to run SQL Server on from the Microsoft Store: –

N.B. – I’m using Ubuntu 20.04.5 for this…I did try with Ubuntu 22.04 but couldn’t get it to work.

Once installed and log into WSL…update and upgrade: –

sudo apt update
sudo apt upgrade

Cool, ok now we are going to enable systemd in WSL. Create a /etc/wsl.conf file and drop in the following: –

[boot]
systemd=true

Exit out of WSL and then run: –

wsl --shutdown

Jump straight back into WSL and run the following to confirm systemd is running: –

systemctl list-unit-files --type=service

Great stuff, now we can run through the usual SQL install process (detailed here)

Import the GPG keys: –

wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

Register the repository: –

sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/mssql-server-preview.list)"

Update and install SQL Server: –

sudo apt-get update
sudo apt-get install -y mssql-server

Once that’s complete, you’ll be able to configure SQL Server with mssql-conf: –

sudo /opt/mssql/bin/mssql-conf setup

Confirm SQL Server is running: –

We can also confirm the processes running by: –

ps aux | grep mssql

Finally, connect to SQL Server in SSMS (using 127.0.0.1 not localhost): –

And there we have it, SQL Server running in a Windows Subsystem for Linux distro!

Thanks for reading!

0

EightKB August 2022

EightkB is back!

Today we announced the schedule for the next EightKB, which is happening at 1pm UTC on the 3rd of August.

We have five more amazing, mind-melting sessions from 7 amazing speakers.

Really looking forward to this one!

I want to say a massive thank you to all the speakers who submitted. Selection was difficult as always, with only 5 slots available meant us having to make difficult decisions. Thank you so much for submitting and I hope that you will submit to our future events.

Registration for EightKB is completely free and you can sign up here: – https://eightkb.online/

Hope to see you on the 3rd!

0

Shutting down nodes in Azure Kubernetes Service

A while back I wrote a post on Adjusting Pod Eviction Timings in Kubernetes. To test the changes made in that post I had to shut down nodes in an Azure Kubernetes Service cluster.

This can be done easily in the Azure portal: –

However I did a presentation recently and didn’t want to have to keep jumping into the portal from VS Code…so I wanted to be able to shut down the nodes in code.

So here’s how to use the azure-cli to shut down a node in an Azure Kubernetes Service cluster.


DISCLAIMER – the following code should only be run against a test cluster!


Firstly, to test, let’s deployment a simple application to the cluster: –

kubectl create deployment test --image=nginx

Confirm: –

kubectl get all

To check the node that the pod is running on: –

kubectl get pods -o wide

kubectl get pods -o jsonpath="{.items[0].spec.nodeName}"

Assign the node that the pod is running on to a variable (we’ll use this in a minute): –

NODE=$(kubectl get pods -o jsonpath="{.items[0].spec.nodeName}" | sed 's/.$/\U&/')  && echo $NODE

N.B. – the sed command at the end of the statement above is to make sure that the last character of the node name is in upper case, which is needed to get the instance ID of the VM in a later statement.

OK now we can look at shutting down the node that the pod is running on.

Nodes in AKS run in a nodepool which is a virtual machine scale set that is in a different resource group that the kubernetes cluster itself. The naming convention of that resource group is: –

MC_resourcegroup_clustername_location

The cluster in the examples here is called kubernetes1 in the resource group kubernetes in EASTUS.

So set the resource group name: –

RESOURCEGROUP="MC_kubernetes_kubernetes1_eastus"

Now we can grab the VMSS name in two ways, firstly by running: –

VMSSNAME=$(az vmss list --resource-group $RESOURCEGROUP --query "[].name" -o tsv) && echo $VMSSNAME 

N.B. – AKS clusters can have multiple nodepools in which case the query above will return multiple values and won’t work.

Or we use the $NODE variable we set earlier and strip out the last few characters: –

VMSSNAME=${NODE:0:27} && echo $VMSSNAME 

Once we have the NODE name and the VMSS name, we need to get the instance ID of the VM in the scale set:-

INSTANCEID=$(az vmss list-instances --name $VMSSNAME --resource-group $RESOURCEGROUP --query "[?osProfile.computerName=='$NODE'].[instanceId]" -o tsv) && echo $INSTANCEID

And now we can shut down the node: –

az vmss deallocate --name $VMSSNAME --instance-ids $INSTANCEID --resource-group $RESOURCEGROUP

Confirm that the node is offline: –

kubectl get nodes

Great! The node is offline! Our test pod didn’t have any tolerations set so it’ll take 5 minutes for a new pod to be created on a healthy node. You can check out how to adjust this in my previous post.

Finally, to restart the node: –

az vmss start --name $VMSSNAME --instance-ids $INSTANCEID --resource-group $RESOURCEGROUP

And that’s how to shutdown a node in AKS to test pod eviction!

Thanks for reading!

5

Installing the mssql-cli on Ubuntu 22.04

I really like the mssql-cli tool. I use it pretty much everyday however it seems like it’s not being maintained anymore and as such, there are issues when trying to install it on Ubuntu 22.04.

The issue with Ubuntu 22.04 is that it has python 3.10 installed by default which the current mssql-cli is not compatible with. I did try installing previous versions of python (3.8 and 3.9) but had no luck so kept the default version.

The steps that follow are how I worked out how to get the mssql-cli installed on an Azure Ubuntu 22.04 fresh install. This is pretty hacky tbh so should only be done on development and test servers.

Also, if you’re running Ubuntu 20.04 only the first steps are required to get the mssql-cli working (as that has python 3.8 installed by default).

First ssh to the VM: –

ssh dbafromthecold@XXXXXXXXXXXXX

Confirm the python version: –

python3 --version

In order to test the mssql-cli we’ll need to spin up a SQL container and to do that we’ll need Docker installed.

The instructions on how to install Docker on Ubuntu are here: – https://docs.docker.com/engine/install/ubuntu/

Once docker is installed, run a SQL container: –

docker container run -d \
--publish 1433:1433 \
--env ACCEPT_EULA=Y \
--env MSSQL_SA_PASSWORD=Testing1122 \
--name sqlcontainer1 \
mcr.microsoft.com/mssql/server:2019-CU15-ubuntu-18.04

Now install pip: –

sudo apt install -y python3-pip

And then install mssql-cli with pip: –

pip install mssql-cli

Add location of mssql-cli to our path: –

export PATH=$PATH:~/.local/bin

N.B. – Add this to .bashrc to make permanent

Ok, now we have the mssql-cli we can test connecting to SQL in our container: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION"

Which gives us this error: –

Ok, let’s have a look at the file: –

cat ~/.local/bin/mssql-cli

The issue is that the reference to python here needs to be updated to python3: –

sed -i 's/python/python3/g' ~/.local/bin/mssql-cli

Confirm the update: –

cat ~/.local/bin/mssql-cli

Test connecting to SQL container again: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION"

N.B. – if you’re on Ubuntu 20.04 the mssql-cli should now be working

But on Ubuntu 22.04 we get a new error. From looking at the open issues on the mssql-cli issues page on Github…we need to force upgrade the cli-helpers python module: –

pip install cli-helpers --upgrade --force

Test connecting to SQL container again: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION"

OK new error! Progress 🙂

To resolve this we need to remove references to ownerUri on lines 22 and 93 in the connectionservice.py file: –

vim ~/.local/lib/python3.10/site-packages/mssqlcli/jsonrpc/contracts/connectionservice.py

Test connecting to SQL container again: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION"

OK that’s pretty clear…install libssl1.0.0: –

wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl1.0/libssl1.0.0_1.0.2n-1ubuntu5_amd64.deb
sudo dpkg -i libssl1.0.0_1.0.2n-1ubuntu5_amd64.deb

Test connecting to SQL container again: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION"

Ha, OK another new error!

To resolve this one we need to remove references to owner_uri from lines 91, 93, and 191 in the mssqlcliclient.py file: –

vim ~/.local/lib/python3.10/site-packages/mssqlcli/mssqlcliclient.py

One more test connecting to SQL in the container: –

mssql-cli -S localhost -U sa -P Testing1122 -Q "SELECT @@VERSION" to the server: -

Ha success! I’ve dropped all the commands into this gist

Thanks for reading!