Bit of fun this week with something that a colleague of mine noticed when playing around with SQL Server on linux.
The first thing you do when playing with a new technology is see how you can break it right? Always good fun 🙂
So I’m going to break a database in a SQL Server instance that’s running on linux. I’m not going to go through the install process as there’s a whole bunch of resources out there that detail how to do it. See here
Once you have your instance up and running, connect it to as normal in SSMS and create a database with one table:-
CREATE DATABASE [GoingToBreak]; GO USE [GoingToBreak]; GO CREATE TABLE dbo.Test (PKID INT IDENTITY(1,1) PRIMARY KEY, ColA VARCHAR(10), ColB VARCHAR(10), DateCreated DATETIME); GO
Then insert some data: –
INSERT INTO dbo.Test (ColA, ColB, DateCreated) VALUES (REPLICATE('A',10),REPLICATE('B',10),GETUTCDATE()); GO 1000
What we’re going to do is delete the database files whilst the instance is up and running. Something you can’t do to a database running in an instance of SQL on windows as the files are locked.
First, find where the files are located: –
USE [GoingToBreak]; GO EXEC sp_helpfile; GO
Then jump into your favourite terminal client (I’m using bash on windows) and connect to the linux server.
Then run: –
cd /var/opt/mssql/data/ ls
Ok, so now to delete the files we can run: –
rm GoingToBreak.mdf GoingToBreak_log.ldf
And the files are gone! That database is deader than dead!
But….wait a minute. Let’s have a look back in SSSMS.
Run the following: –
USE [GoingToBreak]; GO SELECT * FROM dbo.Test;
Hmm, data’s returned! Ok, it could just be in the buffer pool. Let’s write some data: –
USE [GoingToBreak]; GO INSERT INTO dbo.Test (ColA, ColB, DateCreated) VALUES (REPLICATE('A',10),REPLICATE('B',10),GETUTCDATE()); GO 1000
What? It completed successfully??? Err, ok. Let’s run that select statement one more time…
SELECT * FROM dbo.Test;
Ok, that returned 2000 rows right? Hmm, what happens if we run CHECKPOINT?
CHECKPOINT
Errr, ok. That worked as well.
Alright, enough of this. Let’s break it for sure. Back in your terminal session run: –
sudo systemctl restart mssql-server
Once that’s complete, jump back into SSMS and refresh the connection: –
Ha ha! Now it’s dead. That’s pretty weird behaviour though eh? I expect there’s a linux person out there who can explain why that happened ‘cos I’m really not sure.
EDIT – Anthony Nocentino (b|t) has come back to me and said that the files when deleted get unlinked from the directory so we can no longer see them but the SQL process will still have them open; hence being able to execute queries. Once the instance is restarted the file handle will be released, the underlying blocks and inodes get deallocated; hence the database going into recovery pending. Thanks Anthony!
One thing I do know is that SQL databases on linux will continue to allow queries to be executed against them after their underlying files have been deleted. Pretty worrying imho, you could have a problem and not even know about it until your next server restart!
> Pretty worrying imho
It’s not pretty worrying, it’s actually great feature of Unix/Linux. Get used to live with it. Databases are used on this platform for few decades more than on Windows and proper filesystem permissions is enough to make them secure. Just don’t run everything as root.
Being able to delete/rename/move a file while it’s locked is a great advantage, which for unknown reason didn’t come to Windows. On Windows you have to reboot to rename a file which is locked (think about monthly patching), and that’s grossly stupid, isn’t it?
As long as they are referenced, files will persist on disk as per: lsof | grep -i deleted
This is fine, this stops daemons randomly crashing when people delete their log files or other such files without checking if they are actually still in use. The fact is, anybody can do an rm -rf /var/lib/mysql as anybody could do a rm -rf /var/opt/mssql/data but this is fine as this gives the server admin the power to decide and you need permissions to do this. Running things as root has an inherent risk associated to it and if you aren’t sure what you are doing then you shouldn’t be running anything. This isn’t an issue because the only people who should have power to do this should never do it unless under an extreme circumstance.
Linux is not Windows, Linux is not there to hold your hand and ask, “are you sure you want to do this?”
Somehow, the entire Oracle world and well, pretty much everyone else who has been using Unix for the past several decades has managed to survive with this “feature.”
Steve – I don’t see how your passive-aggressive comment has helped here. My understanding of the blogpost was to inform people coming from a Windows background that even things they hold as “obvious” from their sysadmin knowledge should be questioned when transitioning. Your comment leans heavily towards the “Windoze Lo$er” camp and won’t really encourage Windows sysadmins in their inevitable struggles with Linux/Unix.
We all have to start somewhere and are all here to learn.
Thanks for the interesting post. It highlights nicely, that SQL DBAs moving from Windows to Linux need to be aware that their sysadmin knowledge from Windows will need revising for a transition to Linux.
There is going to be a lot of headscratching and scrambling to (re)learn Linux for a lot of SQL Server DBAs in the coming months and years. I’d like to see a blog series on these sort of gotchas with pointers towards learning material to kick-start a DBA who needs to transition.
Yes, Linux is very different than Windows. A couple other people have already pointed out the reasons why you were able to remove the database files, even though a process was using them. The root login really is all powerful on Linux. In Windows there are still a number of protections in place even when you are an Administrator; on Linux using the root login, not so much. Bottom line, be wary of what you do as root.
SQL Server administrators that have never worked with Linux before have a whole new platform to learn. I expect a lot more of these types of posts in the near future. Thanks for sharing your experience.
Spot on Chris, this was going to be my comment but you nailed it.
Andrew – Thank you for the post. As a SQL Server DBA on Windows, I can see that I have taken for granted some of the “protections” that windows gives us. I am looking at rolling out some test instances of SQL Server on Linux. I look forward to more posts for you.
Thanks for the post Andrew, I found it very informative. Well done also for taking the criticism on the chin, this is exactly the reason I decided not to blog
Note as well that if the server has the file open then it will still be visible under /proc/PID/fd/FILE_DESCRIPTOR_NUMBER. e.g /proc/1234/fd/8. You could then in theory checkpoint the database and use dd to copy the file back to the original location. Possibly an instance restart with some fudging/dbcc’s MIGHT (not tried) get you out of the mess!
It seems, that besides copying (where a cp should suffice), you can also use the linkeat-system-call to “re-link” the file to a name:
– Reddit on that https://www.reddit.com/r/programming/comments/7yx6f/how_to_undelete_any_open_deleted_file_in_linux/c07st7j/
– linkeat documentation https://linux.die.net/man/2/linkat
This functionality has been disabled long time ago [1]. You cannot re-link inodes (also deleted files) because of security implications. Now when you try to use ‘ln -L’ or this ‘linkat’ from reddit, you get ‘no such file or directory’ error. So the option is to do whatever is required to ensure consistency, then copy from /proc//fd/, having checked that you’ve got enough disk space for a copy.
I found ‘lsof +L1’ quick and useful way to determine the path to the deleted file and ls -l /proc//fd | grep to find its fd number.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aae8a97d3ec30788790d1720b71d76fd8eb44b73
Oh and if you want to be sure, that you are not currently working with deleted files, you may use “lsof +L1” to list all open but deleted files.
All in all lsof is a very handy tool if you are interested in open files and as some state “everything is a file on Linux”, it will also show you network connections and probably some more things. lsof should be part of any distribution, but it could be, that you need to install it first.
lsof documentation: https://linux.die.net/man/8/lsof
A really very informative blog
You are champ
Please keep posting more blogs. We are eagerly waiting for your blogs
Thanks