How come it takes so long to DROP CONSTRAINT in MSSQL?
Right, this will almost definitely be a rant, so here goes.
** Deep breath **
I had created C# EF Entities with ulong for their primary keys. Seemed smart. Why use long when they are only going to be positive, right?
Wrong, MSSQL does not have ulong, so EF mapped it to decimal(20,0). Not ideal, but okay. Wasn't a problem. We built this whole system around it. Once we started seeing production workloads, we found that queries weren't running super well. No issue, we just needed to add some indexes, and that would be the end of that. Right?
Wrong, in spite of having indexes, queries were still slow. It turns out that in some cases, MSSQL would generate queries that would cast the decimal(20,0) primary key to bigint in order to do joins, which meant that indexes couldn't be used. So, we decided to change the primary keys from decimal(20,0) to bigint. Simple enough, right?
Wrong. Turns out that changing the type of a primary key in MSSQL is not a simple operation. You have to:
- Drop any dependent indexex.
- Drop any foreign key constraints that reference the primary key.
- Drop the primary key constraint.
- Change the column type.
- Change the column types of all foreign keys that referenced the primary key.
- Re-add the primary key constraint. Fine, I can do that.
- Re-add all foreign key constraints.
- Re-add indexes.
Entity Framework usually does a good job of generating migrations for me, but in this case, it only generated what corresponds to the ALTER COLUMN statement for the primary key and handling for indexes. It did not handle dropping and re-adding foreign key constraints. So, I had to manually write that part of the migration. I want to say "no big deal", but it was tedious and error-prone, since there were A LOT of tables and foreign keys to deal with.
Anyway, once that was all written, it was just a matter of running the migration. It ran pretty quick in my local dev DB, so I would assume that it would also run reasonably well in production, right?
Wrong. Turns out that dropping foreign key constraints in MSSQL can be a very slow operation, especially if the tables are large and have a lot of data. In my case, it took several hours to drop all the foreign key constraints. Not much help was found online. I could gather that it is somehow related to clustered indexes on primary keys, which - I'll admit - I don't fully understand the significance of. That's on me.
What bothers me is that there is seemingly no way around this. Once the constraints is in place (Entity Framework generated it for me), it needs to be dropped before the column type can be changed. Some strategies mentioned renaming the ID column to OLDID and creating a new ID column to copy data into, etc. But the issue remains. I still need to drop the constraint before I can drop the OLDID column. So, it doesn't really help.
Anyway, I suppose it is just a matter of letting it run and then way, right?
Wrong! Apparantly once the constraint is dropped, MSSQL needs to turn the table in an unindex heap, and then back to a table again afterwards, which means a ton of IO operations and a lot of disk space usage. When the migration started, I was using only around 30% of the disk, yet a single table operation ended up reaching the 100% cap and crashing, so I had to increase the disk size and let it run again.
Conclusions?
I don't have any. Seems like this is just how it is with MSSQL. If I were to offer a suggestion, it would probably just be to echo a sentiment that I've often seen being thrown around online: Just use PostgreSQL. It is open source, has a lot of features that MSSQL lacks, and generally seems to be a better database system all around. But I get it, sometimes you are stuck with what you have.