Data Conventions

Checks for data conventions issues.

Table of contents

Using Placeholder Rows
Data Should be Encrypted if Compliance Dictates
Use Environment-Agnostic SQL Queries and Data-Driven Design
Not Purging Data

Using Placeholder Rows

Check Id: 60 Not implemented yet. Click here to add the issue if you want to develop and create a pull request.

A placeholder is an empty row that is created to hold the place of possible future data in the row that may or may not be necessary.

While placeholder rows do not violate database normalization rules, it is not considered a best practice to create “empty” rows. Row data should only be created when it is materialized. If the row data does not exists, it should not be inserted. If the row data is removed, the row should be hard or soft deleted. Empty rows are not free, there is overhead space allocated with placeholder rows, which can impact performance.

Having the unnecessary placeholder rows can muddy queries that now would need to include IS NOT NULL or LEN(PhoneNumber) > 0 to exclude these placeholder rows on other queries.

Data Should be Encrypted if Compliance Dictates

Potential Finding: Unencrypted Data
Check Id: 27

The table column returned for this check might have unencrypted data that you might want to have encrypted for best practices or industry specific compliance. You will need to determine if the data needs to be protected at rest, in transit or both.

With SQL Server you have a couple choices to implement hashing or encryption

SQL Server Always Encrypt 🗗 by Microsoft
SQL Server Transparent Data Encryption (TDE) 🗗 by Microsoft
Configure SQL Server Database Engine for encrypting connections 🗗 by Microsoft

Use Environment-Agnostic SQL Queries and Data-Driven Design

Check Id: 191

Do not hard-code values into queries that will change over time. Use a data driven approach like database tables where the values can be maintained with something like a user interface. This will shift the maintenance of these values closer to the business users who are the domain experts.

Hard-coding values directly in SQL queries like specific IDs, names, or lists creates rigid, error-prone SQL code that requires manual updates and is vulnerable to inconsistencies across environments. This approach also limits adaptability and introduces risks when data changes, as it requires ongoing maintenance to ensure accuracy. Using data-driven design with tables to dynamically source values allows for flexibility, ensuring the query remains consistent, adaptable, and reflective of the latest data.

Not Purging Data

Check Id: 193 Not implemented yet. Click here to add the issue if you want to develop and create a pull request.

Purging historical or obsolete data is a crucial task in maintaining a healthy SQL Server environment. It improves performance, reduces storage costs, and ensures compliance with data retention policies. The following best practices provide a structured approach to effective and safe data purging.

Work with stakeholders to define how long data must be retained (e.g., 3 years for transactions).
Consider legal and regulatory requirements (e.g., GDPR, HIPAA, SOX).
Ensure purging does not break foreign key relationships, audit trails, or reports.
Ensure indexes support the purge filter (e.g., a non-clustered index on CreateDateTime).
Purge in batches to avoid locking and deadlocking due to lock escalation.

CREATE PROCEDURE dbo.TableNamePurge
AS
    BEGIN
        SET NOCOUNT, XACT_ABORT ON;

        /* This will not be performant without an index that leads on CreateDateTime. */
        IF NOT EXISTS (
            SELECT
                *
            FROM
                sys.tables               AS t
            INNER JOIN sys.indexes       AS i ON i.object_id   = t.object_id
            INNER JOIN sys.columns       AS c ON c.object_id   = t.object_id
            INNER JOIN sys.index_columns AS ic ON i.object_id  = ic.object_id
                                               AND i.index_id  = ic.index_id
                                               AND c.column_id = ic.column_id
            WHERE
                t.name                   = N'TableName'
            AND SCHEMA_NAME(t.schema_id) = N'dbo'
            AND c.name                   = N'CreateDateTime'
            AND ic.key_ordinal           = 1
            AND ic.is_included_column    = 0
        )
            BEGIN
                /* This is here to tell someone to create the index. */
                RAISERROR(
                    'No index exists that leads on CreateDateTime. Process can''t run without it. Please create this index:
CREATE INDEX IX_TableName_CreateDateTime
    ON dbo.TableName (CreateDateTime ASC)
    WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);'
                   ,11
                   ,1
                ) WITH NOWAIT;
                RETURN;
            END;

        /* Start the delete and archival. */
        DECLARE
            @MinimumDate datetime2(0) = '0001-01-01'
           ,@MaximumDate datetime2(0) = CAST(DATEADD(YEAR, -1, SYSDATETIME()) AS date);

        WHILE (1 = 1)
            BEGIN
                BEGIN TRY
                    BEGIN TRANSACTION;

                    /* Starting place. */
                    SELECT
                        @MinimumDate = MIN(tn.CreateDateTime)
                    FROM
                        dbo.TableName AS tn
                    WHERE
                        tn.CreateDateTime >= @MinimumDate
                    AND tn.CreateDateTime < @MaximumDate;

                    /* If there's no data to delete, this will be NULL and we can exit. */
                    IF @MinimumDate IS NULL
                        BEGIN
                            COMMIT TRANSACTION;
                            RAISERROR('No more data found to purge/archive, exiting.', 0, 1) WITH NOWAIT;
                            BREAK;
                        END;

                    /* Get the first 1000 rows with a qualifying date. */
                    WITH d
                      AS (
                          SELECT TOP (1000)
                              tn.*
                          FROM
                              dbo.TableName AS tn
                          WHERE
                              tn.CreateDateTime >= @MinimumDate
                          AND tn.CreateDateTime < @MaximumDate
                          ORDER BY
                              tn.TableNameId ASC
                      )
                    DELETE
                    d WITH (ROWLOCK)
                    /* Send output to the archive if you want it archived. */
                    OUTPUT
                        Deleted.ColumnName1
                       ,Deleted.ColumnName2
                    INTO dbo.TableNameArchive (ColumnName1, ColumnName2);

                    COMMIT TRANSACTION;

                END TRY
                BEGIN CATCH
                    IF @@TRANCOUNT > 0
                        BEGIN
                            ROLLBACK TRANSACTION;
                        END;

                    /* Handle the error here, cleanup, et cetera.
                       In most cases it is best to bubble up (THROW) the error to the application/client to be displayed to the user and logged.
                    */
                    THROW;
                END CATCH;
            END;
    END;

Take Care When Scripting Batches 🗗 by Michael J Swart
How to Delete Just Some Rows from a Really Big Table: Fast Ordered Deletes 🗗 by Brent Ozar