Tuesday, October 31, 2017 4:14 AM
I have a huge table I want to partition. The table's current primary key is on the ID column (and is therefore referenced by foreign keys). I want to partition on a datetime field. I tried making the primary key nonclustered and clustering on the datetime field, but I get this error: "Partition columns for a unique index must be a subset of the index key."
Can someone tell me what I am doing wrong, if this is possible and how to get around it?
Tuesday, October 31, 2017 1:17 PM ✅Answered | 1 vote
The table itself can be partitioned on the datetime column, but the PK index cannot be partitioned unless you add the partitioning column to the key. You can keep your existing primary key on ID alone and still partition the table, as long as the PK index is nonclustered and not partitioned.
The example below creates the partitioned table with a non-partitioned primary key. Note that the table and indexes won't be aligned with this method, so you won't be able to use SWITCH or partition-level TRUNCATE.
CREATE PARTITION FUNCTION PF_DateTime(datetime2)
AS RANGE RIGHT FOR VALUES();

CREATE PARTITION SCHEME PS_DateTime
AS PARTITION PF_DateTime ALL TO ([PRIMARY]);

CREATE TABLE dbo.Example(
      ID int NOT NULL IDENTITY
        --non-partitioned primary key
        CONSTRAINT PK_Example PRIMARY KEY NONCLUSTERED (ID) ON [PRIMARY]
    , [DateTime] datetime2
        --partitioned, non-unique clustered index
    , INDEX cdx CLUSTERED ([DateTime]) ON PS_DateTime([DateTime])
);
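If partition-level SWITCH or TRUNCATE matters more than enforcing uniqueness on ID alone, the aligned variant mentioned above (adding the partitioning column to the key) might look like this sketch; the table name is made up, and the partition scheme is reused from the example above:

```sql
-- Hedged sketch of the aligned alternative: the partitioning column is
-- added to the primary key, so the PK itself can be the partitioned
-- clustered index and the table stays aligned (SWITCH/TRUNCATE work).
-- Note: this constraint no longer enforces uniqueness of ID by itself.
CREATE TABLE dbo.ExampleAligned(
      ID int NOT NULL IDENTITY
    , [DateTime] datetime2 NOT NULL
    , CONSTRAINT PK_ExampleAligned
        PRIMARY KEY CLUSTERED (ID, [DateTime])
        ON PS_DateTime([DateTime])
);
```

The trade-off is exactly the one the error message points at: a unique, partitioned index must include the partitioning column in its key.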
Dan Guzman, Data Platform MVP, http://www.dbdelta.com
Tuesday, October 31, 2017 5:15 AM
Hi JasonDWilson77,
>>Can someone tell me what I am doing wrong, if this is possible and how to get around it?
The message is pretty self-explanatory; basically you have three options:
- Don't partition the table at all.
- Don't use a unique key to enforce uniqueness; use a trigger instead.
- Include the partition column in all of your primary and unique keys to work around the limitation, or consider using a different partition column.
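As a hypothetical sketch of the second option (the trigger and table names are made up, assuming a table dbo.Example with an ID column and no unique index on it), an AFTER trigger could enforce uniqueness of ID without a unique constraint:

```sql
-- Sketch only: enforce uniqueness of ID via a trigger instead of a
-- unique index, so the partitioned table needs no unique key at all.
CREATE TRIGGER trg_Example_UniqueID
ON dbo.Example
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT i.ID
        FROM inserted AS i
        INNER JOIN dbo.Example AS e ON e.ID = i.ID
        GROUP BY i.ID
        HAVING COUNT(*) > 1   -- ID appears more than once in the table
    )
    BEGIN
        RAISERROR('Duplicate ID detected.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

Be aware that trigger-based uniqueness is weaker than a unique index: it adds per-statement overhead and, without an index on ID, the existence check can be slow on a large table.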
For more information, please refer to BOL.
If you have any other questions, please let me know.
Regards,
Lin
MSDN Community Support
Tuesday, October 31, 2017 12:56 PM
Thanks for the reply. My situation seems like a pretty common scenario that partitioning should be able to handle: partition on date while still using an identity column for what it was meant for.
Tuesday, October 31, 2017 1:23 PM
Thank you, Dan! That helps a ton! I was creating the table on the partition scheme, then adding the primary key as nonclustered, then trying to add a clustered index on the partition scheme, and did not realize that was the issue. This is perfect. Thanks!
Friday, November 3, 2017 2:42 PM
Dan, this technically worked, but I am not seeing much space moved from my primary filegroup to the new partitioned filegroup.
Originally my entire DB was in one .mdf file (PRIMARY filegroup) and that .mdf was 1.5 TB. About 80% of that was in one logging-type table called DeviceEvent; more specifically, that table's clustered index, which was just the ID column (see below), made up about 1.2 TB alone.
I implemented a partition function and scheme by month based on a datetime stamp field in that table. I changed the primary key to nonclustered, then created a clustered index on that datetime stamp field using my partition scheme. That appears to work (based on data distribution and file-size changes in the monthly data files in my partitioned filegroup), but that new filegroup makes up only about 250 GB. On the flip side, the PRIMARY filegroup (original .mdf file) has actually increased in size. My overall goal is to make this table more performant and to ensure no data file is huge (> 1 TB). We expect this data to keep growing.
Here is my table definition and new clustered index:
CREATE TABLE [dbo].[DeviceEvent](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[EventCategory] [varchar](255) NOT NULL,
[EventType] [varchar](50) NOT NULL,
[EventId] [varchar](50) NULL,
[EventTime] [datetime] NOT NULL,
[LocationId] [varchar](50) NULL,
[ModuleId] [varchar](50) NULL,
[FormId] [varchar](50) NULL,
[ScreenId] [varchar](50) NULL,
[QuestionId] [varchar](50) NULL,
[Response] [varchar](max) NULL,
[Reported] [datetime] NOT NULL,
[DeviceId] [varchar](100) NOT NULL,
[LocalEventTime] [datetime] NULL,
CONSTRAINT [PK_DeviceEvents] PRIMARY KEY NONCLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
GO
CREATE CLUSTERED INDEX ixEventTime
ON [dbo].[DeviceEvent] (EventTime)
ON MonthlyPS(EventTime)
FYI: this is the model I used for my partitioning: https://www.brentozar.com/archive/2013/01/sql-server-table-partitioning-tutorial-videos-and-scripts/
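For the ongoing monthly maintenance that model calls for, the boundary-management step might look like this sketch; MonthlyPF is an assumed name for the partition function behind MonthlyPS, and the filegroup and boundary value are placeholders:

```sql
-- Hypothetical sketch of adding next month's boundary, following the
-- sliding-window pattern from the linked tutorial.
ALTER PARTITION SCHEME MonthlyPS
    NEXT USED [MonthlyFG];                -- filegroup to receive the new partition

ALTER PARTITION FUNCTION MonthlyPF()
    SPLIT RANGE ('2017-12-01T00:00:00');  -- next month's boundary value
```

Splitting ahead of time, while the rightmost partition is still empty, keeps SPLIT a metadata-only operation instead of a data movement.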
Friday, November 3, 2017 10:22 PM
>>I implemented partitioning function and scheme by month based off of a datetime stamp field in that table. I changed the Pky to NC, then created a clustered idx on that datetime stamp field using my partitioning. That appears to work (based off of data dispersement, and filesize changes in the monthly data files in my partitioned FG), but that new FG makes up only about 250GB. On the flip side the Primary FG (original .mdf file) has actually increased in size as well.
Run the query below to identify the space used by each object along with the partition info and/or filegroup:
SELECT
    OBJECT_NAME(p.object_id) AS ObjectName,
    i.name AS IndexName,
    p.index_id AS IndexID,
    ds.name AS PartitionScheme,
    p.partition_number AS PartitionNumber,
    fg.name AS FileGroupName,
    prv_left.value AS LowerBoundaryValue,
    prv_right.value AS UpperBoundaryValue,
    CASE pf.boundary_value_on_right WHEN 1 THEN 'RIGHT' ELSE 'LEFT' END AS PartitionFunctionRange,
    p.[rows] AS [Rows],
    pst.used_page_count AS UsedPages,
    pst.used_page_count * 8 AS UsedSpaceKB,
    pst.reserved_page_count AS ReservedPages,
    pst.reserved_page_count * 8 AS ReservedSpaceKB
FROM
    sys.partitions AS p
    INNER JOIN sys.dm_db_partition_stats AS pst ON p.partition_id = pst.partition_id
    INNER JOIN sys.indexes AS i ON i.[object_id] = p.[object_id] AND i.index_id = p.index_id
    INNER JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
    INNER JOIN sys.partition_schemes AS ps ON ps.data_space_id = ds.data_space_id
    INNER JOIN sys.partition_functions AS pf ON pf.function_id = ps.function_id
    INNER JOIN sys.destination_data_spaces AS dds2 ON dds2.partition_scheme_id = ps.data_space_id AND dds2.destination_id = p.partition_number
    INNER JOIN sys.filegroups AS fg ON fg.data_space_id = dds2.data_space_id
    LEFT OUTER JOIN sys.partition_range_values AS prv_left ON ps.function_id = prv_left.function_id AND prv_left.boundary_id = p.partition_number - 1
    LEFT OUTER JOIN sys.partition_range_values AS prv_right ON ps.function_id = prv_right.function_id AND prv_right.boundary_id = p.partition_number
WHERE
    OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
UNION ALL
SELECT
    OBJECT_NAME(p.object_id) AS ObjectName,
    i.name AS IndexName,
    p.index_id AS IndexID,
    NULL AS PartitionScheme,
    p.partition_number AS PartitionNumber,
    fg.name AS FileGroupName,
    NULL AS LowerBoundaryValue,
    NULL AS UpperBoundaryValue,
    NULL AS PartitionFunctionRange,
    p.[rows] AS [Rows],
    pst.used_page_count AS UsedPages,
    pst.used_page_count * 8 AS UsedSpaceKB,
    pst.reserved_page_count AS ReservedPages,
    pst.reserved_page_count * 8 AS ReservedSpaceKB
FROM
    sys.partitions AS p
    INNER JOIN sys.dm_db_partition_stats AS pst ON p.partition_id = pst.partition_id
    INNER JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
    INNER JOIN sys.data_spaces AS ds ON ds.data_space_id = i.data_space_id
    INNER JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
WHERE
    OBJECTPROPERTY(p.object_id, 'IsMSShipped') = 0
ORDER BY
    ObjectName,
    IndexID,
    PartitionNumber;
>>My overall goal is to make this table more performant, and get it so no data file is HUGE > 1TB as well. We expect this data to keep growing.
Partitioning improves manageability of large tables but performance is mostly a matter of query and index tuning. Partitioning will improve performance of processes that can employ SWITCH to load/purge large amounts of data efficiently and in specialized use cases where partitions can be joined in parallel. Partitioning can also result in worse performance depending on the queries, workload, and whether the partitioning column is specified in queries.
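One reason the PRIMARY file stays large is that the nonclustered PK (and any other nonclustered indexes) still live there, and space freed inside a data file is not released to the OS automatically. A sketch of reclaiming it might look like this; [IndexFG], the file_id, and the target size are placeholders, and note that DBCC SHRINKFILE can be slow and leave fragmentation:

```sql
-- Hedged sketch: rebuild the nonclustered PK index onto another
-- filegroup so PRIMARY holds less data, then release freed space
-- from the .mdf. The key columns must stay identical for DROP_EXISTING.
CREATE UNIQUE NONCLUSTERED INDEX PK_DeviceEvents
    ON dbo.DeviceEvent (Id)
    WITH (DROP_EXISTING = ON)
    ON [IndexFG];

DBCC SHRINKFILE (1, 512000);  -- shrink file_id 1 (the .mdf); target size in MB
```

Shrinking in smaller increments, and rebuilding fragmented indexes afterward, is generally gentler than one large shrink.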
Dan Guzman, Data Platform MVP, http://www.dbdelta.com