When loading hub s and satellites the best option you have in SQL Server is to use merge for loading data vault raw. This is because it’s fail gracefully when it loads an existing hub id leaving out those in staging already loaded. It’ll easily insert new satellite rows for not matched records when you’re hash diff is not matching.
Here is a little code snippet for a hub and a satellite. The example is to load a set of records which are associated with the Monash Model for health localities. You can also use a simpler insert for hub loads such is this
CREATE PROCEDURE [dbo].[Load_Hub_Monash_Model]
@executionId uniqueidentifier = ''
AS
DECLARE @RecordSource nvarchar(100)
DECLARE @DefaultValidFrom datetime2(0) --use datetime2(0) to remove milliseconds
DECLARE @DefaultValidTo datetime2(0)
DECLARE @LoadDateTime datetime2(0)
SET @RecordSource = N'CDR_Artius_Staging MDS Stg_Monash_Model'
SET @DefaultValidFrom = '1900-01-01'
SET @DefaultValidTo = '9999-12-31'
SET @LoadDateTime = GETDATE()
-- Find existing Hub Hashkeys
--; WITH GetHubkeys
--AS
--(SELECT H_Monash_Model_key, H_MM_Locality_Key FROM H_MonashModel
--)
-- Insert new Hub Records
INSERT INTO dbo.H_MonashModel
(H_Monash_Model_key, H_MM_Locality_Key, H_MM_Source, H_MM_Load_Date)
SELECT
dbo.Create_DV_Hash( CONCAT( SMM.Town_Locality, SMM.Postcode)), CONCAT(SMM.Town_Locality, SMM.Postcode), @RecordSource, @LoadDateTime
FROM [$(DatavaultDB1)].Staging.Stg_Monash_Model SMM
WHERE NOT EXISTS ( SELECT H_Monash_Model_Key FROM H_MonashModel HMM WHERE CONCAT( SMM.Town_Locality, SMM. Postcode) = HMM.H_MM_Locality_Key );
RETURN 0
The table for this would like a but like this
CREATE TABLE [dbo].[H_MonashModel]
(
[H_Monash_Model_key] VARBINARY(128) NOT NULL,
[H_MM_Locality_Key] VARCHAR(85) NOT NULL,
[H_MM_Source] VARCHAR(150) NULL,
[H_MM_Load_Date] DATETIME2 NULL,
CONSTRAINT [PK_H_MonashModel] PRIMARY KEY ([H_MM_Locality_Key])
The code to load the satellite would look a bit like this. This code is using DV version 1 style with an end date for a record. Don’t do that, it has performance issues with updating records when you have many millions of rows
CREATE PROCEDURE [dbo].[Load_Sat_Monash_Model]
@executionId uniqueidentifier = ''
AS
DECLARE @RecordSource nvarchar(100)
DECLARE @DefaultValidFrom datetime2(0) --use datetime2(0) to remove milliseconds
DECLARE @DefaultValidTo datetime2(0)
DECLARE @LoadDateTime datetime2(0)
SET @RecordSource = N'CDR_Artius_Staging MDS Stg_Monash_Model'
SET @DefaultValidFrom = '1900-01-01'
SET @DefaultValidTo = '9999-12-31'
SET @LoadDateTime = GETDATE()
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN TRY
BEGIN TRANSACTION
--Insert new current records for changed records
INSERT INTO S_MonashModel
( S_Monash_Model_Key, S_MM_Source, S_MM_Load_Date, S_MM_Town_Locality, S_MM_State_Territory,S_MM_Postcode, S_MM_Monash_Code, S_Hash_Diff)
SELECT
H_Monash_Model_key,
RecSource,
LoadDate,
Town_Locality,
State_Territory,
Postcode,
MM_Code,
Sat_Hash_Diff
FROM -- Main
(
MERGE S_MonashModel AS Target
USING
(SELECT
-- query distinct values from source (staging)
hub.H_Monash_Model_key, -- Hash Key
@RecordSource AS RecSource, -- Records Source
@LoadDateTime AS LoadDate, -- Load Date time Stamp
stage.Town_Locality,
stage.State_Territory,
stage.Postcode,
stage.MM_Code,
dbo.Create_DV_Hash(CONCAT(Town_Locality, State_Territory, Postcode, MM_Code)) Sat_Hash_Diff
FROM [$(DatavaultDB1)].[Staging].[Stg_Monash_Model] stage
left outer join dbo.H_MonashModel hub on hub.H_Monash_Model_key = dbo.Create_DV_Hash(CONCAT(stage.Postcode, stage.Town_Locality))
--where hub.H_Monash_Model_Key is not null
) AS Source
ON target.S_Monash_Model_Key = dbo.Create_DV_Hash(CONCAT(Source.Postcode, Source.Town_Locality))
AND target.S_End_Rec_Date IS NULL
-- End Date existing Record
WHEN MATCHED AND
(target.S_MM_Town_Locality = Source.Town_Locality
AND target.S_MM_Postcode = Source.Postcode
)
THEN UPDATE SET
S_End_Rec_Date = @LoadDateTime
WHEN NOT MATCHED BY TARGET
THEN INSERT
(
S_Monash_Model_Key, S_MM_Source, S_MM_Load_Date, S_MM_Town_Locality, S_MM_State_Territory,S_MM_Postcode, S_MM_Monash_Code, S_End_Rec_Date
)
VALUES
(
dbo.Create_DV_Hash(CONCAT(Source.Postcode, Source.Town_Locality)),
@RecordSource,
@LoadDateTime,
Town_Locality,
State_Territory,
Postcode,
MM_Code,
NULL -- We add a date when we expire the record
)
-- Output Changed Records
OUTPUT
$action AS Action
,Source.*
) as MergeOutput
WHERE MergeOutput.ACTION = 'UPDATE'
COMMIT TRAN
SELECT
'Success' AS ExecutionResult
RETURN;
-- End Main
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK
SELECT
'Failure' as ExecutionResult,
ERROR_MESSAGE() AS ErrorMessage;
RETURN;
END CATCH
GO
Some points, this is a slightly different way of using merge, the insert merge
First we set up what looks like a fairly normal insert and then drive that from a subquery in the from which is a merge statement. Don’t for get you can use CTE and subqueries in the merge to help with creating the data set you want to load
This would be loading to a table similar to this
CREATE TABLE [dbo].[S_MonashModel]
(
[S_Monash_Model_Key] VARBINARY(128) NOT NULL ,
[S_MM_Source] VARCHAR(150) NULL,
[S_MM_Load_Date] DATETIME2 NULL,
[S_MM_Town_Locality] VARCHAR(75) NULL,
[S_MM_State_Territory] VARCHAR(40) NULL,
[S_MM_Postcode] CHAR(5) NULL,
[S_MM_Monash_Code] TINYINT NULL,
[S_Hash_Diff] VARBINARY(128) NULL,
CONSTRAINT [PK_S_MonashModel] PRIMARY KEY ([S_Monash_Model_Key])
)