One off the biggest mistakes out there is to add columns with unique values to a model. Such as order numbers and orderline numbers. Unless you have a very good reason to do so. This will make your models larger then necessary. And the SSAS engine will not aggregate your data as good as it should.
Lets make a sample. We will use AdventureWorksDW2012 as we always do.
First I will add a couple of columns to my fact
alter table [dbo].[FactInternetSales] add MostLikelyUniqueVarchar varchar(10), UniqueIntValue int IDENTITY(1,1) update [dbo].[FactInternetSales] set MostLikelyUniqueVarchar = SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 10)
Then I will create a tabular model with the following tables
– DimDate
– FactInternetSales
– DimProduct
And it will look something like this
I mark DimDate as date table and then deploy my model to the server.
The model is now about 6 MB.
Then I replace my FactInternetSales with this query. Still having SalesOrderNumber and SalesOrderLineNumber in it.
select ,[OrderDateKey] ,[DueDateKey] ,[ShipDateKey] ,[CustomerKey] ,[PromotionKey] ,[CurrencyKey] ,[SalesTerritoryKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[RevisionNumber] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[UnitPriceDiscountPct] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ,[Freight] ,[CarrierTrackingNumber] ,[CustomerPONumber] ,[OrderDate] ,[DueDate] ,[ShipDate] FROM [dbo].[FactInternetSales]
The model is then down to about 4 MB.
But what If I remove all the unique columns? Such as CarrierTrackingNumber, CustomerPONumber, SalesOrderNumber and SalesOrderLineNumber.
SELECT [ProductKey] ,[OrderDateKey] ,[DueDateKey] ,[ShipDateKey] ,[CustomerKey] ,[PromotionKey] ,[CurrencyKey] ,[SalesTerritoryKey] ,[OrderQuantity] ,[UnitPrice] ,[ExtendedAmount] ,[UnitPriceDiscountPct] ,[DiscountAmount] ,[ProductStandardCost] ,[TotalProductCost] ,[SalesAmount] ,[TaxAmt] ,[Freight] ,[OrderDate] ,[DueDate] ,[ShipDate] FROM [dbo].[FactInternetSales]
Now the model is reduced to about 3.5 MB. Half the size we began with 🙂
So the conclusion is. Do not add columns with unique values in your model. You should only have columns with foreign keys and measures.