See you in Seattle?

Saturday I went from Norway to Seattle with one of my friends, a flight that took over 11 hours. It is good to have a few days up front to shop and so on. The hotel we are staying at (the Hyatt Olive) even has a fitness center and a swimming pool, so I might even get in shape this week.

I am really looking forward to SQL PASS, which starts on Tuesday. Good speakers and really interesting sessions.

Maybe I will see you there?

Kind regards,
Sindre

How to make your model smaller

One of the biggest mistakes out there is to add columns with unique values, such as order numbers and order line numbers, to a model, unless you have a very good reason to do so. This will make your models larger than necessary, and the SSAS engine will not compress your data as well as it should.

Let's build a sample. We will use AdventureWorksDW2012, as we always do.

First I will add a couple of columns to my fact table:

alter table [dbo].[FactInternetSales]
add MostLikelyUniqueVarchar varchar(10),
    UniqueIntValue int IDENTITY(1,1);
go

update [dbo].[FactInternetSales]
set MostLikelyUniqueVarchar = SUBSTRING(CONVERT(varchar(255), NEWID()), 1, 10);

Then I will create a tabular model with the following tables:

– DimDate
– FactInternetSales
– DimProduct

And it will look something like this:

Tables

I mark DimDate as a date table and then deploy the model to the server.

The model is now about 6 MB.

Size 1

Then I replace FactInternetSales with the following query, which still includes SalesOrderNumber and SalesOrderLineNumber:

SELECT [ProductKey]
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[RevisionNumber]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[CarrierTrackingNumber]
      ,[CustomerPONumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

The model is then down to about 4 MB.

Size 2

But what if I remove all the unique columns, such as CarrierTrackingNumber, CustomerPONumber, SalesOrderNumber, and SalesOrderLineNumber?

SELECT [ProductKey]
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

Now the model is reduced to about 3.5 MB, nearly half the size we began with 🙂

Size 3

So the conclusion is: do not add columns with unique values to your model. Your fact tables should ideally contain only foreign keys and measures.
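If you want to see for yourself which columns are eating the space, SSAS Tabular exposes storage DMVs. This is a minimal sketch; run it in an MDX query window connected to the Tabular instance:

```sql
-- Lists the column segments of the deployed model, including
-- USED_SIZE in bytes, so you can spot the expensive columns.
SELECT *
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
```

Look at the USED_SIZE column in the result; unique-value columns such as SalesOrderNumber will typically dominate.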

Optimize your datatypes for SSAS

Not everyone is aware that not all data types are supported in SSAS cubes/models. If you use an unsupported data type, SSAS will cast it to a supported one while the cube is being processed.

This increases your cube processing time, and you will use more CPU than necessary.

Microsoft has provided a list of supported data types here

To avoid these kinds of problems there are several things you can do.

  • Use views as the source for your cubes, and cast the values in these views. SQL Server is better than SSAS at casting values, and it is likely that your SQL Server has more power than your SSAS server.
  • Make sure your DWH only uses supported SSAS data types. But that’s not too likely, is it?
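As a sketch of the first option, a view over the fact table could do the casting up front. The view name and the source types mentioned in the comments are illustrative; check your own schema against Microsoft's supported-types list before copying this:

```sql
-- Hypothetical view used as the cube source.
-- Each CAST converts a column to an SSAS-friendly type in SQL Server,
-- so SSAS does not have to convert it during processing.
CREATE VIEW dbo.vwFactInternetSales
AS
SELECT ProductKey,
       OrderDateKey,
       CAST(OrderQuantity AS int)      AS OrderQuantity,  -- e.g. from smallint
       CAST(SalesAmount   AS float)    AS SalesAmount,    -- e.g. from money
       CAST(OrderDate     AS datetime) AS OrderDate       -- e.g. from date
FROM dbo.FactInternetSales;
```

Point the cube at the view instead of the table, and the conversions happen once, in SQL Server, at the source.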

Hardware specification for SSAS

Top things you should consider when buying new hardware for your SSAS server.

CPU
When you query an SSAS multidimensional cube, your query might get split between the storage engine (raw data) and the formula engine (complex calculations). The formula engine is single threaded, so it is important to choose a CPU with a high clock frequency. 3.4 GHz will be much better than 2.1 GHz.

The CPU should also have as much cache as possible.

Disk
I always prefer DAS (Directly Attached Storage) over a SAN. At least 2 fast SSDs in a proper RAID configuration should be perfect. Or you could consider flash memory like Fusion-io (SanDisk). SSAS Multidimensional uses the Windows file cache quite a lot.

I do not use RAID with failover, because if the server crashes I can create a new virtual server in no time and then deploy the cubes to it.

Memory
You should choose memory that is as fast as possible. And make sure you have enough of it 🙂 SSAS Tabular reads the entire model into memory, so this will improve your query performance quite a lot.

NUMA
I will not say too much about NUMA since it is quite complicated. SSAS Multidimensional should perform quite well on NUMA hardware, but SSAS Tabular might have problems on a server with NUMA. I have read forum posts reporting that a query runs faster on a Surface Pro than on a high-end server, and that the cause might be NUMA. One of the workarounds is to install a virtual server on top of the physical server and bind it to a single socket.

At one of our customers we ended up buying this server (August 2015):

DL380 gen9
2 x Intel Xeon E5-2643v3, 3.4GHz, 6-core, 20MB L3 cache
120GB Memory 2133MHz
2 x 800GB SSD drive
10Gb Ethernet

Please note that this server specification might not suit your needs.

You should always identify bottlenecks before you decide on a new server specification. And then scale the server for your needs.

Marco Russo has written a nice article on “Optimize Hardware Settings for Analysis Services Tabular”

Can Tabular models be optimized by sorting the fact tables?

With multidimensional cubes you can create faster cubes by sorting your fact data before you process the cube. By doing this your models will be more compressed (smaller) and your queries will run faster.

Today I wanted to test the same in Tabular, so I created 2 models with one fact table and a few dimensions. The only difference between the models is the fact table. In the first model I used the original table (FactInternetSales), and in the other model I used a view against FactInternetSales with an ORDER BY on all columns.
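The sorted view looked roughly like this (a sketch; the view name and the short ORDER BY list are illustrative, the real view ordered on all columns). Note that a view only accepts an ORDER BY together with TOP, and SQL Server does not guarantee that rows actually come back in that order when the view is queried:

```sql
-- Hypothetical sorted view over the fact table.
-- TOP 100 PERCENT is required for ORDER BY inside a view, but the
-- optimizer is free to ignore the ordering when reading the view.
CREATE VIEW dbo.vwFactInternetSalesSorted
AS
SELECT TOP 100 PERCENT *
FROM dbo.FactInternetSales
ORDER BY ProductKey, OrderDateKey, CustomerKey;
```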

These are the tables I used

model

Then I processed and deployed the models.

When checking the size of the folders that contain the fact data, I was kind of surprised. It looked to be the opposite! The folder for the regular table was actually smaller than the one that represents the view, so the table seemed to be more compressed than the view.

What do you think? Is this caused by the fact that one of the facts is a table and the other one is a view? Or is the fact already sorted in the table?