Creating a snapshot of your database (SQL)

I am working quite a lot with virtual machines in Hyper-V and VMWare. In these products you have a really useful feature called “snapshot”.

The snapshot feature is most useful when you want to preserve the state of the database so you can return to the same state repeatedly. For example if you are doing a scary update or something you have the possibility to “roll back” these changes.

Did you know that you could make snapshots even in SQL Server?

Lets start by creating a snapshot of the AdventureWorksDW2012 database and create a “super important” table.

-- change database
use [AdventureWorksDW2012]

-- Create a super important table
create table mySuperImportantTable ([ID] int, [Desc] varchar(2000))

-- Insert some values
insert into mySuperImportantTable values (1, 'Secret 1')
insert into mySuperImportantTable values (2, 'Secret 2')
insert into mySuperImportantTable values (3, 'Secret 3')
insert into mySuperImportantTable values (4, 'Secret 4')
insert into mySuperImportantTable values (5, 'Secret 5')

-- Create a snapshot of the database
CREATE DATABASE AdventureWorksDW2012_Mysnapshot ON
(
NAME = AdventureWorksDW2012_Data,
FILENAME = 'C:\db snapshotfile\AdventureWorksDW2012_Mysnapshot.ss' )
AS SNAPSHOT OF [AdventureWorksDW2012];
GO

Then the big mistake happens. We delete the table by accident.

-- Oh no I did a mistake
drop table mySuperImportantTable

Because snapshot files tend to get big I am really curious about the size of it.

Huge file
Ouch… It looks really big 😦 But lets look at the actual disk usage

disk usage

Ahh… only 960 kB. I can live with that.

Okey back to my snapshot. I can write some basic SQL to list all my snapshots.

select * from sys.databases where source_database_id is not null

Lets try to rescue the day by restoring from my snapshot and then get the data from the database

-- I want to restore my database from snapshot
USE MASTER
RESTORE DATABASE AdventureWorksDW2012
FROM DATABASE_SNAPSHOT = 'AdventureWorksDW2012_Mysnapshot'

-- Is my table there?
use AdventureWorksDW2012
select * from mySuperImportantTable
result

Cool. My snapshots saved my day. Then some cleaning. Delete the snapshot and then my sample table.

 -- Drop my snapshot database
DROP DATABASE AdventureWorksDW2012_Mysnapshot ;

-- Drop my table
drop table mySuperImportantTable

See you in Seattle?

Saturday I went from Norway to Seattle with one of my friends. A flight that took over 11 hours. It is good to have a few days up front to shop and so on. The hotel (Hyatt Olive) we are staying at even has a fitness center and swimming pool. So I might even get in shape during this week.

I am really looking forward to SQL Pass that starts on Tuesday. Good speakers and really interesting sessions.

Maybe I will see you there?

Kind regards,
Sindre

Sample queries for KPI in Datazen

When I started working with Datazen I could not find to many samples out there. So to help people getting started I will provide some 🙂

The first DAX query is for creating a KPI in Datazen.

EVALUATE
(
   ROW("Internet total sales", [Internet Total Sales])
)

And with some filtering it might look like something like this

EVALUATE
(
	ROW("Internet total sales", CALCULATE([Internet Total Sales], 
		FILTER('Date', 'Date'[Calendar Year] = 2008)
		)
	)
)

To make a query for trend you will have to use nested summarize. The reason why you have to do this is that you need to have the measure as first column.

A single summarize will produce this result

EVALUATE
(
	SUMMARIZE
	(
		'Date',
		'Date'[Month],
		"Internet Total Sales", 
		CALCULATE([Internet Total Sales], 
			FILTER('Date', 'Date'[Calendar Year] = 2007)
		)
	)
)

The first column is month number and the next column is the measure.

Summarize datazen 1

But we will have to change it so that the measure is the first column.

EVALUATE
(
	SUMMARIZE
	(
		SUMMARIZE
		(
			'Date',
			'Date'[Month],
			"Internet Total Sales", 
			CALCULATE([Internet Total Sales], 
				FILTER('Date', 'Date'[Calendar Year] = 2007)
			)
		),
		[Internet Total Sales],
		'Date'[Month]
	) 
) ORDER BY 'Date'[Month] DESC

And the result

Summarize datazen 2

How to make your model smaller

One off the biggest mistakes out there is to add columns with unique values to a model. Such as order numbers and orderline numbers. Unless you have a very good reason to do so. This will make your models larger then necessary. And the SSAS engine will not aggregate your data as good as it should.

Lets make a sample. We will use AdventureWorksDW2012 as we always do.

First I will add a couple of columns to my fact

alter table [dbo].[FactInternetSales]
add MostLikelyUniqueVarchar varchar(10),
UniqueIntValue int IDENTITY(1,1)

update [dbo].[FactInternetSales]
set MostLikelyUniqueVarchar = SUBSTRING(CONVERT(varchar(255), NEWID()), 0, 10)

Then I will create a tabular model with the following tables

– DimDate
– FactInternetSales
– DimProduct

And it will look something like this

Tables

I mark DimDate as date table and then deploy my model to the server.

The model is now about 6 MB.

Size 1

Then I replace my FactInternetSales with this query. Still having SalesOrderNumber and SalesOrderLineNumber in it.

select
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[RevisionNumber]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[CarrierTrackingNumber]
      ,[CustomerPONumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

The model is then down to about 4 MB.

Size 2

But what If I remove all the unique columns? Such as CarrierTrackingNumber, CustomerPONumber, SalesOrderNumber and SalesOrderLineNumber.

SELECT [ProductKey]
      ,[OrderDateKey]
      ,[DueDateKey]
      ,[ShipDateKey]
      ,[CustomerKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[SalesTerritoryKey]
      ,[OrderQuantity]
      ,[UnitPrice]
      ,[ExtendedAmount]
      ,[UnitPriceDiscountPct]
      ,[DiscountAmount]
      ,[ProductStandardCost]
      ,[TotalProductCost]
      ,[SalesAmount]
      ,[TaxAmt]
      ,[Freight]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
  FROM [dbo].[FactInternetSales]

Now the model is reduced to about 3.5 MB. Half the size we began with 🙂

Size 3

So the conclusion is. Do not add columns with unique values in your model. You should only have columns with foreign keys and measures.

Optimize your datatypes for SSAS

Not everyone is aware of the fact that not all data types are supported in SSAS cubes / models. If you use one of the data types that are not supported by SSAS, SSAS will cast all your unsupported data types to supported data types when the cube is being processed.

This will cause your cube processing time to increase. And you will use more CPU than necessary.

Microsoft has provided a list of supported data types here

To avoid these kinds of problems there are several things you can do.

  • Use views as source for your cubes. Then cast the values in these views. SQL Server is better than SSAS to cast values. And it is more likely that your SQL Server has more power that your SSAS server.
  • Make sure your DWH only uses supported SSAS data types. But that’s not to likely is it?

Hardware specification for SSAS

Top things you should consider when buying new hardware for your SSAS server.

CPU
When you query a SSAS multidimensional cube your query might get split between the storage engine (raw data) and the formula engine (complex calculations). The formula engine is single threaded. So it is important to choose a CPU with high frequency. 3.4 Ghz will be much better than 2.1 Ghz.

The CPU should also have as much cache as possible.

Disk
I prefer to always have DAS (Directly Attached Storage) rather than using a SAN. At least 2 fast SSD disks in proper RAID should be perfect. Or you could consider flash memory like Fusion-IO (SanDisk). SSAS multidimensional uses Windows File Cache quite a lot.

I am not using RAID with failover. Because if the server crashes I can create a new virtual server in no-time. And then deploy the cubes to it.

Memory
You should choose as fast memory as possible. And make sure you have enough of it 🙂 SSAS Tabular reads the model into memory. So this will increase you query performance quite a lot.

NUMA
I will not talk too much about NUMA since it’s quite complicated. SSAS Multidimensional should perform quite well using NUMA. SSAS Tabular might have problems on a server with NUMA. I have read some forum posts telling that a query runs faster on a Surface Pro than a high-end server. And that the problem might be because of NUMA. On of the solutions on this is to install a virtual server on top of the server. And then bind one of the sockets to this.

At one of our customer we ended up buying this server (august 2015).

DL380 gen9
2 x Intel Xeon E5-2643v3, 3.4GHz, 6-core, 20MB L3 cache
120GB Memory 2133MHz
2 x 800GB SSD drive
10Gb Ethernet

Please note that this server specification might not suit your needs.

You should always identify bottlenecks before you decide on a new server specification. And then scale the server for your needs.

Marco Russo has written a nice article on “Optimize Hardware Settings for Analysis Services Tabular”

Using subqueries in MDX

Did you know that you can write subqueries in MDX? It might be handy if you have some complex dimension filters.

Lets take a basic sample

SELECT [Measures].[Internet Total Sales] ON 0,
[Date].[Calendar Year].[Calendar Year] ON 1
FROM [Adventure Works Internet Sales Model]

And the result will look like this

Subqueries 1

If you only want to get sales for the year of 2007 it can be achieved with a subquery.

SELECT [Measures].[Internet Total Sales] ON 0,
[Date].[Calendar Year].[Calendar Year] ON 1
FROM 
( 
	SELECT [Date].[Calendar Year].&[2007] ON 0
	FROM [Adventure Works Internet Sales Model]
)

Subqueries 2

Yes I know this can be achieved in simpler ways 🙂 But I had some problems figuring out better samples

Useful filters for your time dimension (DAX / Tabular)

It is always nice to have some time filters in your Tabular model. Because of reuse I try to create most of these filters in the data source. But if that is not an option you can off course add them directly to your Tabular model. So here are some useful samples.

The first one will give you a flag on the last date with sales.

First lets create a new measure.

Last Sales Date:=LASTNONBLANK('Date'[Date]; CALCULATE(SUM('Internet Sales'[Sales Amount])))

And then add a new column in the date dimension. I my sample I will call this column “Last Date With Sales”

=IF('Date'[Date] = calculate('Date'[Last Sales Date];ALL('Date'));1;0)

So if you need the last date with sales in your report you can add the dimension attribute “Last Date With Sales” and then filter the value to 1. This is quite handy if you are using date functions such as TotalYTD or TotalMTD. Please note that you can use the FIRSTNONBLANK functions to. This will return the first date with sales.

The next flag will be a flag that indicates tomorrows date.

To make this flag I will create a new column and add the following expression

=IF('Date'[Date] = TODAY()-1;1;0)

To test this in AdventureWork you first have to figure out how many days to subtract. This can be done by using the following SQL

select datediff(day, '2010/01/02', getdate())

And then change your expression to something like this

=IF('Date'[Date] = TODAY()-1983;1;0)

To get the current month you can add a column like this

=IF(Month('Date'[Date]) = MONTH(TODAY()-1) && YEAR('Date'[Date]) = YEAR(TODAY()-1);1;0)

And to get the current year you can add a column like this

=IF(YEAR('Date'[Date]) = YEAR(TODAY()-1);1;0)

Use you inactive relations with “USERELATIONSHIP” in DAX

In Tabular models you can only have one active relationship between two tables. But you can have multiple inactive ones. In the image below you can see how the active relations are marked with a solid line. And then inactive ones are marked as dotted lines.

Relationsships

So how do you use your inactive ones?

I start with a basic query that gives me the sales per year. Because I did not specify a relationsship the active relationship is used. In this case the order date.

EVALUATE
 (
    ADDCOLUMNS (
        VALUES ( 'Date'[Calendar Year] ),
        "Total sales", 'Internet Sales'[Internet Total Sales]
    )
)

And the result

Relation 1

So if I want to use an inactive relationship too I need to use the USERELATIONSHIP function. Below is an example on how to get both sales per order year and sales per shipment year in the same query.

DEFINE
    MEASURE 'Internet Sales'[Total Sales By ShipmentDate] =
        CALCULATE (
            'Internet Sales'[Internet Total Sales],
            USERELATIONSHIP ( 'Internet Sales'[ShipDateKey], 'Date'[DateKey] )
        )
EVALUATE
 (
    ADDCOLUMNS (
        VALUES ( 'Date'[Calendar Year] ),
        "Total sales", 'Internet Sales'[Internet Total Sales],
        "Total sales by ShipmentDate", 'Internet Sales'[Total Sales By ShipmentDate]
    )
)

And the result will look like this

Relation 2

Search for dimension members in DAX using the “search” function

If you want to search for specific dimension members you can use the “Search” function in DAX. The “Search” function also support wildcards.

So if you want sales for every products that contains the word “Mountain” you can write a query like this.

EVALUATE
FILTER (
    CALCULATETABLE (
        ADDCOLUMNS (
            VALUES ( 'Product'[Product Name] ),
            "Sum of Sales Amount", 'Internet Sales'[Internet Total Sales]
        )
    ),
    'Internet Sales'[Internet Total Sales] <> 0
        && SEARCH ( "*Mountain*", 'Product'[Product Name], 1, 0 ) <> 0
)

And the result will look like this

Search 1

Below is a more complex sample. It returns all products that contains the word “Mountain”. But it also returns the percentage of the total for all products containing the word “mountain”

DEFINE
    MEASURE 'Internet Sales'[Total] =
        CALCULATE (
            'Internet Sales'[Internet Total Sales],
            FILTER (
                ALL ( 'Product'[Product Name] ),
                SEARCH ( "*Mountain*", 'Product'[Product Name], 1, 0 ) <> 0
            )
        )
EVALUATE
FILTER (
    CALCULATETABLE (
        ADDCOLUMNS (
            VALUES ( 'Product'[Product Name] ),
            "Sum of Sales Amount", 'Internet Sales'[Internet Total Sales],
            "Percent of all mountain", FORMAT (
                DIVIDE (
                    'Internet Sales'[Internet Total Sales],
                    'Internet Sales'[Total]
                ),
                "Percent"
            )
        )
    ),
    'Internet Sales'[Internet Total Sales] <> 0
        && SEARCH ( "*Mountain*", 'Product'[Product Name], 1, 0 ) <> 0
)

And the results look like this

Search 2