Use “descendants” in MDX

In the sample today I will show you how to use the “descendants” function in MDX. In this sample I want to count how many days and months I have sold stuff. And then calculate the average sales per month and date.

WITH MEMBER DaysWithSales AS
  Count ( Descendants ( [Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Date] ) )

MEMBER MonthWithSales AS
  Count ( Descendants ( [Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Month] ) )

MEMBER [Avg Sales Per Month] AS
  divide ( [Measures].[Internet Sales Amount], [Measures].[MonthWithSales] )

MEMBER [Avg Sales Per Day] AS
  divide ( [Measures].[Internet Sales Amount], [Measures].[DaysWithSales] )

SELECT NON EMPTY [Ship Date].[Calendar].[Calendar Year].&[2013].Children ON 1,
{ 
   [Measures].[Internet Sales Amount], 
   [Measures].[MonthWithSales], 
   [Measures].[DaysWithSales], 
   [Measures].[Avg Sales Per Month], 
   [Measures].[Avg Sales Per Day] 
} ON 0
FROM [Adventure Works]

And the result will look like this

Descendants

“Descendants” have lots of different parameters. Please check them out on MSDN ๐Ÿ™‚

How to use “CrossJoin” in DAX

The “CrossJoin” function returns the Cartesian product of all rows from all tables in the arguments. In my sample below I want to have sales per year per city.

EVALUATE
ADDCOLUMNS (
SUMMARIZE (
CROSSJOIN (
VALUES ( 'Date'[Calendar Year] ),
VALUES ( Geography[City] )
),
'Date'[Calendar Year],
Geography[City]
),
"Sum of Sales", CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ) )
)

And the result looks like this

CrossJoin1

But as you can see I get a lot of empty rows. To remove these empty rows I add a filter to my query. I also moved my calculation outside the main query to make it more readable.

DEFINE
MEASURE 'Internet Sales'[Sales] =
SUM ( 'Internet Sales'[Sales Amount] )
EVALUATE
ADDCOLUMNS (
FILTER (
SUMMARIZE (
CROSSJOIN (
VALUES ( 'Date'[Calendar Year] ),
VALUES ( Geography[City] )
),
'Date'[Calendar Year],
Geography[City]
),
'Internet Sales'[Sales] <> 0
),
"Sum of Sales", 'Internet Sales'[Sales]
)

CrossJoin 1

Use “except” in MDX to remove dimension members

In this post I will explain how the “except” function works in MDX.

I have a basic query that looks like this

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY [Product].[Subcategory].[Subcategory] ON 1
FROM [Adventure Works]

And the output will look like this

except 1

But the customer wants me to remove bikes from the resultset. To do this I extend my query to use the “filter” function.

	
SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY Except (
  [Product].[Subcategory].[Subcategory],
  { [Product].[Subcategory].&[2], [Product].[Subcategory].&[1], [Product].[Subcategory].&[3] }
) ON 1
FROM [Adventure Works]

except 2

Lets sort the resultset using the order function

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY Order (
  Except (
    [Product].[Subcategory].[Subcategory],
    { [Product].[Subcategory].&[2], [Product].[Subcategory].&[1], [Product].[Subcategory].&[3] }
  ),
  [Measures].[Internet Sales Amount],
  DESC
) ON 1
FROM [Adventure Works]

except 3

Nice!!

Basic query optimalization in DAX

What can we do to make queries as fast as possible? Here is some tips ๐Ÿ™‚

In my simple test I want to see the sales for year 2009. To make sure my queries does not hit any cache I’m clearing it using XMLA. In my first query I will try to use the ROW function and sum up the internet sales

EVALUATE
ROW (
"Sales", CALCULATE (
SUM ( 'Internet Sales'[Sales Amount] ),
'Date'[Calendar Year] = 2008
)
)

When watching the profiler I can see that this query took 156 milliseconds. The reason why this takes quite long is that we are one the row level.

Lets switch to table level.

EVALUATE
FILTER (
SUMMARIZE (
'Date',
'Date'[Calendar Year],
"Sales", SUM ( 'Internet Sales'[Sales Amount] )
),
'Date'[Calendar Year] = 2008
)

This query took 78 milliseconds. Twice as fast as the first query.

What if I move the calculation using ADDCOLUMNS outside the SUMMARIZE?

EVALUATE
ADDCOLUMNS (
FILTER (
SUMMARIZE ( 'Date', 'Date'[Calendar Year] ),
'Date'[Calendar Year] = 2008
),
"Sales", CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ) )
)

This query took 62 milliseconds

Do you have an even better optimalization tip? Feel free to drop me an mail

Use “between” when comparing dates in SQL

Recently I was at the SQL Saturday in Oslo. In one of the lessons (with Joe Chang) I was told that “between” was faster than using “month”, “year” and similar functions when comparing dates. They use the index in a better way.

I did a quick test this evening. And here is my findings.

--Create and index
CREATE NONCLUSTERED INDEX IX_FactInternetSales_OrderDate 
ON dbo.FactInternetSales
(
	OrderDate
) 
WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

-- Turn on statistics
SET STATISTICS IO ON

-- Query 1
select * from FactInternetSales
where year(OrderDate) = 2005 and month(OrderDate) = 07

-- Query 2
select * from FactInternetSales
where OrderDate between '2005/07/01' and '2005/07/31'

The IO statistics looked like this

(146 row(s) affected)
Table 'FactInternetSales'. Scan count 1, logical reads 689, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(146 row(s) affected)
Table 'FactInternetSales'. Scan count 1, logical reads 459, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

As you can see much less logical read on query 2.

And the actual execution plan looked like this

QueryPlanDate

The conclusion is that mr Chang knows what he is talking about ๐Ÿ™‚

Optimize “Insert into” in SQL

In this post I will give you some advice on how to extract data from a large table into a new table. In this case we are using the “INSERT INTO” statement. When working with large tables you want to have a query that runs fast and uses as little logging as possible.

Consider this query

USE [AdventureWorksDW2012]

CREATE TABLE [dbo].[FactInternetSales_new](
	[ProductKey] [int] NOT NULL,
	[OrderDateKey] [int] NOT NULL,
	[DueDateKey] [int] NOT NULL,
	[ShipDateKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[SalesTerritoryKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [tinyint] NOT NULL,
	[RevisionNumber] [tinyint] NOT NULL,
	[OrderQuantity] [smallint] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[ExtendedAmount] [money] NOT NULL,
	[UnitPriceDiscountPct] [float] NOT NULL,
	[DiscountAmount] [float] NOT NULL,
	[ProductStandardCost] [money] NOT NULL,
	[TotalProductCost] [money] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[TaxAmt] [money] NOT NULL,
	[Freight] [money] NOT NULL,
	[CarrierTrackingNumber] [nvarchar](25) NULL,
	[CustomerPONumber] [nvarchar](25) NULL,
	[OrderDate] [datetime] NULL,
	[DueDate] [datetime] NULL,
	[ShipDate] [datetime] NULL
) ON [PRIMARY]

CREATE NONCLUSTERED INDEX IX_FactInternetSales_new ON dbo.FactInternetSales_new
(
	CustomerKey
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
ON [PRIMARY]

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
insert into [dbo].[FactInternetSales_new]
select * 
from [dbo].[FactInternetSales]
SET @t2 = GETDATE();
SELECT DATEDIFF(millisecond,@t1,@t2) AS elapsed_ms;

drop table [dbo].[FactInternetSales_new]

The output for this query will be

Insert 1

If I change the query so that the destination table is a heap (no index) and adds the “TABLOCK” query hint I will gain a minimally logged operation. And when the data is over I will create my index.

Lets take a look at the script

USE [AdventureWorksDW2012]

CREATE TABLE [dbo].[FactInternetSales_new_fast_one](
	[ProductKey] [int] NOT NULL,
	[OrderDateKey] [int] NOT NULL,
	[DueDateKey] [int] NOT NULL,
	[ShipDateKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[SalesTerritoryKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [tinyint] NOT NULL,
	[RevisionNumber] [tinyint] NOT NULL,
	[OrderQuantity] [smallint] NOT NULL,
	[UnitPrice] [money] NOT NULL,
	[ExtendedAmount] [money] NOT NULL,
	[UnitPriceDiscountPct] [float] NOT NULL,
	[DiscountAmount] [float] NOT NULL,
	[ProductStandardCost] [money] NOT NULL,
	[TotalProductCost] [money] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[TaxAmt] [money] NOT NULL,
	[Freight] [money] NOT NULL,
	[CarrierTrackingNumber] [nvarchar](25) NULL,
	[CustomerPONumber] [nvarchar](25) NULL,
	[OrderDate] [datetime] NULL,
	[DueDate] [datetime] NULL,
	[ShipDate] [datetime] NULL
) ON [PRIMARY]

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
INSERT INTO [dbo].[FactInternetSales_new_fast_one] WITH (TABLOCK)
select * 
from [dbo].[FactInternetSales]

CREATE NONCLUSTERED INDEX IX_FactInternetSales_new ON dbo.[FactInternetSales_new_fast_one]
(
	CustomerKey
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
ON [PRIMARY]

SET @t2 = GETDATE();
SELECT DATEDIFF(millisecond,@t1,@t2) AS elapsed_ms;

drop table [dbo].[FactInternetSales_new_fast_one]

This query will run in about half the time on my laptop

Insert 2

Save your deleted or updated data with output clause in SQL

I am always kind of nervous when I am deleting data from a production environment. Or updating large datasets. To be able of rolling back changes you can save you data into a backup table by using the output clause.

Lets take a look at an example

create table myTest (ID int, name varchar(50))

insert into myTest values (1, 'John')
insert into myTest values (2, 'Frank')
insert into myTest values (3, 'Sammy')
insert into myTest values (4, 'Julie')
insert into myTest values (5, 'Mike')

delete from myTest
output deleted.* 
where ID = 5;

drop table myTest

And this will give us an output on the rows we are deleting

output

If we want to save this to a new table we can change our syntax to this

create table myTest (ID int, name varchar(50))

insert into myTest values (1, 'John')
insert into myTest values (2, 'Frank')
insert into myTest values (3, 'Sammy')
insert into myTest values (4, 'Julie')
insert into myTest values (5, 'Mike')

create table myBackup (ID int, name varchar(50))

delete from myTest 
output deleted.* into myBackup
where ID = 5;

select * from myBackup

drop table myTest;
drop table myBackup;

To do the same for updated rows you can use this syntax (saves the data BEFORE the update)

update myTest 
	Set name = 'Willy'
output deleted.* into myBackup
where Id = 5

And to save the new data

update myTest 
	Set name = 'Willy'
output inserted.* into myBackup
where Id = 5

Get first date with sales and last date with sales in MDX

To get the first date with sales and last date with sales we can use the “tail” and the “head” function in MDX. These functions are quite handy if you want to have a default date in your reports.

Here is the code:

WITH MEMBER [Last Date With Sales] AS
  Tail (
    Filter ( [Date].[Calendar].[Date].Members, [Measures].[Internet Sales Amount]  NULL ),
    1
  ).Item ( 0 ) .MEMBER_CAPTION
MEMBER [First Date With Sales] AS
  Head (
    Filter ( [Date].[Calendar].[Date].Members, [Measures].[Internet Sales Amount]  NULL ),
    1
  ).Item ( 0 ) .MEMBER_CAPTION
SELECT
{ [Measures].[First Date With Sales], [Measures].[Last Date With Sales] } ON 0
FROM [Adventure Works]

And the result will look like this

Head and tail

Return cross product of two or more sets in MDX using “CrossJoin”

One of the functions I use almost all the time when I am writing MDX is the crossjoin function. This function returns the cross product of two or more sets in MDX.

Lets look at an example

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY [Date].[Calendar].[Calendar Year] ON 1
FROM [Adventure Works]

This code will give us sales per year. But what if we want sales per year per city? Lets introduce the crossjoin function!

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY CrossJoin([Date].[Calendar].[Calendar Year], [Customer].[City].[City])  ON 1
FROM [Adventure Works]ยจ

And that would gives ut the result we want

How about sales per contry per city per year? Then we will have to nest 2 crossjoins like this.

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY CrossJoin([Customer].[Country].[Country], CrossJoin([Customer].[City].[City], [Date].[Calendar].[Calendar Year]))  ON 1
FROM [Adventure Works]

Instead of crossjoin you can also use a ‘*’. That is the way I ususally write my code

Here is an example on that

SELECT [Measures].[Internet Sales Amount] ON 0,
NON EMPTY [Customer].[Country].[Country] * [Customer].[City].[City] * [Date].[Calendar].[Calendar Year]  ON 1
FROM [Adventure Works]