Saturday, June 30, 2012

Several useful T-SQL features

1. Using the OVER clause with aggregate functions
The following sample is from BOL:
-------------------------------------
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
-------------------------------------

Achieving the same result with subqueries makes the script more complicated:
-------------------------------------
SELECT A.SalesOrderID, B.ProductID, B.OrderQty, A.Total, A.Avg, A.Count, A.Min, A.Max
FROM
 (
 SELECT SalesOrderID
  ,SUM(OrderQty) AS 'Total'
  ,AVG(OrderQty) AS 'Avg'
  ,COUNT(OrderQty) AS 'Count'
  ,MIN(OrderQty)  AS 'Min'
  ,MAX(OrderQty) AS 'Max'
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID IN(43659,43664)
 GROUP BY SalesOrderID
 ) A
 INNER JOIN Sales.SalesOrderDetail B
 ON A.SalesOrderID=B.SalesOrderID
GO
-------------------------------------

However, if the table is large, you should compare the performance of the OVER clause against the subquery approach first. Based on my testing, the OVER clause does not perform as well as the subquery. Here are the execution plans and statistics of both queries:
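If you want to reproduce the comparison yourself, the numbers below can be captured with the I/O and time statistics options (a minimal sketch, assuming the same AdventureWorks2008R2 queries as above):
-------------------------------------
USE AdventureWorks2008R2;
GO
-- show logical/physical reads and CPU/elapsed time per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- run the OVER clause query here, then the subquery version,
-- and compare the numbers on the Messages tab
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
-------------------------------------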

a) OVER clause query
The execution plan uses a table spool to cache the intermediate data and generate the aggregates.

(screenshot: execution plan of the OVER clause query, showing the table spool)

The I/O statistics reveal 3 physical reads and 55 logical reads.

b) subqueries

(screenshot: execution plan of the subquery version, seeking the table twice)

Here the execution plan seeks the table twice. Physical reads are the same (3), but there are only 12 logical reads, and the SQL Server elapsed time is only 27ms.

It looks like the subquery is faster than the OVER clause :), but the OVER clause with aggregate functions makes your script concise and easy to understand.

2. CTE - Common Table Expressions
A CTE is another way to make your code concise and efficient.
a) Recursive Queries Using CTE
Returning hierarchical data is a common use of recursive queries, for instance, displaying employees in an organizational chart. If you are familiar with Oracle, you can get hierarchical data with the "CONNECT BY PRIOR" clause, as in the code below:

--Oracle code--
SELECT employee_id, manager_id, first_name, last_name
FROM employee
START WITH employee_id = 1
CONNECT BY PRIOR employee_id = manager_id;

In SQL Server, you can use a recursive CTE query to get the same result. Here is a sample:
-------------------------------------
CREATE TABLE employee (
  employee_id INTEGER,
  manager_id INTEGER,
  first_name VARCHAR(10) NOT NULL,
  last_name VARCHAR(10) NOT NULL,
  title VARCHAR(20),
  salary INT
);
insert into employee (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
values (6,4,'Jane','Brown','Support Person',45000),
(3,2,'Fred','Hobbs','Sales Person',200000),
(4,1,'Susan','Jones','Support Manager',500000),
( 1 ,0, 'James','Smith','CEO',800000),
(2 ,1,'Ron','Johnson','Sales Manager',600000),
(5,2,'Rob' ,'Green','Sales Person', 40000);

;WITH DIRECTREPORT(EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL)
AS
(
 SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, 0 AS LEVEL
 FROM employee
 WHERE MANAGER_ID=0

 UNION ALL

 SELECT e.EMPLOYEE_ID, e.MANAGER_ID,e.FIRST_NAME,e.LAST_NAME,e.TITLE,e.SALARY, d.LEVEL+1 AS LEVEL
 FROM employee e INNER JOIN DIRECTREPORT d on e.manager_id=d.EMPLOYEE_ID
)
SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL
FROM DIRECTREPORT
-------------------------------------

the output looks like this (row order may vary, since there is no ORDER BY):

EMPLOYEE_ID  MANAGER_ID  FIRST_NAME  LAST_NAME  TITLE            SALARY  LEVEL
1            0           James       Smith      CEO              800000  0
2            1           Ron         Johnson    Sales Manager    600000  1
4            1           Susan       Jones      Support Manager  500000  1
3            2           Fred        Hobbs      Sales Person     200000  2
5            2           Rob         Green      Sales Person     40000   2
6            4           Jane        Brown      Support Person   45000   2
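One caveat with recursive CTEs: SQL Server aborts the query after 100 recursion levels by default. If your hierarchy may be deeper, append the MAXRECURSION hint to the statement that consumes the CTE; a sketch, replacing the final SELECT of the script above:
-------------------------------------
SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL
FROM DIRECTREPORT
OPTION (MAXRECURSION 1000);  -- 0 removes the limit entirely
-------------------------------------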

b) Delete duplicate rows with a CTE
You can not only select data through a CTE but also delete data through it. Without a CTE, deleting duplicate rows is not easy; you can find some approaches in the KB articles below:
http://support.microsoft.com/kb/139444
http://support.microsoft.com/kb/70956

However, with a CTE there is a very clean way to implement it. Here is the script:
-------------------------------------
CREATE TABLE dup(a int, b int)
GO
insert into dup(a,b) values(1,1),(1,1),(1,1),(2,2),(2,2),(3,3),(4,4),(4,4),(4,4),(5,5),(5,5)
GO
;WITH dupcte (a,b, row_count)
AS
(
 SELECT a,b,
 ROW_NUMBER() OVER(PARTITION BY a,b ORDER BY a,b) AS row_count
 FROM dup
 )
 DELETE
 FROM dupcte
 WHERE row_count > 1
 GO
 select * from dup
 GO
 -------------------------------------
 The final output is:

a           b
1           1
2           2
3           3
4           4
5           5
3. MERGE Statement
The T-SQL MERGE statement is similar to Oracle's "MERGE INTO" statement (learned from Oracle? :) ). It provides an easy way to implement upsert logic:
if a matching row is found in the destination table, update it; otherwise insert a new row.

Sample code:
 -------------------------------------
--Create the employee table and insert the testing data with the code in 2.a
DECLARE @EMPLOYEE_ID INT, @MANAGER_ID INT, @SALARY INT
DECLARE @FIRST_NAME VARCHAR(10), @LAST_NAME VARCHAR(10), @TITLE VARCHAR(20)
SET @EMPLOYEE_ID=4
SET @MANAGER_ID=1
SET @FIRST_NAME='Susan'
SET @LAST_NAME='Jones'
SET @TITLE='Support Manager'
SET @SALARY=550000

--if find employee then update, or else insert.
MERGE INTO EMPLOYEE AS TARGET
USING (VALUES(@EMPLOYEE_ID, @MANAGER_ID,@FIRST_NAME,@LAST_NAME,@TITLE,@SALARY))
AS SOURCE (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
ON TARGET.EMPLOYEE_ID=Source.EMPLOYEE_ID
WHEN MATCHED THEN
 UPDATE SET SALARY=SOURCE.SALARY
WHEN NOT MATCHED BY TARGET THEN
 INSERT (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
 VALUES (@EMPLOYEE_ID, @MANAGER_ID,@FIRST_NAME,@LAST_NAME,@TITLE,@SALARY);
 -------------------------------------
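If you want to see what the MERGE actually did, it supports an OUTPUT clause; the built-in $action column reports 'INSERT' or 'UPDATE' per row. A small sketch, added to the statement above just before its terminating semicolon:
-------------------------------------
OUTPUT $action AS MergeAction,
       inserted.EMPLOYEE_ID,
       inserted.SALARY;
-------------------------------------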

4. GO
Sorry, this is not a T-SQL function; "GO" is only a command recognized by sqlcmd and SSMS. The reason I mention it here is that it has an interesting [count] parameter.

GO [count]
The batch preceding GO will execute the specified number of times, so if you want to run a simple script multiple times, you can try it (see the sketch below).
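For example, reusing the dup table from section 2, the batch below runs 10 times and inserts 10 identical rows without writing a loop:
-------------------------------------
insert into dup(a,b) values(6,6)
GO 10
-------------------------------------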

Saturday, June 23, 2012

Rebuild index with "Alter Index Rebuild" or "DBCC DBREINDEX"

You can rebuild all indexes on a table with either "ALTER INDEX REBUILD" or "DBCC DBREINDEX".

First, in BOL, for "DBCC DBREINDEX", it mentions:
This feature will be removed in the next version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use ALTER INDEX instead.

It looks like we should replace all "DBCC DBREINDEX" scripts with "ALTER INDEX REBUILD"; however, there is a small difference in their behavior. Let's look at a sample:


1. Turn on the "auto create statistics" option for the test database
ALTER DATABASE [test] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT
GO

2. Create sample table
create table abc(t1 int, t2 int)
go
declare @i int
set @i = 0
while @i < 20000
begin
     insert into abc(t1, t2) values (@i, @i)
     set @i = @i + 1 
end
create index abcindx on abc(t1)
go


3. Run the SELECT command below; it will automatically create a statistics object on column t2
select count(*) from abc where t2> 500 and t2<600

so we will get 2 statistics objects on the abc table (you can verify this with the query below):
  • abcindx, created by the index abcindx
  • _WA_Sys_00000002_025D5595, created by SQL Server automatically to optimize query performance
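A quick sketch to list them via sys.stats (auto_created = 1 marks the _WA_Sys_* statistics):

SELECT name, auto_created, user_created
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.abc')
GO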


4. Check the statistics status
dbcc show_statistics ( abc, _WA_Sys_00000002_025D5595)
go
dbcc show_statistics ( abc, abcindx)
go


 5. Update the table
update abc set t1=12000 where t1>12000
update abc set t2=12006 where t2>11000
go
6.  Rebuild index with DBCC DBREINDEX
DBCC DBREINDEX (abc)

7. Check the statistics again; both statistics objects are updated.
8. Drop the table abc, and redo steps 1 to 5

9. Rebuild index with "ALTER INDEX ALL REBUILD"
ALTER INDEX ALL ON dbo.abc rebuild

10. Check the statistics again; only the abcindx statistics object is updated. The auto-created statistics object has not been updated.

11. In order to update the auto-created statistics, you need to run:
UPDATE STATISTICS abc

12. Drop the table, and redo steps 1 and 2

13. Manually create statistics abcstc on column t2
CREATE STATISTICS abcstc on abc(t2)

14. Redo Step 5, update the table

15. Rebuild index with "ALTER INDEX ALL REBUILD"
ALTER INDEX ALL ON dbo.abc rebuild

16. Check the statistics; you will find that the statistics object abcstc still has not been updated.

So DBCC DBREINDEX not only rebuilds the indexes but also updates all statistics (manually or automatically created), whereas ALTER INDEX REBUILD only updates the statistics created by the indexes themselves. If you use ALTER INDEX, you should run UPDATE STATISTICS as well.
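By the way, an easy way to confirm which statistics objects were refreshed at each step is STATS_DATE, which returns the last update time per statistics object (a sketch):

SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.abc')
GO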

Sunday, June 17, 2012

Shrink transaction log file

Sometimes shrinking the transaction log file doesn't work.

Before we troubleshoot it, we need to know how the log file works:

Transaction Log Physical Architecture
http://msdn.microsoft.com/en-us/library/aa174526(v=sql.80).aspx

Here are some tools which can help us investigate the log file issues.
a. DBCC SQLPERF(LOGSPACE): tells you how much of the transaction log is being used
b. DBCC LOGINFO: an undocumented command that shows the status of the VLFs (Virtual Log Files) in the log file. For more information, please refer to http://sqlblog.com/blogs/kalen_delaney/archive/2009/12/21/exploring-the-transaction-log-structure.aspx
c. DBCC OPENTRAN: determines whether an open transaction exists in the transaction log (a combined sketch follows this list)
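If you want to run all three checks together, a minimal sketch (assuming the testlog database created below):
======================================================================
USE testlog
GO
DBCC SQLPERF(LOGSPACE)  -- log size and percent used, per database
GO
DBCC LOGINFO            -- one row per VLF; Status 2 = active
GO
DBCC OPENTRAN           -- oldest active transaction, if any
GO
======================================================================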
 
Let's create a testing db first
======================================================================
--the initial log file size is 64MB, with autogrow enabled in 64MB increments:
CREATE DATABASE [testlog] ON PRIMARY
( NAME = N'testlog', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master\testlog.mdf' , SIZE = 1048576KB , FILEGROWTH = 524288KB )
LOG ON

( NAME = N'testlog_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master\testlog_log.ldf' , SIZE = 65536KB , FILEGROWTH = 65536KB )
GO
--create a test table
use testlog
GO

select a.* into testtable from sysobjects a, sysobjects b
GO

--Make a full backup first. This is important, otherwise SQL Server will simply reuse the log space
BACKUP DATABASE [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

DBCC LOGINFO
GO
=======================================================================
There are 4 VLFs in the log file. Status "2" means that VLF is currently active:

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     0       0       0       0
2       16711680  33431552     0       0       0       0
2       16965632  50143232     0       0       0       0

let's insert some data:
=======================================================================
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

dbcc loginfo
GO
=======================================================================
the output of "DBCC LOGINFO" looks like:

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     26      2       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     0       0       0       0
2       16777216  67108864     0       0       0       27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317
 
The log file has grown to 128MB now. The last active VLF is FSeqNo 27, so only 48MB (16MB x 3) of log space was actually used. It looks like the log file growth is based on an estimate of the transaction size, and sometimes that estimate is not accurate. OK, let's start our testing.

Create an active transaction, then insert some data:
====================================================================
--open a session and run
begin tran
insert into testtable
select a.* from sysobjects a

--open another session and run
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

DBCC LOGINFO
GO
====================================================================
the output of "DBCC LOGINFO" is

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     26      2       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317

the last active VLF is FSeqNo 29 now. let's back up the log file:

BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

DBCC LOGINFO
GO

the output is
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      0       64      0
2       16711680  16719872     26      0       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317

because we have an active transaction open in VLF FSeqNo 27, the log backup only truncated the VLFs prior to FSeqNo 27. if you try to shrink the log file back to 64MB,

DBCC SHRINKFILE(2, 64)

the output is
DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       10240        8192         10240      8192

The CurrentSize is 80MB = 10240 (pages) * 8 / 1024, so the shrink only removed the last 3 VLFs. Run

DBCC LOGINFO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      2       128     0
2       16711680  16719872     26      0       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317

The output is as expected! In order to find out which transaction is still active, we can run:

DBCC OPENTRAN

Transaction information for database 'testlog'.
Oldest active transaction:
SPID (server process ID): 53
UID (user ID) : -1
Name : user_transaction
LSN : (27:4861:1)
Start time : Jun 17 2012 12:37:02:740PM
SID : 0x0105000000000005150000005d28f57fd53ad8354354e02a75841400

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Based on the output, the SPID is 53 and the transaction type is user_transaction. We can also see what SPID 53 is running with:

DBCC INPUTBUFFER (53)

the output is like
EventType       Parameters  EventInfo
Language Event  0           begin tran insert into testtable select a.* from sysobjects a

so an active transaction will stop the log file shrink from working. let's continue our testing.

Commit the transaction first, and insert more data
--run in the session which you create the transaction
commit tran
--insert more data
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

DBCC LOGINFO
GO

ok, the log file size grew again
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      2       128     0
2       16711680  16719872     31      2       128     0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     32      2       64      31000000097600316
2       16777216  100663296    0       0       0       31000000097600316
2       16777216  117440512    0       0       0       31000000097600316
2       16777216  134217728    0       0       0       31000000097600316

compared with the previous output, the original VLF FSeqNo 26 has been reused (because its status was 0) and is now FSeqNo 31. Because FSeqNo 27, 28, and 29 had not been backed up (status 2), they could not be reused. let's back up the log file again:
==================================================================
BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

DBCC LOGINFO
GO
==================================================================

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      0       128     0
2       16711680  16719872     31      0       128     0
2       16711680  33431552     27      0       64      0
2       16965632  50143232     28      0       64      0
2       16777216  67108864     29      0       64      27000000325600317
2       16777216  83886080     32      2       64      31000000097600316
2       16777216  100663296    0       0       0       31000000097600316
2       16777216  117440512    0       0       0       31000000097600316
2       16777216  134217728    0       0       0       31000000097600316


OK, this time only VLF FSeqNo 32 is active (status = 2), that's cool! let's check the log file usage:
(screenshot: log file space usage)
wow! 98% free space, only 3MB is used. can we shrink it back to the initial size of 64MB now? let's try

DBCC SHRINKFILE(2, 64)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       12288        8192         12288      8192

the log file size is still 96MB = 12288 * 8 / 1024...

So DBCC SHRINKFILE can only remove VLFs from the last active VLF to the end of the file; even if there is more free space earlier in the log file, you cannot shrink it away.

if you really need to shrink the file, you have to move the active VLF to the beginning of the log file: create a dummy table and insert data into it, and if the transaction is big enough, the active VLF will wrap around to the beginning of the log file; then back up the log. here is the VLF status you will get:

--generate big transaction
--run the insert several times until the active VLF move to the beginning of the log file
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c

--backup the log file
BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

DBCC LOGINFO
GO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0
2       16965632  50143232     39      0       64      0
2       16777216  67108864     37      0       128     27000000325600317
2       16777216  83886080     32      0       64      31000000097600316
cool! the active VLF has been moved to the beginning of the log file.
let's try the shrink again

DBCC SHRINKFILE(2, 64)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       8192         8192         8192       8192

DBCC LOGINFO
GO
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0
2       16965632  50143232     39      0       64      0

well, the log file size is 64MB now :)
can I shrink the log file to a size smaller than the initial size? Yes, you can.
let's run

DBCC SHRINKFILE(2, 32)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       6121         6121         6120       6120

DBCC LOGINFO
GO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0

We have 3 VLFs (about 48MB) now. However, you cannot shrink the log file to fewer than 2 VLFs; in our case, the smallest possible log file size is 16MB * 2 = 32MB.

As a summary:
1. Active transaction.

We can use "DBCC OPENTRAN" to find it, and "DBCC INPUTBUFFER" to check the query it is running. Sometimes replication causes an active-transaction issue.

As I remember, I once restored a database backup that had been taken while the database was under transactional replication. The Log Reader Agent maintains a pointer in the transaction log to track changes in the database, so the log file could not be shrunk, but we can use the command below to remove the pointer:

EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1
this command should only be used when the database is not actually participating in replication.

2. Active VLF is not at the beginning of the log file.
Back up the log file first, then create a dummy table and generate a transaction big enough to move the active VLF back to the beginning of the log file.

3. If the log file keeps growing, check sys.databases:
select log_reuse_wait_desc from sys.databases where name='dbname'

4. For optimal VLF sizing, create the initial log file as 8GB, then extend it to 16GB, then 24GB, and so forth (see the sketch after this list):
http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx

too many VLFs will degrade log performance.

5. Use "DBCC SQLPERF (LOGSPACE)" and "DBCC LOGINFO" to check the log file status.
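Following point 4, here is a sketch of pre-growing a log file in fixed steps (the database and file names are hypothetical; adjust the target sizes to your own workload):

ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 8388608KB)   -- 8GB
GO
ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 16777216KB)  -- extend to 16GB
GO
ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 25165824KB)  -- extend to 24GB
GO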