Saturday, June 30, 2012

Several useful T-SQL features

1. Using the OVER clause with aggregate functions
The following sample is from BOL:
-------------------------------------
USE AdventureWorks2008R2;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664);
-------------------------------------

Achieving the same result with subqueries makes the script more complicated:
-------------------------------------
SELECT A.SalesOrderID, B.ProductID, B.OrderQty, A.Total, A.Avg, A.Count, A.Min, A.Max
FROM
 (
 SELECT SalesOrderID
  ,SUM(OrderQty) AS 'Total'
  ,AVG(OrderQty) AS 'Avg'
  ,COUNT(OrderQty) AS 'Count'
  ,MIN(OrderQty)  AS 'Min'
  ,MAX(OrderQty) AS 'Max'
 FROM Sales.SalesOrderDetail
 WHERE SalesOrderID IN(43659,43664)
 GROUP BY SalesOrderID
 ) A
 INNER JOIN Sales.SalesOrderDetail B
 ON A.SalesOrderID=B.SalesOrderID
GO
-------------------------------------

However, if the table is large, you should compare the performance of the OVER clause against the subquery approach first. Based on my testing, the OVER clause does not perform as well as the subquery. Here are the execution plans and statistics of both queries:
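If you want to reproduce the comparison yourself, the numbers below can be captured with the I/O and time statistics options (a minimal sketch, assuming the same AdventureWorks2008R2 queries as above):
-------------------------------------
USE AdventureWorks2008R2;
GO
-- show logical/physical reads and CPU/elapsed time per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- run the OVER clause query here, then the subquery version,
-- and compare the numbers on the Messages tab
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
-------------------------------------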

a) OVER clause query
The execution plan uses a table spool to cache the intermediate data and generate the aggregates.

(screenshot: execution plan of the OVER clause query, showing the table spool)

The I/O statistics reveal 3 physical reads and 55 logical reads.

b) subqueries

(screenshot: execution plan of the subquery version, seeking the table twice)

Here the execution plan seeks the table twice. Physical reads are the same (3), but there are only 12 logical reads, and the SQL Server elapsed time is only 27ms.

It looks like the subquery is faster than the OVER clause :), but the OVER clause with aggregate functions makes your script concise and easy to understand.

2. CTE - Common Table Expressions
A CTE is another way to make your code concise and efficient.
a) Recursive Queries Using CTE
Returning hierarchical data is a common use of recursive queries, for instance, displaying employees in an organizational chart. If you are familiar with Oracle, you can get hierarchical data with the "CONNECT BY PRIOR" clause, as in the code below:

--Oracle code--
SELECT employee_id, manager_id, first_name, last_name
FROM employee
START WITH employee_id = 1
CONNECT BY PRIOR employee_id = manager_id;

In SQL Server, you can use a recursive CTE query to get the same result. Here is a sample:
-------------------------------------
CREATE TABLE employee (
  employee_id INTEGER,
  manager_id INTEGER,
  first_name VARCHAR(10) NOT NULL,
  last_name VARCHAR(10) NOT NULL,
  title VARCHAR(20),
  salary INT
);
insert into employee (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
values (6,4,'Jane','Brown','Support Person',45000),
(3,2,'Fred','Hobbs','Sales Person',200000),
(4,1,'Susan','Jones','Support Manager',500000),
( 1 ,0, 'James','Smith','CEO',800000),
(2 ,1,'Ron','Johnson','Sales Manager',600000),
(5,2,'Rob' ,'Green','Sales Person', 40000);

;WITH DIRECTREPORT(EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL)
AS
(
 SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, 0 AS LEVEL
 FROM employee
 WHERE MANAGER_ID=0

 UNION ALL

 SELECT e.EMPLOYEE_ID, e.MANAGER_ID,e.FIRST_NAME,e.LAST_NAME,e.TITLE,e.SALARY, d.LEVEL+1 AS LEVEL
 FROM employee e INNER JOIN DIRECTREPORT d on e.manager_id=d.EMPLOYEE_ID
)
SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL
FROM DIRECTREPORT
-------------------------------------

the output looks like this (row order may vary, since there is no ORDER BY):

EMPLOYEE_ID  MANAGER_ID  FIRST_NAME  LAST_NAME  TITLE            SALARY  LEVEL
1            0           James       Smith      CEO              800000  0
2            1           Ron         Johnson    Sales Manager    600000  1
4            1           Susan       Jones      Support Manager  500000  1
3            2           Fred        Hobbs      Sales Person     200000  2
5            2           Rob         Green      Sales Person     40000   2
6            4           Jane        Brown      Support Person   45000   2
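One caveat with recursive CTEs: SQL Server aborts the query after 100 recursion levels by default. If your hierarchy may be deeper, append the MAXRECURSION hint to the statement that consumes the CTE; a sketch, replacing the final SELECT of the script above:
-------------------------------------
SELECT EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY, LEVEL
FROM DIRECTREPORT
OPTION (MAXRECURSION 1000);  -- 0 removes the limit entirely
-------------------------------------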

b) Delete duplicate rows with a CTE
You can not only select data through a CTE but also delete data through it. Without a CTE, deleting duplicate rows is not easy; you can find some approaches in the KB articles below:
http://support.microsoft.com/kb/139444
http://support.microsoft.com/kb/70956

However, with a CTE there is a very clean way to implement it. Here is the script:
-------------------------------------
CREATE TABLE dup(a int, b int)
GO
insert into dup(a,b) values(1,1),(1,1),(1,1),(2,2),(2,2),(3,3),(4,4),(4,4),(4,4),(5,5),(5,5)
GO
;WITH dupcte (a,b, row_count)
AS
(
 SELECT a,b,
 ROW_NUMBER() OVER(PARTITION BY a,b ORDER BY a,b) AS row_count
 FROM dup
 )
 DELETE
 FROM dupcte
 WHERE row_count > 1
 GO
 select * from dup
 GO
 -------------------------------------
 The final output is:

a           b
1           1
2           2
3           3
4           4
5           5
3. MERGE Statement
The T-SQL MERGE statement is similar to Oracle's "MERGE INTO" statement (learned from Oracle? :) ). It provides an easy way to implement upsert logic:
if a matching row is found in the destination table, update it; otherwise insert a new row.

Sample code:
 -------------------------------------
--Create the employee table and insert the testing data with the code in 2.a
DECLARE @EMPLOYEE_ID INT, @MANAGER_ID INT, @SALARY INT
DECLARE @FIRST_NAME VARCHAR(10), @LAST_NAME VARCHAR(10), @TITLE VARCHAR(20)
SET @EMPLOYEE_ID=4
SET @MANAGER_ID=1
SET @FIRST_NAME='Susan'
SET @LAST_NAME='Jones'
SET @TITLE='Support Manager'
SET @SALARY=550000

--if find employee then update, or else insert.
MERGE INTO EMPLOYEE AS TARGET
USING (VALUES(@EMPLOYEE_ID, @MANAGER_ID,@FIRST_NAME,@LAST_NAME,@TITLE,@SALARY))
AS SOURCE (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
ON TARGET.EMPLOYEE_ID=Source.EMPLOYEE_ID
WHEN MATCHED THEN
 UPDATE SET SALARY=SOURCE.SALARY
WHEN NOT MATCHED BY TARGET THEN
 INSERT (EMPLOYEE_ID, MANAGER_ID,FIRST_NAME,LAST_NAME,TITLE,SALARY)
 VALUES (@EMPLOYEE_ID, @MANAGER_ID,@FIRST_NAME,@LAST_NAME,@TITLE,@SALARY);
 -------------------------------------
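If you want to see what the MERGE actually did, it supports an OUTPUT clause; the built-in $action column reports 'INSERT' or 'UPDATE' per row. A small sketch, added to the statement above just before its terminating semicolon:
-------------------------------------
OUTPUT $action AS MergeAction,
       inserted.EMPLOYEE_ID,
       inserted.SALARY;
-------------------------------------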

4. GO
Sorry, this is not a T-SQL function; "GO" is only a command recognized by sqlcmd and SSMS. The reason I mention it here is that it has an interesting [count] parameter.

GO [count]
The batch preceding GO will execute the specified number of times, so if you want to run a simple script multiple times, you can try it (see the sketch below).
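For example, reusing the dup table from section 2, the batch below runs 10 times and inserts 10 identical rows without writing a loop:
-------------------------------------
insert into dup(a,b) values(6,6)
GO 10
-------------------------------------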

Saturday, June 23, 2012

Rebuild index with "Alter Index Rebuild" or "DBCC DBREINDEX"

You can rebuild all indexes on a table with either "ALTER INDEX REBUILD" or "DBCC DBREINDEX".

First, in BOL, for "DBCC DBREINDEX", it mentions:
This feature will be removed in the next version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use ALTER INDEX instead.

It looks like we should replace all "DBCC DBREINDEX" scripts with "ALTER INDEX REBUILD"; however, there is a small difference in their behavior. Let's look at a sample:


1. Turn on the "auto create statistics" option for the test database
ALTER DATABASE [test] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT
GO

2. Create sample table
create table abc(t1 int, t2 int)
go
declare @i int
set @i = 0
while @i < 20000
begin
     insert into abc(t1, t2) values (@i, @i)
     set @i = @i + 1 
end
create index abcindx on abc(t1)
go


3. Run the SELECT command below; it will automatically create a statistics object on column t2
select count(*) from abc where t2> 500 and t2<600

so we will get 2 statistics objects on the abc table (you can verify this with the query below):
  • abcindx, created by the index abcindx
  • _WA_Sys_00000002_025D5595, created by SQL Server automatically to optimize query performance
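A quick sketch to list them via sys.stats (auto_created = 1 marks the _WA_Sys_* statistics):

SELECT name, auto_created, user_created
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.abc')
GO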


4. Check the statistics status
dbcc show_statistics ( abc, _WA_Sys_00000002_025D5595)
go
dbcc show_statistics ( abc, abcindx)
go


 5. Update the table
update abc set t1=12000 where t1>12000
update abc set t2=12006 where t2>11000
go
6.  Rebuild index with DBCC DBREINDEX
DBCC DBREINDEX (abc)

7. Check the statistics again; both statistics objects are updated.
8. Drop the table abc, and redo steps 1 to 5

9. Rebuild index with "ALTER INDEX ALL REBUILD"
ALTER INDEX ALL ON dbo.abc rebuild

10. Check the statistics again; only the abcindx statistics object is updated. The auto-created statistics object has not been updated.

11. In order to update the auto-created statistics, you need to run:
UPDATE STATISTICS abc

12. Drop the table, and redo steps 1 and 2

13. Manually create statistics abcstc on column t2
CREATE STATISTICS abcstc on abc(t2)

14. Redo Step 5, update the table

15. Rebuild index with "ALTER INDEX ALL REBUILD"
ALTER INDEX ALL ON dbo.abc rebuild

16. Check the statistics; you will find that the statistics object abcstc still has not been updated.

So DBCC DBREINDEX not only rebuilds the indexes but also updates all statistics (manually or automatically created), whereas ALTER INDEX REBUILD only updates the statistics created by the indexes themselves. If you use ALTER INDEX, you should run UPDATE STATISTICS as well.
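By the way, an easy way to confirm which statistics objects were refreshed at each step is STATS_DATE, which returns the last update time per statistics object (a sketch):

SELECT name, STATS_DATE(object_id, stats_id) AS last_updated
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.abc')
GO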

Sunday, June 17, 2012

Shrink transaction log file

Sometimes shrinking the transaction log file doesn't work.

Before we troubleshoot it, we need to know how the log file works:

Transaction Log Physical Architecture
http://msdn.microsoft.com/en-us/library/aa174526(v=sql.80).aspx

Here are some tools which can help us investigate the log file issues.
a. DBCC SQLPERF(LOGSPACE): tells you how much of the transaction log is being used
b. DBCC LOGINFO: an undocumented command that shows the status of the VLFs (Virtual Log Files) in the log file. For more information, please refer to http://sqlblog.com/blogs/kalen_delaney/archive/2009/12/21/exploring-the-transaction-log-structure.aspx
c. DBCC OPENTRAN: determines whether an open transaction exists in the transaction log (a combined sketch follows this list)
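If you want to run all three checks together, a minimal sketch (assuming the testlog database created below):
======================================================================
USE testlog
GO
DBCC SQLPERF(LOGSPACE)  -- log size and percent used, per database
GO
DBCC LOGINFO            -- one row per VLF; Status 2 = active
GO
DBCC OPENTRAN           -- oldest active transaction, if any
GO
======================================================================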
 
Let's create a testing db first
======================================================================
--the initial log file size is 64MB, with autogrow enabled in 64MB increments:
CREATE DATABASE [testlog] ON PRIMARY
( NAME = N'testlog', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master\testlog.mdf' , SIZE = 1048576KB , FILEGROWTH = 524288KB )
LOG ON

( NAME = N'testlog_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\master\testlog_log.ldf' , SIZE = 65536KB , FILEGROWTH = 65536KB )
GO
--create a test table
use testlog
GO

select a.* into testtable from sysobjects a, sysobjects b
GO

--Make a full backup first. This is important, otherwise SQL Server will simply reuse the log space
BACKUP DATABASE [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

DBCC LOGINFO
GO
=======================================================================
There are 4 VLFs in the log file. Status "2" means that VLF is currently active:

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     0       0       0       0
2       16711680  33431552     0       0       0       0
2       16965632  50143232     0       0       0       0

let's insert some data:
=======================================================================
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

dbcc loginfo
GO
=======================================================================
the output of "DBCC LOGINFO" looks like:

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     26      2       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     0       0       0       0
2       16777216  67108864     0       0       0       27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317
 
The log file has grown to 128MB now. The last active VLF is FSeqNo 27, so only 48MB (16MB x 3) of log space was actually used. It looks like the log file growth is based on an estimate of the transaction size, and sometimes that estimate is not accurate. OK, let's start our testing.

Create an active transaction, then insert some data:
====================================================================
--open a session and run
begin tran
insert into testtable
select a.* from sysobjects a

--open another session and run
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

DBCC LOGINFO
GO
====================================================================
the output of "DBCC LOGINFO" is

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      2       64      0
2       16711680  16719872     26      2       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317

the last active VLF is FSeqNo 29 now. let's back up the log file:

BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

GO

DBCC LOGINFO
GO

the output is
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         25      0       64      0
2       16711680  16719872     26      0       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     0       0       0       27000000325600317
2       16777216  100663296    0       0       0       27000000325600317
2       16777216  117440512    0       0       0       27000000325600317

because we have an active transaction open in VLF FSeqNo 27, the log backup only truncated the VLFs prior to FSeqNo 27. if you try to shrink the log file back to 64MB,

DBCC SHRINKFILE(2, 64)

the output is
DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       10240        8192         10240      8192

The CurrentSize is 80MB = 10240 (pages) * 8 / 1024, so the shrink only removed the last 3 VLFs. Run

DBCC LOGINFO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      2       128     0
2       16711680  16719872     26      0       64      0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317

The output is as expected! In order to find out which transaction is still active, we can run:

DBCC OPENTRAN

Transaction information for database 'testlog'.
Oldest active transaction:
SPID (server process ID): 53
UID (user ID) : -1
Name : user_transaction
LSN : (27:4861:1)
Start time : Jun 17 2012 12:37:02:740PM
SID : 0x0105000000000005150000005d28f57fd53ad8354354e02a75841400

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Based on the output, the SPID is 53 and the transaction type is user_transaction. We can also see what SPID 53 is running with:

DBCC INPUTBUFFER (53)

the output is like
EventType       Parameters  EventInfo
Language Event  0           begin tran insert into testtable select a.* from sysobjects a

so an active transaction will stop the log file shrink from working. let's continue our testing.

Commit the transaction first, and insert more data
--run in the session which you create the transaction
commit tran
--insert more data
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c
GO

DBCC LOGINFO
GO

ok, the log file size grew again
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      2       128     0
2       16711680  16719872     31      2       128     0
2       16711680  33431552     27      2       64      0
2       16965632  50143232     28      2       64      0
2       16777216  67108864     29      2       64      27000000325600317
2       16777216  83886080     32      2       64      31000000097600316
2       16777216  100663296    0       0       0       31000000097600316
2       16777216  117440512    0       0       0       31000000097600316
2       16777216  134217728    0       0       0       31000000097600316

compared with the previous output, the original VLF FSeqNo 26 has been reused (because its status was 0) and is now FSeqNo 31. Because FSeqNo 27, 28, and 29 had not been backed up (status 2), they could not be reused. let's back up the log file again:
==================================================================
BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

DBCC LOGINFO
GO
==================================================================

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         30      0       128     0
2       16711680  16719872     31      0       128     0
2       16711680  33431552     27      0       64      0
2       16965632  50143232     28      0       64      0
2       16777216  67108864     29      0       64      27000000325600317
2       16777216  83886080     32      2       64      31000000097600316
2       16777216  100663296    0       0       0       31000000097600316
2       16777216  117440512    0       0       0       31000000097600316
2       16777216  134217728    0       0       0       31000000097600316


OK, this time only VLF FSeqNo 32 is active (status = 2), that's cool! let's check the log file usage:
(screenshot: log file space usage)
wow! 98% free space, only 3MB is used. can we shrink it back to the initial size of 64MB now? let's try

DBCC SHRINKFILE(2, 64)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       12288        8192         12288      8192

the log file size is still 96MB = 12288 * 8 / 1024...

So DBCC SHRINKFILE can only remove VLFs from the last active VLF to the end of the file; even if there is more free space earlier in the log file, you cannot shrink it away.

if you really need to shrink the file, you have to move the active VLF to the beginning of the log file: create a dummy table and insert data into it, and if the transaction is big enough, the active VLF will wrap around to the beginning of the log file; then back up the log. here is the VLF status you will get:

--generate big transaction
--run the insert several times until the active VLF move to the beginning of the log file
insert into testtable
select a.* from sysobjects a, sysobjects b,sysobjects c

--backup the log file
BACKUP LOG [testlog] TO DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Backup\testlog.bak' WITH NOFORMAT, NOINIT, NAME = N'testlog-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

DBCC LOGINFO
GO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0
2       16965632  50143232     39      0       64      0
2       16777216  67108864     37      0       128     27000000325600317
2       16777216  83886080     32      0       64      31000000097600316
cool! the active VLF has been moved to the beginning of the log file.
let's try the shrink again

DBCC SHRINKFILE(2, 64)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       8192         8192         8192       8192

DBCC LOGINFO
GO
FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0
2       16965632  50143232     39      0       64      0

well, the log file size is 64MB now :)
can I shrink the log file to a size smaller than the initial size? Yes, you can.
let's run

DBCC SHRINKFILE(2, 32)
GO

DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
7     2       6121         6121         6120       6120

DBCC LOGINFO
GO

FileId  FileSize  StartOffset  FSeqNo  Status  Parity  CreateLSN
2       16711680  8192         40      2       128     0
2       16711680  16719872     36      0       64      0
2       16711680  33431552     38      0       64      0

We have 3 VLFs (about 48MB) now. However, you cannot shrink the log file to fewer than 2 VLFs; in our case, the smallest possible log file size is 16MB * 2 = 32MB.

As a summary:
1. Active transaction.

We can use "DBCC OPENTRAN" to find it, and "DBCC INPUTBUFFER" to check the query it is running. Sometimes replication causes an active-transaction issue.

As I remember, I once restored a database backup that had been taken while the database was under transactional replication. The Log Reader Agent maintains a pointer in the transaction log to track changes in the database, so the log file could not be shrunk, but we can use the command below to remove the pointer:

EXEC sp_repldone @xactid = NULL, @xact_segno = NULL, @numtrans = 0, @time = 0, @reset = 1
this command should only be used when the database is not actually participating in replication.

2. Active VLF is not at the beginning of the log file.
Back up the log file first, then create a dummy table and generate a transaction big enough to move the active VLF back to the beginning of the log file.

3. If the log file keeps growing, check sys.databases:
select log_reuse_wait_desc from sys.databases where name='dbname'

4. For optimal VLF sizing, create the initial log file as 8GB, then extend it to 16GB, then 24GB, and so forth (see the sketch after this list):
http://www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx

too many VLFs will degrade log performance.

5. Use "DBCC SQLPERF (LOGSPACE)" and "DBCC LOGINFO" to check the log file status.
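Following point 4, here is a sketch of pre-growing a log file in fixed steps (the database and file names are hypothetical; adjust the target sizes to your own workload):

ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 8388608KB)   -- 8GB
GO
ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 16777216KB)  -- extend to 16GB
GO
ALTER DATABASE [mydb] MODIFY FILE (NAME = N'mydb_log', SIZE = 25165824KB)  -- extend to 24GB
GO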