Tuesday, May 15, 2012

Ad hoc query optimization in SQL Server

When ad hoc queries are executed in sql server, if it is executed without parameters, and it is simple,  SQL Server parameterizes the query internally to increase the possibility of matching it against an existing execution plan, that's called "Simple Parameterization"

For instance:

--clear plan cache first
DBCC FREEPROCCACHE

--run first ad hoc query
select * from dbo.Sales where SalesID= 567

--check query plan

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
COALESCE(DB_NAME(p.dbid), p.query_plan.value('(//RelOp/OutputList/
ColumnReference/@Database)[1]',
'nvarchar(128)')) AS DatabaseName
,DB_NAME(p.dbid) + '.' + OBJECT_SCHEMA_NAME(p.objectid, p.dbid) + '.' +
OBJECT_NAME(p.objectid, p.dbid) AS ObjectName
,cp.objtype
,cp.cacheobjtype
,p.query_plan
,cp.UseCounts
,cp.plan_handle
,cp.size_in_bytes
,CAST('<?query --' + CHAR(13) + q.text + CHAR(13) + '--?>' AS xml)
AS SQLText
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) as q
ORDER BY DatabaseName, UseCounts DESC


you can find 2 rows below, one is the ad "hoc query" object, the other is the "
Prepared statement"

the "Prepared statement" is created by sql server Simple Parameterization. then if run a new ad hoc statment below

--run second ad hoc query
select * from dbo.Sales where SalesID= 123

This time there are 2 "ad hoc" objects, but only 1 "Prepared statement" object whose "UseCounts" is 2 now. Because of this simple parameterization, SQL Server recognizes that the following two statements generate essentially the same execution plan and reuses the first plan for the second statement.

Under the default behavior of simple parameterization, SQL Server parameterizes a relatively small class of queries. However, you can specify that all queries in a database be parameterized, subject to certain limitations, by setting the PARAMETERIZATION option of the ALTER DATABASE command to FORCED

ALTER DATABASE [dbname] set PARAMETERIZATION  FORCED

Forced Parameterization is not appropriate in all enviroments and scenarios. It is recommanded that you use it only for a very high volumne of concurrent queries, and when you are seeing hight CPU from a lot of compilation/recompilation.you can monitor the perfmon sql statistics below
• SQL Server: SQL Statistics: Batch Requests/sec
• SQL Server: SQL Statistics: SQL Compilations/sec
• SQL Server: SQL Statistics: SQL Recompilations/sec
Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low.

let's try another ad hoc statment below

--clear plan cache first
DBCC FREEPROCCACHE

--run third ad hoc query
select * from dbo.Sales where SalesID= 4444 or CustomerID=903 and CustomerID=17541

As this ad hoc statment is complicated, so simple parameterization doesn't work this time, there is only a "ad hoc" object, No "Prepared statement" object. Then we enable the "Forced Parameterization", and redo the test

 ALTER DATABASE [dbname] set PARAMETERIZATION FORCED
--clear plan cache first
DBCC FREEPROCCACHE

--run fourth ad hoc query
select * from dbo.Sales where SalesID= 4444 or CustomerID=903 and CustomerID=17541

 you can find the "Prepared statement" object created. if you run similiar statment below, the execution plan can be reused.

select * from dbo.Sales where SalesID= 333 or CustomerID=123 and CustomerID=345

Although we can eliminate the execution plan Recompilations of the ad hoc query by enable "Forced Parameterization", every time you run the ad hoc query, there is a new "ad hoc"object created, and it eat up your memory.   there is a system parameter "optimize for ad hoc workloads" which can save the memory usage of ad hoc statement. let run the testing below:

--enable the parameter
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'optimize for ad hoc workloads', 1
GO
RECONFIGURe with override
GO

--clear plan cache first
DBCC FREEPROCCACHE

--run fifth ad hoc query
select * from dbo.Sales where SalesID= 567

then check the plan cache, there is only "Compiled Plan Stub" object instead of "Compiled Plan".
So when this option is set to 1, the Database Engine stores a small compiled plan stub in the plan cache when a batch is compiled for the first time, instead of the full compiled plan. This helps to relieve memory pressure by not allowing the plan cache to become filled with compiled plans that are not reused.

let's run the same query again
--run sixth ad hoc query
select * from dbo.Sales where SalesID= 567

this time the "Compiled Plan Stub" become to "Compiled Plan". So when the batch is invoked (compiled or executed) again, the Database Engine compiles the batch, removes the compiled plan stub from the plan cache, and adds the full compiled plan to the plan cache.

Brett Hawton has a query which can help you determin if you need to use 'optimize for ad hoc workloads'
-- Do not run this TSQL until SQL Server has been running for at least 3 hours
SET NOCOUNT ON
SELECT objtype AS [Cache Store Type],
        COUNT_BIG(*) AS [Total Num Of Plans],
        SUM(CAST(size_in_bytes as decimal(14,2))) / 1048576 AS [Total Size In MB],
        AVG(usecounts) AS [All Plans - Ave Use Count],
        SUM(CAST((CASE WHEN usecounts = 1 THEN size_in_bytes ELSE 0 END) as decimal(14,2)))/ 1048576 AS [Size in MB of plans with a Use count = 1],
        SUM(CASE WHEN usecounts = 1 THEN 1 ELSE 0 END) AS [Number of of plans with a Use count = 1]
        FROM sys.dm_exec_cached_plans
        GROUP BY objtype
        ORDER BY [Size in MB of plans with a Use count = 1] DESC
DECLARE @AdHocSizeInMB decimal (14,2), @TotalSizeInMB decimal (14,2)
SELECT @AdHocSizeInMB = SUM(CAST((CASE WHEN usecounts = 1 AND LOWER(objtype) = 'adhoc' THEN size_in_bytes ELSE 0 END) as decimal(14,2))) / 1048576,
        @TotalSizeInMB = SUM (CAST (size_in_bytes as decimal (14,2))) / 1048576
        FROM sys.dm_exec_cached_plans
SELECT @AdHocSizeInMB as [Current memory occupied by adhoc plans only used once (MB)],
         @TotalSizeInMB as [Total cache plan size (MB)],
         CAST((@AdHocSizeInMB / @TotalSizeInMB) * 100 as decimal(14,2)) as [% of total cache plan occupied by adhoc plans only used once]
IF  @AdHocSizeInMB > 200 or ((@AdHocSizeInMB / @TotalSizeInMB) * 100) > 25  -- 200MB or > 25%
        SELECT 'Switch on Optimize for ad hoc workloads as it will make a significant difference' as [Recommendation]
ELSE
        SELECT 'Setting Optimize for ad hoc workloads will make little difference' as [Recommendation]
GO

3 comments:

  1. Thank you for this excellent article. I have read some other stuff basically recommending to turn on adhoc optimization regardless. However, after running your query against my SP2010 environment, only 20MB (of 5000MB) was actually in use by single use query plans. From reading your article I get the sense that there might in fact be a slight negative impact to performance from having this optimization turned on in such cases. What is your view?

    ReplyDelete
    Replies
    1. Hi Claude, you are correct. 'optimize for ad hoc workloads' can relieve the memory pressure, but adhoc queries need to be complied 2 times before their full compiled plan are cached.

      As to "Forced Parameterization", SQL Book Online says:

      "Forced parameterization, in effect, changes the literal constants in a query to parameters when compiling a query. Therefore, the query optimizer might choose suboptimal plans for queries. In particular, the query optimizer is less likely to match the query to an indexed view or an index on a computed column. "

      "Generally, the PARAMETERIZATION FORCED option should only be used by experienced database administrators after determining that doing this does not adversely affect performance."

      Delete