Thursday, January 23, 2014

Error : Property Size is not available for Database "[tempdb]"

Today when I clicked Properties of tempdb in SSMS, I got the error message below:

At first I thought the database had a size problem, so I extended the tempdb file size, but I still could not see the properties of tempdb, and I confirmed that there was enough free space in tempdb.
Then I checked the SQL Server errorlog file; there was no error message there. SQL Server looked fine.

Then I ran a Profiler trace and monitored the whole process when I clicked the "Properties" menu. Unfortunately, there was still no error message regarding tempdb.

I thought it might be a client problem rather than a SQL Server problem, so I enabled the "User Error Message" event in the trace. This time I got an error message: "Lock request time out period exceeded"

The last script shown in the picture above was the problem script. It was blocked by another session, so the lock request timed out, and then SSMS raised the error message Property Size is not available for Database "[tempdb]".

Obviously it was a simple blocking issue. The next step was to find the blocking script, so I reproduced the error message and checked for blocking at the same time.

So the blocking session was 79, and the wait resource was KEY 2:327680. I got the table and index name with the script below:
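(The script itself is in a screenshot; the lookup can be sketched as below, assuming the KEY wait resource format database_id:hobt_id.)

```sql
-- Wait resource "KEY: 2:327680" means database_id 2 (tempdb), hobt_id 327680.
-- Resolve the hobt_id to a table and index name (run in the database in question):
USE tempdb;
SELECT OBJECT_NAME(p.object_id) AS table_name,
       i.name                   AS index_name
FROM sys.partitions AS p
JOIN sys.indexes    AS i
  ON i.object_id = p.object_id
 AND i.index_id  = p.index_id
WHERE p.hobt_id = 327680;
```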

And session 79 held an exclusive lock (X) on the key 2:327680.

But why did SSMS query the sysrowsets table when I clicked the "Properties" menu? Looking at the timed-out query, it retrieves data from the sys.partitions table, and sys.partitions is a view based on sys.sysrowsets.

As to why session 79 held an X lock on sys.sysrowsets: it was running a very big transaction involving some temp tables and some complicated queries, and I could not change the code of that application, so the only option was to wait for it to finish. But at least I knew that tempdb was fine and SQL Server was running well, so I could safely ignore this error :)

Friday, November 15, 2013

Use application lock to synchronize T-SQL code

An application lock can be used to synchronize T-SQL code: you can control whether sessions are able to run the same piece of T-SQL code at the same time, much like the lock keyword in C#.

1. Get lock
You can acquire an application lock with the stored procedure sp_getapplock. With the default lock owner (Transaction), it must be run from within a transaction.

DECLARE @res int;
BEGIN TRAN;
EXEC @res = sp_getapplock @Resource = 'mylock', @LockMode = 'Shared';
SELECT @res;

2. Release lock
EXEC sp_releaseapplock @Resource = 'mylock';

Here is an example.
1. Start a session and run the code below:
--spid 52
DECLARE @res int;
BEGIN TRAN;
EXEC @res = sp_getapplock @Resource = 'mylock', @LockMode = 'Exclusive';
SELECT @res;

2. Open another session and run the code below:
--spid 54
DECLARE @res int;
BEGIN TRAN;
EXEC @res = sp_getapplock @Resource = 'mylock', @LockMode = 'Shared';
SELECT @res;

The session with spid 54 will be blocked. You can check the blocking status with "sp_lock" or the DMV "sys.dm_tran_locks".

Here, spid 54 is blocked by spid 52, waiting for the application lock "mylock".
If you release the lock in spid 52 (or commit its transaction), spid 54 can then get the shared application lock. In this way you can synchronize the execution of the code between sp_getapplock and sp_releaseapplock across multiple sessions.

Like all other locks, application locks can cause deadlocks as well. Please note that if sp_getapplock fails because the session was chosen as a deadlock victim, the transaction is not rolled back automatically; sp_getapplock simply returns -3. This can sometimes leave an orphaned transaction, so you need to roll back the transaction explicitly.
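A defensive pattern, then, is to check the return code of sp_getapplock and roll back yourself on failure; a sketch (the resource name 'mylock' is just an example):

```sql
DECLARE @res int;
BEGIN TRAN;
EXEC @res = sp_getapplock @Resource    = 'mylock',
                          @LockMode    = 'Exclusive',
                          @LockTimeout = 5000;   -- wait up to 5 seconds
IF @res < 0
BEGIN
    -- 0/1 = granted; -1 = timeout; -2 = canceled; -3 = deadlock victim; -999 = error.
    -- On -3 the transaction is still open, so roll it back explicitly.
    ROLLBACK TRAN;
    RAISERROR('Could not acquire application lock, return code %d', 16, 1, @res);
END
ELSE
BEGIN
    -- ... protected code goes here ...
    EXEC sp_releaseapplock @Resource = 'mylock';
    COMMIT TRAN;
END
```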

Saturday, November 9, 2013

log shipping restore job failed on secondary server

Several situations can cause the log shipping restore job to fail on the secondary server.

1. Restore job is not able to access the transaction log backup file.

    a) Another process is accessing the backup file and has locked it, so the restore job cannot access it at the same time.
        You can use "Process Explorer" to check which process has locked the transaction log backup file.
    b) Permission issue.
        Check the secondary server and make sure the SQL Agent service startup account has been granted the correct permissions.

2. Transaction log backup file is corrupted
    Then you have to rebuild the log shipping.....

3. Some log backup file is missing
    All the log backup files should be consecutive. The restore job restores the log backup files based on the timestamp in the backup file name. On the secondary, the system table msdb..log_shipping_secondary_databases has a column saving the file name of the last restored backup file, and any log backup file with a later timestamp in its file name will be restored. If a log backup file is missing, the LSN chain is broken and the restore job will fail. In order to find the missing file, you can:
    a) on primary server, run the query below to list all log backup file history
select distinct s.first_lsn, s.last_lsn, s.backup_finish_date, y.physical_device_name
from msdb..backupset as s
inner join msdb..backupfile as f on f.backup_set_id = s.backup_set_id
inner join msdb..backupmediaset as m on s.media_set_id = m.media_set_id
inner join msdb..backupmediafamily as y on m.media_set_id = y.media_set_id
where s.database_name = 'xxx'
order by s.backup_finish_date desc

   b) On the secondary,
       run RESTORE HEADERONLY to check each log backup file's FirstLSN and LastLSN.
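For example (the file path is just a placeholder):

```sql
RESTORE HEADERONLY
FROM DISK = N'\\backupshare\logs\mydb_tlog.trn';
-- Look at the FirstLSN and LastLSN columns of the output:
-- in an unbroken chain, one file's LastLSN equals the next file's FirstLSN.
```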

   c) check the database last lsn on secondary server with "DBCC DBTABLE"
       --Get the database last lsn
declare @str char(64),
@file bigint,
@file_size int,
@array bigint,
@array_size int,
@slot int,
@command nvarchar(400),
@end_of_log_lsn     numeric(25,0),
@database_name varchar(100)
        --Set the database name here
Set @database_name='mytest'

set @command = N'dbcc dbtable(' + CONVERT (nvarchar, DB_ID(@database_name)) + N') with tableresults, no_infomsgs'

declare @temp table(parentObject nvarchar(255),Object nvarchar(255),field nvarchar(255),value nvarchar(255))
insert into @temp exec( @command )
select @str = value from @temp where field like '%m_flushLSN%';
set @file_size = charindex(N':', @str)
set @file = cast( LEFT( @str, @file_size - 1) as bigint)
set @array_size = charindex(N':', @str, @file_size + 1)
set @array = cast( substring( @str, @file_size + 1, @array_size - @file_size - 1) as bigint)
set @slot = cast( substring( @str, @array_size + 1, charindex(N' ', @str, @array_size + 1) - @array_size ) as int)
set @end_of_log_lsn = @file * 1000000000000000 + @array * 100000 + @slot
-- make sure @slot < MAX_SHORT (1024 * 64)
-- check to make sure that file and array are < 4 Gig.

if (@end_of_log_lsn < 1000000000000000 or @slot >= 65536 or @file >= 4294967296 or @array >= 4294967296)
set @end_of_log_lsn = null

select @end_of_log_lsn as 'Last LSN'
After you get the last LSN of the database on the secondary, compare it with each log backup file's FirstLSN and LastLSN to see whether it falls within the file's LSN range. If the database's last LSN does not fit into any log backup file, it means a file is missing; then use the query in step a) to find which file it is.

And bear in mind, the first thing is to check the restore job history to get the detailed error message, then use the steps above to narrow down the issue.

Friday, November 8, 2013

"Failover Partner" keyword in connection string - Q&A

1. How do I connect a client to a database mirroring session in a failover-aware way?

A: Add the Failover Partner keyword to the connection string, for instance:
"Server=sq01;Failover Partner=sq02; Database=mydb; Network=dbmssocn;Trusted_Connection=yes"

2. Why do I get timeout errors more frequently when I connect to a mirrored database with the "Failover Partner" keyword?

A: You have probably encountered a .NET bug which happens with mirrored databases; please refer to the link below.
The workaround would be:
•Set the connection time-out in the connection string to 150

3. If the "Failover Partner" parameter is set up incorrectly in the connection string, is the client still aware of the failover, and can it connect to the mirror (new principal) after failover?

A: The answer is YES.
When the client connects to the primary server for the first time, it downloads the mirror server name and caches it in the client process as the failover partner name. This means that even if you set a wrong "Failover Partner" value in the connection string, the client can still connect to the correct mirror server when a failover happens. For instance:

We have database mirroring set up on SQ01 and SQ02, with SQ01 as the primary. In the connection string we use SQ03 as the failover partner, as below; SQ03 does not actually exist.

"Server=sq01;Failover Partner=sq03; Database=mydb; Network=dbmssocn;Trusted_Connection=yes"

When you fail over the database from SQ01 to SQ02, your connection will fail, but after you reconnect without changing anything in the connection string, you can still get the data, which now comes from SQ02.

So even without specifying a failover partner name in the connection string, the client can still access the primary or the mirror after a failover, just like below:
"Server=sq01;Failover Partner=; Database=mydb; Network=dbmssocn;Trusted_Connection=yes"

Now you know: the keyword "Failover Partner" itself is what matters in the connection string.
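As a side note, you can see the partner name the server itself advertises (which is what the client downloads and caches) with a query like this sketch:

```sql
SELECT DB_NAME(database_id)    AS database_name,
       mirroring_role_desc,
       mirroring_partner_name, -- what the client caches as the failover partner
       mirroring_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```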

4. What about the "Server" keyword in the connection string? If it is wrong, can the client still support failover?

A: "Server" is the most important part of the connection string. There are several situations here; let's use the previous sample:
primary is SQ01, mirror is SQ02, and SQ03 does not exist.

1. "Server=sq03;Failover Partner=sq02; Database=mydb; Network=dbmssocn;Trusted_Connection=yes" 
Failed: the client cannot connect to the server, since SQ02 is the mirror and its database is not online.

2. "Server=sq03;Failover Partner=sq01; Database=mydb; Network=dbmssocn;Trusted_Connection=yes" 
Successful: the client fails to connect to SQ03, then tries to connect to the failover partner, which is SQ01.
But it does not support failing the database over from SQ01 to SQ02. In other words, after the client gets data from SQ01 successfully, if we fail the database over from SQ01 to SQ02, the client loses its connection and cannot fail over to SQ02 automatically.

5. Last question, just out of curiosity: if two databases with the same name are online on SQ01 and SQ02, does this support failover?

A: No, it will not work. Let's say you have a database mydb on both SQ01 and SQ02, both online and without mirroring, and you use the connection string below:
"Server=sq01;Failover Partner=sq02; Database=mydb; Network=dbmssocn;Trusted_Connection=yes" 

Then, if the database on SQ01 is online, your client can get data from SQ01; but if the database on SQ01 is not accessible, you will get the error below:
Exception calling "Open" with "0" argument(s): "Server sq02, database mydb is not configured for database mirroring."

So it tells us the client provider is aware of mirroring :) it doesn't want to be cheated.

Database mirroring is being phased out in newer SQL Server versions; AlwaysOn is now the HA technology of choice.

Wednesday, November 6, 2013

One Database Mirror disconnected case

Today I got a database mirroring case. I was told that all the databases were in Disconnected / In Recovery status on both the principal and the mirror server; it is a dev environment.

Here just share my troubleshooting steps:

1. First, I logged in to the principal and the mirror server. I saw the same database status on both servers.

So this showed me:
       a) since all databases had the problem, it should be a system-level error, not a single-database issue;
       b) a database in "Disconnected / In Recovery" status indicates that the server has lost communication not only with its partner server (principal or mirror), but also with the witness. So SQL Server cannot bring the database online on either server, or else it would cause a split-brain problem.

The question now is: why did the three servers lose communication with each other?

2. Check connection among 3 servers with telnet command
telnet [sqlservername] [port]

All three servers could connect to each other, so the network should be fine, the endpoint port is open, and the firewall should be fine.
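If you are not sure which port the mirroring endpoint listens on, you can look it up on each server; a sketch:

```sql
-- Mirroring endpoint name, TCP port, and state on this instance
SELECT e.name, t.port, e.state_desc
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
  ON t.endpoint_id = e.endpoint_id;
```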

3. Check SQL Server Errorlog
I checked the SQL Server errorlog on all three servers (principal, mirror, and witness), and found the same error message on each:
2013-11-06 23:07:47.70 Logon       Database Mirroring login attempt by user 'xxx\xxxxx-db.' failed with error: 'Connection handshake failed. The login 'xxx\xxxxx-db' does not have CONNECT permission on the endpoint. State 84.'.  [CLIENT: xx.xx.xx.xx]

So it should be a permission problem with the account 'xxx\xxxxx-db'.

4. Check the account 'xxx\xxxxx-db' in SQL Server
By checking the account's permissions, I found it is a sysadmin on every SQL Server, so there should be no permission issue.

In your case, if you find the account is not a sysadmin, make sure you grant it CONNECT permission on the endpoint.
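The grant looks like this (the endpoint name "Mirroring" and the account name are placeholders for your own):

```sql
GRANT CONNECT ON ENDPOINT::Mirroring TO [domain\account];
```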

So what's wrong here?
Although the account is a sysadmin in SQL Server, that doesn't mean the account is 100% OK. It is a Windows domain account, and when using Windows authentication mode, SQL Server asks Windows to authenticate the account first (via NTLM or Kerberos); if the Windows authentication fails, SQL Server rejects the login request.

So I tried to log in to SQL Server with the account 'xxx\xxxxx-db', and this time I got the error message below:

Good, we are close to the answer. Normally you can find the details of a failed login in the errorlog, just like below:
2006-02-27 00:02:00.34 Logon     Error: 18456, Severity: 14, State: 8.
2006-02-27 00:02:00.34 Logon     Login failed for user '<user name>'. [CLIENT: <ip address>]

The most important piece of information is the "State" code:

State 2 and 5 - Invalid userid
State 6 - Attempt to use a Windows login name with SQL Authentication
State 7 - Login disabled and password mismatch
State 8 - Password mismatch
State 9 - Invalid password
State 11 and 12 - Valid login but server access failure
State 13 - SQL Server service paused
State 18 - Change password required

But unfortunately, this time I didn't find anything in the errorlog! So how do we capture the details? You are right: a Profiler trace!

I ran a Profiler trace on the SQL Server with only the "Audit Login Failed" event enabled, and finally I got the detailed error message below:

By searching for that error message, I found the link below:

It proved my guess that Windows authentication had a problem; in other words, the account itself should have a problem.

what is the next step?

5. Check whether any changes were made before the error occurred
I am not an AD expert, and troubleshooting Windows account authentication is a little hard for me. I could do it, but it might take hours or days. So the easiest way was to check with the person who reported the issue, or to ask an AD expert to look into it.

I asked the engineer who reported the issue: were any changes made to the account 'xxx\xxxxx-db' before the error occurred? Suddenly they seemed to remember something and told me to wait a minute; then they came back and said the issue had been solved and the mirror had recovered. I went back to the server, and it was true: all databases were in mirror Synchronized status.

I asked them what had happened. They told me they had made some changes to that account (like pulling it out of some domain groups); if I hadn't mentioned the account to them, they would never have thought the problem was caused by their change. So the problem was solved.

Although we have no root cause here, we had some steps to narrow the problem down, and found the fastest way to solve it.

Sunday, October 20, 2013

is count(1) faster than count(*)?

Recently I saw a post regarding count(1) and count(*).

The post said: "You know that doing COUNT(*) on a table with a lot of columns and a lot of rows can take a lot of time and memory, so use count(1) instead of count(*)." Is count(1) really faster than count(*)? Let's do some testing.

First, let's set up the test database.

first, let's setup the testing db 

--Create DB
CREATE DATABASE [test]
ON PRIMARY
( NAME = N'test', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\test.mdf' , SIZE = 512000KB , FILEGROWTH = 102400KB )
LOG ON
( NAME = N'test_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\DATA\test_log.ldf' , SIZE = 102400KB , FILEGROWTH = 102400KB )
GO
use [test]

1. Heap Table

--Create Heap table
CREATE SEQUENCE dbo.mysequence
      START WITH 1
      INCREMENT BY 1;
GO
select NEXT VALUE FOR dbo.mysequence AS [ID],, b.create_date,
c.type_desc into mytable
from sys.objects a, sys.objects b, sys.objects c
GO
--Save table page status (columns match the output of DBCC IND)
CREATE TABLE DBCCResult (
PageFID NVARCHAR(200),
PagePID NVARCHAR(200),
IAMFID NVARCHAR(200),
IAMPID NVARCHAR(200),
ObjectID NVARCHAR(200),
IndexID NVARCHAR(200),
PartitionNumber NVARCHAR(200),
PartitionID NVARCHAR(200),
iam_chain_type NVARCHAR(200),
PageType NVARCHAR(200),
IndexLevel NVARCHAR(200),
NextPageFID NVARCHAR(200),
NextPagePID NVARCHAR(200),
PrevPageFID NVARCHAR(200),
PrevPagePID NVARCHAR(200)
)
GO
INSERT INTO DBCCResult EXEC ('DBCC IND(test,mytable,-1) ')
--with DBCC IND, we know there are 4747 pages in total in table mytable
--with DBCC IND, we can know there are total 4747 pages in table mytable

a) Test count(*)
--Clean Cache
DBCC DROPCLEANBUFFERS
GO
--test count(*)
set STATISTICS time on
set STATISTICS io on
set STATISTICS profile on
select count(*) from mytable
set STATISTICS profile off
set STATISTICS time off
set STATISTICS io off
--save all buffer pool pages
select * into Buffer1
from sys.dm_os_buffer_descriptors where database_id=db_id('test')
--check if all data pages have been read into the buffer pool
select count(*)
from Buffer1 b inner join DBCCResult d on b.page_id=d.PagePID

Below are the execution plan and IO statistics.

And it shows that all 4747 pages have been read into the buffer pool.

b) Test count(1)
--Clean Cache
DBCC DROPCLEANBUFFERS
GO
--test count(1)
set STATISTICS time on
set STATISTICS io on
set STATISTICS profile on
select count(1) from mytable
set STATISTICS profile off
set STATISTICS time off
set STATISTICS io off
--save all buffer pool pages
select * into Buffer2
from sys.dm_os_buffer_descriptors where database_id=db_id('test')
--check if all data pages have been read into the buffer pool
select count(*)
from Buffer2 b inner join DBCCResult d on b.page_id=d.PagePID

You will see that the execution plan is the same as for count(*), and so are the IO statistics.

And you can see the same 4747 pages read into the buffer pool as well.

2. Clustered Index Table

Create a clustered index on the table.
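(The index definition is in a screenshot; it was presumably something like the sketch below, clustering on the sequence-generated ID column.)

```sql
CREATE CLUSTERED INDEX IX_mytable_id ON dbo.mytable (ID);
```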



Then rerun the test above. I still got the same result for count(1) and count(*): they both use a clustered index scan, and read the same pages into the buffer pool.

3. Heap + Non-Clustered Index

Create the heap table with the script from the first test, then create the non-clustered index below.
CREATE NONCLUSTERED INDEX IX_mytable_date ON dbo.mytable (create_date)

Then rerun the tests with count(1) and count(*): still no difference, same execution plan and IO statistics. But we see an index scan this time, which is expected.

So based on my testing, there is no performance difference between them. If you see any difference between count(1) and count(*), please let me know; I'd be interested in researching it :)

Tuesday, October 1, 2013

Enable Instant File Initialization to accelerate database restore

Today my colleague came to me and asked why her database restore query was hanging. She was restoring a database with a 200GB data file and an 11GB log file.

She had been running the restore command for about 15 minutes, but the restore progress was still at 0%, which she knew was not normal. I connected to the server and checked the running query status: the "restore database" command was waiting on "ASYNC_IO_COMPLETION".

Then I checked the disk performance; the database file was still being written to, at 110MB/sec.

Before the database restore starts writing data back to the data file, the data file needs to be initialized (zero-filled) first. If instant file initialization is enabled, this step is skipped for data files (data files only, not log files). It looked like instant file initialization was not enabled on this server.

In order to prove it, I ran the script below (trace flag 3004 reports the zeroing steps, and 3605 sends the output to the errorlog):

DBCC TRACEON(3004,3605,-1)
-- create a small test database while the trace flags are on
CREATE DATABASE TestFileZero
EXEC sp_readerrorlog
DROP DATABASE TestFileZero
DBCC TRACEOFF(3004,3605,-1)

In the errorlog file, I do see the zeroing message:
2013-10-01 18:31:05.56 spid104     Zeroing E:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFileZero.mdf from page 0 to 131232 (0x0 to 0x40140000)

So instant file initialization is not enabled on the server. The data file is 200GB and the disk write speed is 100-110MB/sec, which gives us: the file initialization will take about 33 minutes.

I told her to just wait for half an hour and then she would see progress. And indeed, she told me that the restore's estimated completion time started changing after 35 minutes :)

This case proves that enabling Instant File Initialization is really important for SQL Server performance, including database restores.
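Instant file initialization is enabled by granting the "Perform volume maintenance tasks" Windows privilege to the SQL Server service account and restarting the service. On newer versions (SQL Server 2016 SP1 and later, so not the instance in this post), you can also check it directly from a DMV, as this sketch shows:

```sql
-- instant_file_initialization_enabled = 'Y' means data files skip zeroing
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;
```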