
Thursday, May 31, 2012

Schema XML Memoir (Definition to Implementation)


The XML data type was first introduced in SQL SERVER 2005 as a column type, and it has many benefits and implementations in today's environment. There are some restrictions when defining xml as a data type, but it still offers huge benefits for large objects and complex structures: student information history, patient history in hospital management, logical comparison of properties in a property-based database, or CV details of employees on any CV site. All of the above have huge amounts of data to store but very little to update; the only requirement is to compare and return.
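As a quick illustration (the table and column names here are just hypothetical examples), an untyped xml column is declared like any other column:

============= Query ======================
-- Minimal sketch: an untyped XML column storing patient history
CREATE TABLE PatientHistory
(PatientID int IDENTITY PRIMARY KEY,
 History xml NULL)

INSERT INTO PatientHistory(History)
VALUES ('<visits><visit date="2012-05-01" department="Cardiology" /></visits>')
============= Query ======================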

I was assigned a task to optimize a table for its DML and read operations, and this table has an XML column in it. I am not a beginner in xml implementation, but this was the first time in the last 5 years I had been asked to optimize one, so it was quite a good task for me. I researched it and completed the task, improving timings by around 75%. During this process I learned many new things, and I want to share them from start to end, step by step.


I have gathered this information by performing operations in my own environment and also by using the following links.


Please have a look at the following topics, each of which I have discussed in detail:

1.       Typed XML vs UnTyped XML
2.       Binding and Verifying XML Schema with the database
3.       Create Column with XML Schema
4.       Querying data from XML
5.       Insert, Update and Delete on an XML
6.       Creating Indexes on XML Column
a.       Primary Index
b.      Secondary Index (Property, Value, Path)
7.       Drop or Alter XML Schema structure
8.       Why and Why Not XML

Tuesday, May 29, 2012

Hash functions to improve string comparison and DB design


Scenario

I have gone through some difficult experiences comparing strings many times during database design and querying, and decided we needed a different solution to perform the same operation. There are often requirements to create a varchar field as a primary column, like an Account Number in the Chart of Accounts table of accounting software, or a Comments column in any table with varchar(max) or varchar(100), etc.

Problem

When we join on a string-based column, or use one in a where clause even to find a single record with a like or equals predicate, it increases our query cost. I came across this scenario recently when comparing a string-based column just to check whether it equals a parameter string value, and that comparison took 76% of the total query cost. The SQL SERVER optimizer hinted to create a covering index using the primary key plus this column, which improved query performance by at least 50%, but at the same time I came across hash functions, which satisfied my needs for storage and optimization. So I am going to discuss what a hash key is and how we can implement it using SQL SERVER 2008 and above. I have gathered information from many sources, and msdn is one of them.

What is a Hash Function?

A hash is the value (int or varbinary) that results from an algorithm (known as a hash function) applied to a given string. You just provide your string as input, and you get a practically unique hash value as output. If we pass a whole page of text to such a function, then change just one character and pass it again, it returns a different value. There is a small possibility that two different inputs produce the same value; this is known as a hash collision.
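For example, changing a single character in the input produces a completely different hash value:

select HASHBYTES('SHA1', 'The quick brown fox') as Hash1,
       HASHBYTES('SHA1', 'The quick brown fax') as Hash2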

Where can I use a Hash Function?

·         Security implementations, such as storing string data as hashes instead of plain text.
·         Reduced network traffic for cross-DB queries, because small hash values are compared instead of string values.
·         Less space required to store and compare values, because it returns int or binary values instead of string characters.
·         Better performance due to the small data types, and smaller indexes if we create an index on the hash column.
·         Whole-string comparisons are avoided; we compare just int values, as with a checksum.
·         If we create a hash column in our DB design alongside the string columns, it is easy to implement joins on these columns that act like string joins, since they are hash values of the string values (see the sketch after this list).
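A minimal sketch of such a join, using hypothetical tables that each carry a computed hash column next to the string column:

select a.id, b.id
from TableA a
join TableB b on a.HashCol = b.HashCol        -- cheap int comparison first
             and a.StringCol = b.StringCol    -- recheck the strings to rule out collisions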

Careful using Hash Functions

·         A hash collision may give you more rows than you expect in a result set, because two different inputs can return the same hash value. For this reason, CheckSum might return more values than you expected; if you want to use this column to detect changes, consider HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
Types of Hash Functions

SQL SERVER provides multiple hash functions, like CHECKSUM, BINARY_CHECKSUM, and HASHBYTES. I am going to show examples of CheckSum and HashBytes just to demonstrate how hash values work in practice.

Hash Keys in Database Design

You can efficiently design your table to use hash key columns alongside regular string columns, but it depends on how your queries use the string column. In the following table design I have created a varchar(1000) column and stored two rows in it. When we run a query on this table comparing some value against the String1 column, it compares as a string and consumes hundreds of bytes of space and resources as well.

===================== Code Starts =======================
--- drop the table if it already exists
if object_id('Hashfunction') is not null
    drop table Hashfunction
go

create table Hashfunction
(id int identity primary key,
 string1 varchar(1000),
 checksumCol as CHECKSUM(string1),
 HashByteCol as HASHBYTES('SHA1', string1))

insert into Hashfunction(string1)
values ('Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.'),
('Returns the MD2, MD4, MD5, SHA, or SHA1 hash of its input.')
===================== Code Ends =======================

Look at the CheckSumCol and HashByteCol columns. I have added two computed columns to hold the hash key values; the parameter passed to each is the String1 column, which means the CheckSum and HashByte columns return hash values of the string values for both rows, as shown in the following picture.




CHECKSUM

Returns the checksum value computed over a row of a table, or over a list of expressions. The output of CHECKSUM is an integer. CHECKSUM is intended for use in building hash indexes. You can compute a checksum value over more than one column, like:

select CHECKSUM('this is my test','this is checksum test')
-- returns -506357463

CHECKSUM returns an error if any column is of noncomparable data type. Noncomparable data types are text, ntext, image, XML, and cursor, and also sql_variant with any one of the preceding types as its base type.
CHECKSUM satisfies the properties of a hash function: CHECKSUM applied over any two lists of expressions returns the same value if the corresponding elements of the two lists have the same type and are equal when compared using the equals (=) operator. If one of the values in the expression list changes, the checksum of the list also generally changes. The order of expressions affects the resultant value of CHECKSUM.

The CHECKSUM value is dependent upon the collation. The same value stored with a different collation will return a different CHECKSUM value.
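A quick way to see the collation dependency (under a case-insensitive collation the two strings compare as equal, so their checksums match; under a case-sensitive collation they generally differ):

select CHECKSUM('hello' collate SQL_Latin1_General_CP1_CI_AS) as CI_lower,
       CHECKSUM('HELLO' collate SQL_Latin1_General_CP1_CI_AS) as CI_upper,
       CHECKSUM('hello' collate SQL_Latin1_General_CP1_CS_AS) as CS_lower,
       CHECKSUM('HELLO' collate SQL_Latin1_General_CP1_CS_AS) as CS_upper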

Hash Indexes

CHECKSUM computes a hash value, called the checksum, over its list of columns or arguments. The hash value is intended for use in building hash indexes. If the arguments to CHECKSUM are columns, and an index is built over the computed CHECKSUM value, the result is a hash index. This can be used for equality searches over the columns.

---- Hash index on the CheckSum int column
CREATE INDEX Hashfunction_index ON Hashfunction (CheckSumCol);

The checksum index can be used as a hash index, particularly to improve indexing speed when the column to be indexed is a long character column. The checksum index can be used for equality searches.

--- Comparison 2: equality search as int
select * from Hashfunction where CheckSumCol =  CHECKSUM('Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.')

Creating the index on the computed column materializes the checksum column, and any changes to the string column value will be reflected in the computed column. Alternatively, an index could be built directly on the column itself.
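If you prefer to store the hash at write time rather than during the index build, the computed column can also be persisted explicitly before indexing; a sketch against the table above:

---- Persist the computed hash, then index it
alter table Hashfunction add CheckSumPersisted as CHECKSUM(string1) persisted
create index Hashfunction_persisted_index on Hashfunction (CheckSumPersisted)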

HASHBYTES

Unlike the CheckSum function, which returns an int value, HashBytes returns a varbinary value of up to 8000 bytes. The chances of a collision are extremely low with HashBytes compared to CheckSum. The HashBytes function supports the hash algorithms MD2 | MD4 | MD5 | SHA | SHA1.
The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1.
In the table created above, you can see the HashBytes computed column in the Hashfunction table. You can use it as follows:
--- Comparison 3: equality search as varbinary
select * from Hashfunction where HashByteCol = HASHBYTES('SHA1','Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.')
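Because HASHBYTES is deterministic, the varbinary computed column can be indexed the same way as the checksum column; a sketch against the table above:

---- Hash index on the varbinary HashBytes column
create index Hashfunction_hashbyte_index on Hashfunction (HashByteCol)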

Allowed input values are limited to 8000 bytes. If you pass a value longer than 8000 bytes, it will generate the hash value only for the first 8000 characters, like:

DECLARE @v varchar(max);
SET @v = REPLICATE('A', 9000); -- Input longer than 8,000 bytes
SELECT LEN(@v) AS VarcharLength, HashBytes('SHA1', @v) AS HashValue;
SET @v = REPLICATE('A', 8000); -- Input exactly 8,000 bytes
SELECT LEN(@v) AS VarcharLength, HashBytes('SHA1', @v) AS HashValue;

The output is as follows:
VarcharLength        HashValue 
8000                0xD2967D6425E56C18BA979EEFB4E0DBD1269D9BC9
8000                0xD2967D6425E56C18BA979EEFB4E0DBD1269D9BC9

Monday, May 28, 2012

SQL SERVER 2008 Express with Advanced Features

One of my friends was finding it difficult to get details about the advanced features in SQL SERVER 2008 Express with Advanced Services, and wanted to know whether SSRS is supported in it or not.
Look at the detail of the features in SQL SERVER 2008 Express with Advanced Services below.

I got these details from different msdn sources, linked below.

You can download it from this link:
http://www.microsoft.com/en-us/download/details.aspx?id=1842


The features supported and unsupported by SSRS in SQL SERVER 2008 Express with Advanced Services are:

SQL Server Express provides the following Reporting Services functionality:
·         On-demand report processing for each user who views a report. When a user opens a report, the report is initialized, the query is processed, data is merged into the report layout, and the report is rendered into a presentation format.
·         Rendering formats are available for HTML, Acrobat, and Excel.
·         Report data sources must be SQL Server relational databases that run locally in SQL Server Express.
·         Report server management and report viewing are supported through Report Manager.
·         Configuration is supported through the Reporting Services Configuration tool.
·         Rs.exe, rsconfig.exe, and rskeymgmt.exe command line utilities are available in SQL Server Express.
·         Windows Authentication and predefined roles are used to map existing group and user accounts to a named collection of operations.
Other editions of SQL Server include a larger set of Reporting Services features. The following list describes the features that are documented in SQL Server Books Online, but cannot be used in this edition:
·         Scheduled report processing, caching, snapshots, subscriptions, and delivery are not supported.
·         Analysis Services, Oracle, XML, SAP, SQL Server Integration Services (SSIS), OLE DB, and ODBC data sources are not supported.
·         Remote data sources are not supported. Reports that are hosted in a SQL Server Express report server must retrieve SQL Server relational data from a local SQL Server Express Database Engine instance.
·         Ad hoc reporting through semantic models and Report Builder is not supported.
·         TIFF (Image), XML, and CSV rendering extensions are not supported.
·         The Reporting Services API extensible platform for delivery, data processing, rendering, and security is not supported.
·         Custom authentication extensions and custom role assignments are not supported. You must map existing Windows domain user and group accounts to predefined role definitions.
·         Custom report items are not supported.
·         Managing a long-running report process is not supported. Specifically, you cannot use the Manage Jobs feature in Report Manager to stop report processing.
·         Scale-out deployment is not supported.
·         SharePoint integrated mode is not supported.
·         Report Builder 2.0 is not supported. Report Builder 2.0 can connect to a SQL Server Express database on the local computer or on a remote report server.


A list of supported and unsupported features in SQL SERVER Express 2008 is available here:
http://msdn.microsoft.com/en-us/library/ms365248(v=sql.100).aspx


SQL Server Express supports most of the features and functionality of SQL Server 2008. This includes the following:

Stored procedures
SQL Server Configuration Manager
Views
Replication (as a subscriber only)
Triggers
Advanced Query Optimizer
Cursors
SMO/RMO
sqlcmd and osql utilities
Integration with Visual Studio
Snapshot isolation Levels
Service Broker (as a client only)¹
Native XML support. This includes XQuery and XML schemas
SQL CLR
Transact-SQL language support
Multiple Active Result Sets (MARS)
Dedicated Administrator Connection²
Full text search
Subset of Reporting Services features³
Import/Export Wizard

The following table lists the additional SQL Server 2008 database features that are not supported in this version of SQL Server Express, alongside database features from earlier versions of SQL Server that are not supported.

SQL Server features not supported in SQL Server Express | SQL Server features from earlier versions not supported in SQL Server Express
--------------------------------------------------------|-------------------------------------------------------------------------------
Database mirroring                                       | SQL Mail
Online restore                                           | Failover clustering
Database snapshot                                        | Distributed partitioned views
Parallel index operations                                | VIA protocol support
Mirrored media sets                                      | Log shipping
Partitioning                                             | Parallel DBCC
Address Windowing Extensions (AWE)                       | Parallel Create Index
Hot-add memory                                           | Enhanced Read Ahead and Scan
Native HTTP SOAP access                                  | Indexed views (materialized views)
SQL Mail and Database Mail                               | Partitioned views
Online Index Operations                                  | -
SQL Server Agent and SQL Server Agent Service            | -

Friday, May 25, 2012

Unexpected log file growth in SQL SERVER with Database Mirroring

While implementing database mirroring, a major problem with database sizing is handling the log file size of the database.


Reasons for Log Size Increases


1- As it is necessary to set the recovery model of the database to "FULL", all transactions and DML operations are written to the log file before the data file. With the passage of time this increases the size of your log file. The log file size can increase for multiple reasons; we have discussed one, so let's discuss more.

2- While database mirroring is in place, a failover can occur, or the mirror database can become unavailable due to network connections or server downtime. The log file then holds all pending transactions until the mirror is available again and the data is committed to the mirror. Due to this pending work, the log can grow to a huge size in very little time.

3- If drive space runs low for the database files on the mirror server, the principal server is unable to send and commit data on the mirror. In return it holds all the structural and DML changes until the mirror is available.


Shrinking Log File


When any of the above occurs, you need to apply different solutions to overcome the problem. Let's discuss how we can avoid such conditions, especially when database mirroring is in place.

  • 1- Schedule a full backup of your database as per your backup plan.
  • 2- Schedule transaction log backups at intervals between full backups.

This way your log file size stays under control, because every time a transaction log backup takes place, it removes all committed transactions from the log file and writes them to the backup media; this is how it frees up space inside the log. For more detail about backups, have a look at this link:

http://shamas-saeed.blogspot.com/2010/05/comprehensive-note-on-backup-policies.html
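A minimal sketch of such a schedule (the database name and backup paths here are placeholders):

============= Query ======================
-- Periodic full backup
BACKUP DATABASE MyDB TO DISK = 'D:\Backup\MyDB_Full.bak'

-- Frequent transaction log backups in between; each one frees committed
-- transactions from the log for reuse
BACKUP LOG MyDB TO DISK = 'D:\Backup\MyDB_Log.trn'
============= Query ======================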

  • Implement a Shrink Log job
  • Use a DBCC shrink query

With either option you need to monitor very carefully: the job should not run on a regular basis (regular shrinking increases fragmentation and affects performance), it must complete successfully, and you must specify that released space is returned to the OS. You can create a Shrink Log job through Management Studio from the following path, but this shrinks your database only when the job executes successfully; otherwise the log keeps growing.

DBSERVER --> Management --> Maintenance Plans --> Plan Wizard --> Shrink DB

With the second approach, you manually notice when the log file requires shrinking and run the DBCC commands as follows:


dbcc shrinkdatabase('DBName',10)
dbcc shrinkfile('DBName_log',10)


ShrinkDatabase releases unused space from all files in the database; the second argument is a target percentage of free space to leave (10 percent here), not a size in MB.
ShrinkFile shrinks the named log file towards a target size of 10 MB by releasing unused space; it cannot remove space still held by active transactions.

Note: Your database recovery model must be "Simple", not "Full", for this to work. If you want to do this in your mirroring solution, you need to remove mirroring and change the recovery model to Simple.
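Switching the recovery model looks like this (DBName is a placeholder, and mirroring must be removed first, since mirrored databases require the FULL model):

============= Query ======================
ALTER DATABASE DBName SET RECOVERY SIMPLE
============= Query ======================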

You can shrink your database using Management Studio as well, and likewise shrink individual database files; look at the following screens.






Thursday, May 24, 2012

How to find the default transaction isolation level using DBCC - SQL SERVER

I needed to know my default settings for transaction isolation level, datetime format, etc. Using a DBCC command I can find this easily.

DBCC useroptions

Running the above statement returns the current values of all the available session options.
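Typical output looks like the following (the values vary with your session and server settings):

Set Option                 Value
textsize                   2147483647
language                   us_english
dateformat                 mdy
datefirst                  7
lock_timeout               -1
quoted_identifier          SET
ansi_nulls                 SET
isolation level            read committed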


Monday, May 21, 2012

Changing a database from Restoring to Normal mode.

Sometimes a database enters restoring mode, as in a database mirroring scenario, and you need to change it to normal mode to restore the latest backup or to delete the mirror DB. Changing a database from Restoring to Normal is a critical update, because all uncommitted data in the log file will be rolled back.

============= Query ======================
RESTORE DATABASE dbname WITH RECOVERY 
============= Query ======================


This will be its output:


Could not write a checkpoint record in database ID 17 because the log is out of space. Contact the database administrator to truncate the log or allocate more space to the database log files.
RESTORE DATABASE successfully processed 0 pages in 12.209 seconds (0.000 MB/sec).

Finding Database and Table Size for all Tables in SQL Server 2008 and Above

I found a very good discussion about how to find the physical size of a database and of each table using a SQL query. The query below also returns the number of rows and the data size for each table in MB and GB.


==================== Query ===============================


-- Script to analyze table space usage using the
-- output from the sp_spaceused stored procedure
-- Works with SQL 7.0, 2000, 2005, 2008, and 2008 R2

set nocount on


print 'Show Size, Space Used, Unused Space, Type, and Name of all database files'

select
[FileSizeMB] =
convert(numeric(10,2),sum(round(a.size/128.,2))),
        [UsedSpaceMB] =
convert(numeric(10,2),sum(round(fileproperty( a.name,'SpaceUsed')/128.,2))) ,
        [UnusedSpaceMB] =
convert(numeric(10,2),sum(round((a.size-fileproperty( a.name,'SpaceUsed'))/128.,2))) ,
[Type] =
case when a.groupid is null then '' when a.groupid = 0 then 'Log' else 'Data' end,
[DBFileName] = isnull(a.name,'*** Total for all files ***')
from
sysfiles a -- one row per database file; size is reported in 8 KB pages
group by
groupid,
a.name
with rollup
having
a.groupid is null or
a.name is not null
order by
case when a.groupid is null then 99 when a.groupid = 0 then 0 else 1 end,
a.groupid,
case when a.name is null then 99 else 0 end,
a.name




-- Work table: holds the raw sp_spaceused output for one table at a time
create table #TABLE_SPACE_WORK
(
TABLE_NAME sysname not null ,
TABLE_ROWS numeric(18,0) not null ,
RESERVED varchar(50) not null ,
DATA varchar(50) not null ,
INDEX_SIZE varchar(50) not null ,
UNUSED varchar(50) not null ,
)

-- Accumulates the sp_spaceused output for every table
create table #TABLE_SPACE_USED
(
Seq int not null
identity(1,1) primary key clustered,
TABLE_NAME sysname not null ,
TABLE_ROWS numeric(18,0) not null ,
RESERVED varchar(50) not null ,
DATA varchar(50) not null ,
INDEX_SIZE varchar(50) not null ,
UNUSED varchar(50) not null ,
)

-- Final results: sizes converted to numbers, plus MB/GB and per-row averages
create table #TABLE_SPACE
(
Seq int not null
identity(1,1) primary key clustered,
TABLE_NAME SYSNAME not null ,
TABLE_ROWS int not null ,
RESERVED int not null ,
DATA int not null ,
INDEX_SIZE int not null ,
UNUSED int not null ,
USED_MB numeric(18,4) not null,
USED_GB numeric(18,4) not null,
AVERAGE_BYTES_PER_ROW numeric(18,5) null,
AVERAGE_DATA_BYTES_PER_ROW numeric(18,5) null,
AVERAGE_INDEX_BYTES_PER_ROW numeric(18,5) null,
AVERAGE_UNUSED_BYTES_PER_ROW numeric(18,5) null,
)

declare @fetch_status int

-- Fully qualify sp_spaceused so it runs against the current database
declare @proc varchar(200)
select @proc = rtrim(db_name())+'.dbo.sp_spaceused'

-- Cursor over all base tables in the current database
declare Cur_Cursor cursor local
for
select
TABLE_NAME =
rtrim(TABLE_SCHEMA)+'.'+rtrim(TABLE_NAME)
from
INFORMATION_SCHEMA.TABLES
where
TABLE_TYPE = 'BASE TABLE'
order by
1

open Cur_Cursor

declare @TABLE_NAME varchar(200)

select @fetch_status = 0

while @fetch_status = 0
begin

fetch next from Cur_Cursor
into
@TABLE_NAME

select @fetch_status = @@fetch_status

if @fetch_status <> 0
begin
continue
end

truncate table #TABLE_SPACE_WORK

insert into #TABLE_SPACE_WORK
(
TABLE_NAME,
TABLE_ROWS,
RESERVED,
DATA,
INDEX_SIZE,
UNUSED
)
exec @proc @objname =
@TABLE_NAME ,@updateusage = 'true'


-- Needed to work with SQL 7
update #TABLE_SPACE_WORK
set
TABLE_NAME = @TABLE_NAME

insert into #TABLE_SPACE_USED
(
TABLE_NAME,
TABLE_ROWS,
RESERVED,
DATA,
INDEX_SIZE,
UNUSED
)
select
TABLE_NAME,
TABLE_ROWS,
RESERVED,
DATA,
INDEX_SIZE,
UNUSED
from
#TABLE_SPACE_WORK

end --While end

close Cur_Cursor

deallocate Cur_Cursor

insert into #TABLE_SPACE
(
TABLE_NAME,
TABLE_ROWS,
RESERVED,
DATA,
INDEX_SIZE,
UNUSED,
USED_MB,
USED_GB,
AVERAGE_BYTES_PER_ROW,
AVERAGE_DATA_BYTES_PER_ROW,
AVERAGE_INDEX_BYTES_PER_ROW,
AVERAGE_UNUSED_BYTES_PER_ROW

)
select
TABLE_NAME,
TABLE_ROWS,
RESERVED,
DATA,
INDEX_SIZE,
UNUSED,
USED_MB =
round(convert(numeric(25,10),RESERVED)/
convert(numeric(25,10),1024),4),
USED_GB =
round(convert(numeric(25,10),RESERVED)/
convert(numeric(25,10),1024*1024),4),
AVERAGE_BYTES_PER_ROW =
case
when TABLE_ROWS <> 0
then round(
(1024.000000*convert(numeric(25,10),RESERVED))/
convert(numeric(25,10),TABLE_ROWS),5)
else null
end,
AVERAGE_DATA_BYTES_PER_ROW =
case
when TABLE_ROWS <> 0
then round(
(1024.000000*convert(numeric(25,10),DATA))/
convert(numeric(25,10),TABLE_ROWS),5)
else null
end,
AVERAGE_INDEX_BYTES_PER_ROW =
case
when TABLE_ROWS <> 0
then round(
(1024.000000*convert(numeric(25,10),INDEX_SIZE))/
convert(numeric(25,10),TABLE_ROWS),5)
else null
end,
AVERAGE_UNUSED_BYTES_PER_ROW =
case
when TABLE_ROWS <> 0
then round(
(1024.000000*convert(numeric(25,10),UNUSED))/
convert(numeric(25,10),TABLE_ROWS),5)
else null
end
from
(
select
TABLE_NAME,
TABLE_ROWS,
RESERVED =
convert(int,rtrim(replace(RESERVED,'KB',''))),
DATA =
convert(int,rtrim(replace(DATA,'KB',''))),
INDEX_SIZE =
convert(int,rtrim(replace(INDEX_SIZE,'KB',''))),
UNUSED =
convert(int,rtrim(replace(UNUSED,'KB','')))
from
#TABLE_SPACE_USED aa
) a
order by
TABLE_NAME

print 'Show results in descending order by size in MB'

select * from #TABLE_SPACE order by USED_MB desc
go

drop table #TABLE_SPACE_WORK
drop table #TABLE_SPACE_USED
drop table #TABLE_SPACE

==================== Query ===============================