Wednesday, August 13, 2008

SQL Server Clustering and Consolidation - Balancing Resources

Problem
In my SQL Server environment we initially clustered to support a consolidation effort focused on eliminating approximately thirteen Microsoft SQL Server 2000 and 2005 instances. Along the way we had a mission-critical application, backed by SQL 2000, that was budgeted without hardware to support a dedicated SQL Server instance. As a result we were forced into hosting this database on our relatively new SQL 2005 Enterprise cluster. Our consolidation effort essentially ended on the day that decision was made. I've finally been able to make the case that this application needs a dedicated SQL environment. What started as a single database supporting this product has morphed into an eight-database suite that, through design and support, requires more than its share of system (and Database Administrator) resources because of where it is hosted.

Just because you have disk space, plenty of memory, and abundant CPU does not mean you can pile databases onto an instance of Microsoft SQL Server; you need to pay attention to load balancing. In the case of a clustered environment I take it one step removed: a step I like to call node balancing.

Author's Note: This is part three of a continuing series on Microsoft SQL Server 2005 Clustering. Some of the terminology used in this tip and future clustering tips in the series was defined and explained in parts 1 and 2. While parts 1 and 2 are not prerequisites for this tip, it is advised that you also read those tips in the series (and any future tips as well!)

Solution

Node Balancing

Microsoft Clustering Services are not designed for load balancing, but rather for failover in the event of hardware failure. Therefore, when I play off the concept of load balancing, what I am actually referring to is the process by which I determine which databases to host on my various SQL Server instances. I base this determination on a number of factors:

  • Relative importance of the application to the entity
  • Amount of resources required by the database(s) that support this application
  • Level of access required by the support staff (possibly Third-Party Vendor)
  • Architectural considerations
  • Collation requirements

Each one of these factors may determine how I formulate my architectural recommendations for hosting any particular database that is a candidate for clustering. However, it is usually a combination of factors that determines why I may choose to host a database or set of databases on their own cluster, on their own active node in a cluster, or on a shared node with multiple unrelated databases. I detail each of these factors next, with recommendations based upon my experience for each.

Relative Importance of the Application

There are two distinct reasons for clustering Microsoft SQL Server: consolidation and failover. Consolidation allows you to host multiple databases together on a single SQL instance, up to the point where the resources of the server can no longer support additional database activity. Clustering for failover allows you to maintain an actively-accessible database instance in the event a node in the cluster encounters stress or damage and a shutdown occurs. Databases that are considered "mission critical" should be hosted in such a way that affords high availability. For the purposes of this tip we will state that the solution we're recommending for that purpose is failover clustering. Conversely, databases that don't meet the requirements of being "mission critical" do not necessarily need to be available if a hardware failure is encountered. They may be able to be down for a short period of time, perhaps even for days depending upon their usage patterns. These databases may be hosted on the cluster only for the sake of consolidating multiple instances' worth of databases on a more powerful clustered instance of SQL Server. More powerful? Certainly. If you're going to take advantage of clustering I strongly suggest that you do so on the 64-bit platform and with Enterprise Edition SQL Server. The limitations of Standard Edition (no online indexing, a maximum of two nodes in a cluster) make it a less-expensive, yet less-scalable, solution.

For the basis of this tip I will break down the database classes, based upon criticality, into three tiers. These are the classifications we use in our company and they have worked well for us (so far).

  • Tier 1 - Your company's most-important databases. You may perhaps have only one of these - maybe as many as a half dozen. These are the systems that keep your company running or service your customers with their critical data requirements: databases providing data related to Patient Care, Online Banking, or Criminal History information would be candidates for this class of databases.
  • Tier 2 - These are the databases that host data for applications that, while important, may allow some appreciable measure of downtime. Payroll, Educational Registration, Annual Employee Performance Review programs, and E-Commerce sites would fall into this category.
  • Tier 3 - We all have plenty of these in our environments. These are the databases that get created because someone in accounting wanted to "upsize their Excel Database". These databases typically support department-level internal applications that run during business hours and can be down for periods of time with effect only on the productivity of a select group of individuals.

Let's take care of the easy choice first: Tier 3 databases should be hosted together. There, that was easy. Seriously, we do not need to dwell on the Tier 3 databases; we need to focus our attention on answering the following questions:

  • Should Tier 1 databases be hosted on a dedicated node in a shared cluster or should they be hosted in their own active-passive cluster?

I've advocated both sides of this debate at different points in time. By placing the Tier 1 database on a dedicated node within a larger, shared cluster you still reap the benefits of clustering, with only a single additional server to manage (so long as there is already a passive node in the cluster). If you were to dedicate a complete cluster for the sole purpose of hosting this database you must add at least two new servers to your environment, along with all the maintenance and administration that entails - not only for you, but for the multitude of teams and staff that are responsible for supporting a server in today's IT environments. One should also not forget the added costs associated with yet another passive node (hopefully) just sitting idle in your data center. Conversely, what happens if this database or one of the others in the cluster requires a patch or configuration change at the server level? In a clustered architecture this requires all nodes in the cluster to be patched, regardless of whether the database is currently being hosted on the server. You need to be prepared for a failover situation, and this requires that all nodes in the cluster be identical in content and configuration from a software perspective. Therefore you may find yourself in a situation where a Tier 2 or Tier 3 database causes compatibility issues for a Tier 1 database in your shared cluster - even if the Tier 1 database is hosted on its own node.

Recommendation: If you have the funding, Tier 1 databases - those that are most critical to the well-being of your company and customers - should be hosted on their own cluster. Furthermore, it is recommended that you also provide additional redundancies to protect against media (disk) corruption on the SAN volumes assigned to the database, via third-party SAN replication tools or by native SQL technologies such as replication, database mirroring, or log shipping.

  • Should Tier 2 databases be hosted on the same node as Tier 3 databases or should they be on their own dedicated node in a shared cluster?

Just because a database may be classified as Tier 2 using the definitions I've presented does not make it unimportant. However, such databases seldom require their own dedicated cluster unless your company manufactures money or has worked out that whole alchemy process of turning rocks into gold. These databases would need to compete for resources equally against less-important (and possibly more intrusive) databases if hosted on the same node with Tier 3 databases, so in most situations that architecture is not advised either. This is where you need to also take other factors, such as resource requirements and activity trends, into consideration in an attempt to balance a node for support of one or more Tier 2 databases.

Recommendation: If funding permits, host Tier 2 databases on their own node in a shared cluster environment. If resources allow, and your benchmarking proves that multiple Tier 2 databases can co-exist on the same node, host those databases together. Considerations need to be taken for any database that is hosted on a shared Microsoft SQL Server instance: failovers, maintenance, and downtimes will impact all databases on the instance.

Amount of Resources Required by The Database

I support hundreds of databases in my environment. We have very large databases that have very little activity. We have relatively small databases that have hundreds of concurrent static connections that remain open all day long. We have a little bit of everything in between. What may be the most time-consuming task for a Database Administrator is determining just what usage patterns each of your databases exhibits. The process for collecting this information is outside the scope of this tip, but I'll make suggestions at the end of this tip for resources on how to determine load over time for a specific database. Once you've collected this data, the goal is to balance out the resource requirements across all the databases you plan on hosting on a given instance. What makes this process extremely difficult is that some items that greatly impact resources may be out of your control - even as the Database Administrator. A properly-indexed query may run in fractions of a second; however, that same query, with inadequate indexes or out-of-date statistics, may run for minutes or worse. If this is a database that was developed in-house you may have the ability and rights to correct the issue by adding or altering the indexes. If this is a commercially-developed database your company purchased you may have the ability to make similar changes, but may be precluded from doing so by the nature of an existing support agreement with the vendor. There is also the impact of application life cycles. Your company comes to rely more (or perhaps less, as the case may be) on a database over time. This means your initial performance monitoring results used to determine how and where to host a given database may become outdated. The information you collect over a sampling period may not be indicative of the database activity or impact later in your production environment.

This is the grayest of gray areas - a sliding scale of metadata - one of the leading causes of premature baldness, depression, and substance abuse by Database Administrators. Unfortunately it is also an important process in determining the hosting structure for your databases. I highly recommend rounding up whenever possible for the sole reason that it is better to have excess resources sitting idle on your database servers than finding yourself in need of resources when your databases need it most.

Recommendation: Use Windows Performance Monitor and SQL Server Profiler to monitor activity against candidate databases for clustering, to determine the high and low points in activity throughout a typical week and find databases that may be compatible for co-hosting on a shared SQL Server instance in your enterprise's cluster.
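
If you want a quick, low-overhead starting point before breaking out Profiler, here is a minimal sketch (SQL Server 2005) that samples cumulative I/O per database from the sys.dm_io_virtual_file_stats dynamic management function. The counters accumulate from the last SQL Server restart, so capture the output periodically and diff the samples to see how load trends over a typical week:

SELECT DB_NAME(database_id) AS database_name,
SUM(num_of_reads) AS total_reads,
SUM(num_of_writes) AS total_writes,
SUM(num_of_bytes_read + num_of_bytes_written) AS total_bytes
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
GROUP BY database_id
ORDER BY total_bytes DESC
GO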

Level of Access Required by Support Staff/Vendor

So we move from difficult to easy. This really is a simple line in the sand for most Database Administrators. On one side of the line: database role rights. The other side of the line: server role rights. If the support team (be they internal or third-party) is agreeable to having their permissions on a SQL instance capped at database role rights (db_owner role rights or less) then this makes the database a candidate to be hosted on a shared SQL instance. It is also quite common that a support team would need to have certain role permissions in msdb to support any jobs associated with their database(s); I would still consider this a candidate for a shared database environment. However, if the vendor or support team mandates that they have any level of server role membership (System Administrator, Database Creator, Security Administrator, etc.) then they are going to be isolated to a dedicated SQL instance. The reason is that it is the primary responsibility of the Database Administrator to isolate access to only those individuals, teams, or entities that have legal, ethical, or commercial rights on any given database. If you allow the vendor for database X to have server-level permissions on a SQL instance they could conceivably affect every other database on that instance.
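
To make that line in the sand concrete, here is a minimal T-SQL sketch; the login name VendorSupport and the database VendorDB are hypothetical placeholders:

USE [master]
GO
CREATE LOGIN VendorSupport WITH PASSWORD = 'Str0ng!Passw0rd'
GO
USE VendorDB
GO
CREATE USER VendorSupport FOR LOGIN VendorSupport
GO
-- Database role rights only: still a candidate for a shared instance
EXEC sp_addrolemember 'db_owner', 'VendorSupport'
GO
-- Server role membership: a request like the one below is the kind
-- that should push the vendor's database(s) onto a dedicated instance
-- EXEC sp_addsrvrolemember 'VendorSupport', 'sysadmin'
GO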

Recommendation: If you cannot limit non-DBA access to database-only rights, and server-level permissions, role memberships, or securables are required by the application support staff, then the database(s) in question should be hosted on a dedicated node within your cluster.

Architectural Considerations

This category exists for only a single question: is there a mandate, by either the structure of the application and database or by support agreements, that the application and the database must reside on the same physical server? If so, not only should you not look at clustering for this solution, but you should definitely plan to dedicate an instance to just the database(s) that support this application.

If the program architecture requires the database to reside on the same physical server as the application, web, or other non-SQL Server components then do not host the database within your cluster. Instead dedicate a standalone SQL instance for the database(s) and provide redundancy/high-availability by other means such as replication, database mirroring, or log shipping.

Collation Requirements

Granted, in SQL Server 2005 the collation settings for user databases can differ from the collation of the system databases. However, it is critical that you realize there are implications when a vendor mandates their databases run under a specific non-standard collation. Typically the issue is tempdb. Think of all the processing that occurs in tempdb: this is where temporary objects such as temp tables, table variables, and cursors are created for all databases. Work tables used by SQL Server to internally sort data are also created in tempdb. Issues arise when tempdb is a different collation than the user database, and the results of computations may not match expectations. My suggestion when non-standard collations are specified is to ask if this is a requirement or a preference. If a requirement, ask for specific reasons why this collation is required. (You'll be shocked how many times vendors will erroneously cite "That's what Microsoft's documentation states!") If a non-standard collation truly is required then you're looking at a dedicated instance of SQL Server. I recently had to rebuild a cluster node only because I did not use the default collation for SQL 2005, but rather selected the same criteria piecemeal and built my own collation. Since it did not match in name to "SQL_Latin1_General_CP1_CI_AS", scripts in a single database running alongside 40 other databases failed a validation check.
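
As a concrete illustration of the tempdb issue, here is a minimal sketch; the table and column names are hypothetical, and Latin1_General_BIN simply stands in for any collation that differs from tempdb's:

-- Column collation differs from tempdb's collation
CREATE TABLE dbo.Customers
(CustomerName VARCHAR(50) COLLATE Latin1_General_BIN NOT NULL)
GO
-- Temp table columns inherit tempdb's collation by default
CREATE TABLE #CustomerStage
(CustomerName VARCHAR(50) NOT NULL)
GO
-- This join raises a "Cannot resolve the collation conflict" error
-- unless an explicit COLLATE clause reconciles the two sides
SELECT c.CustomerName
FROM dbo.Customers c
JOIN #CustomerStage s
ON s.CustomerName COLLATE Latin1_General_BIN = c.CustomerName
GO

A common defensive pattern is to declare temp table columns with COLLATE database_default so they follow the user database's collation rather than tempdb's.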

Recommendation: A required non-standard collation does not mean you cannot host the database in the existing cluster; however, it may require you to dedicate an instance of SQL Server to this database alone. You may be able to set the minimum and maximum memory requirements for this instance low enough for it to reside on an existing node alongside another SQL Server instance, depending upon the required resources for the database in question. You may need to dedicate a node to the SQL instance if you find resource contention between this instance and any other instances running on the same cluster node. It is best to determine up front whether the non-standard collation is a requirement or a request.

SEO Tips - Email Marketing Tips, Getting More Email Subscribers

Growing an email list is a critical part of email marketing success. The more people you have to talk to, the more chance you have of getting a good number of people to do what you want them to do, be that click, comment or buy. In the previous article of this email marketing series we looked at getting started in email marketing, now we will look at how you can grow your list from a few to thousands of subscribers.

Size Is Not Everything

First, some caution is necessary. Size is not the main goal.

Aim to build a good quality relationship with people who want to hear from you rather than just focus on quantity.

Always target the best sources of motivated, interested subscribers, not just where you can grab handfuls of names.

Prepare

Have your sign up form visible, above the fold, but also have a subscribe page that effectively describes the benefits of signing up and that you and other people can link to - especially if you followed the advice in the previous article and have an incentive to sign up. This is important, as you will see.

Optimize

You should be continuously looking at what you are doing to see if you are being effective in attracting people to your list and if you are attracting only the right people. Unsubscribes, and the reasons for them, will tell you a great deal. Do more of what works.

10 Best Sources of Email Subscribers

OK, here is the bit you really wanted to read. Where to get lots of eager subscribers:

  1. You! - Common sense? Probably, but people really do not take advantage of all their own potential places to reach new subscribers. Build on your own touch points such as:
    1. Your blog - Content is a fantastic way to reach subscribers. Give them great content followed by a call to action to tell them why they should subscribe. Kind of like this series you are reading, if you want to not miss the next part then you will need to subscribe, right? :)
    2. Forum postings - Many forums allow you to include links back to your own site. Instead of “here is my blog / home page”, put “get tips and news right to your inbox by signing up to my newsletter”. Think about it, the people taking part in the Walrus Polishing forum will be interested in more info on that topic, especially if your postings are good.
    3. Other lists - You might already have email newsletters. Some of the topics will overlap, while others might intersect occasionally. For example, Darren launched Digital Photography School partly through mentions on ProBlogger.
    4. Email signatures - Any email interaction is a potential for a new sign up, especially email discussion lists. Do NOT harvest email addresses, put the subscribe URL in your signature with a call to action and if they are interested they will opt-in.
    5. Networking - Be it online or off, there will be a chance to put your subscribe page on your profile, business card, powerpoint … now aren’t you glad you spent time creating a compelling subscribe page?
  2. Forwards - Forwarded emails are an excellent source of new subscribers because a forward is both an endorsement and a free sample of your good stuff. This is why you should include subscription options or obvious and compelling web links in your subscriber messages - not for your existing subscribers but for the new readers they are forwarded to. If it is too hard to sign up they will just think “interesting” and move on.
  3. MGM - “Member Get Member” or “Recommend a Friend” is a common way to get your subscribers to bring friends. Rather than just ask or hope they will forward, ask or incentivise them to use a special form. This form will send a customized email to each of their submitted friends, telling the friend about this cool thing, and optionally then delivering some goodies as a thank you. Another way to do it is give additional chances of winning a prize for each member they attract. Just be careful, this can be spammy if taken too far!
  4. Incentives - I have already mentioned incentives a fair bit, because they work. There are three ways to get sign ups from free gifts:
    1. Before - Subscriber bonus that you can’t get without giving your email address, my Flagship Content ebook is one example
    2. After - Links in the document, video pages, and so on, encourage sign up but you do not need to opt-in to get the content. This is basically how blogs work and also “watch video one, opt-in to get part two”.
    3. Before AND after - The most advanced is to give free, open content, get opt-in then reinforce subscription with further permissions. You want to build a deeper relationship, from reader, to subscriber, from subscriber to customer, and so on. Consider what I have been doing moving feed subscribers to email subscription by offering benefits over and above my daily content.
  5. Virals - A viral is basically anything that gets passed around, be it a funny video, a quiz or a small game. The same before/after approaches can be used as in the “incentives” point. The advantage of a viral is that people want to pass it on, so it can be used as an opt-in delivery system.
  6. Bloggers - Other bloggers are a great source of subscribers:
    1. Reviews - They review your video, download, or other incentive, their readers click to get it and opt-in.
    2. Links - Make your newsletter or incentive about a focused topic and bloggers will link it when talking about your topic.
    3. Guest posts - Mention your newsletter in your attribution line “Sign up to get more tips like these …”.
  7. List owners - Referrals from other lists work very well because these people already demonstrate a willingness to subscribe! Again, your incentive or bonus might be the way in, but also through networking and getting to know list owners you can do each other the favor.
  8. Partners - Team up with other publishers to build a new list, as Darren and I did with the ProBloggerBook.
  9. Affiliates - People will want to promote you if there is something in it for them. Offer a commission on any sales, connect your affiliate cookie to newsletter signups. Affiliates send visitors to get your free subscriber bonus, when any sales happen the affiliate gets rewarded for delivering the lead.
  10. Paid - Advertising can be very effective but always test your ROI to make sure you are getting the best price per lead.
    1. Email newsletter advertising - For a price you can get space in popular email newsletters, usually priced per thousand emails sent. This is a better alternative than buying lists but should still be tested before spending a great deal.
    2. Web advertising - Use pay per click or banner advertising (such as Performancing Ads) to drive people to your list using your incentive as your call to action.
    3. Offline - Classifieds and print advertising can work, just do the same as above and use your incentive to drive sign ups

Summary

There are many variations of the above tactics but they are the main ways to get people on to your list. Always build on your success by monitoring, testing and tracking your subscriber sources and developing new routes to members.

I have mentioned testing and tracking quite a bit, it is vitally important, so in the next part of this series I will go through email metrics and how they are a powerful tool in building a profitable email marketing campaign.

Saturday, August 2, 2008

Accessing SharePoint List Data from SQL Server CLR Functions

Problem
Our business users are storing some data in SharePoint lists that we need to access from our ETL process that updates our SQL Server data warehouse. Can you provide us with the details and an example of how we can do this from a SQL Server CLR function?

Solution
Windows SharePoint Services (WSS) provides an API and a number of web services that can be used to access the data in the SharePoint content database programmatically. While the content database is in fact a SQL Server database, the recommended way of accessing the data is via the API or a web service. When using the API your code has to be running on the SharePoint server; you can call the web service from just about anywhere. Therefore, the web service approach provides the most flexibility and is probably the best in most cases. For a SharePoint list you can use the Lists web service. Starting with SQL Server 2005 you can code stored procedures, triggers, user-defined functions, user-defined aggregates, and user-defined types using Microsoft .NET code, e.g. Visual Basic .NET or C#. For the task at hand, a table-valued function (TVF) written using the CLR would be a good choice. A TVF is a function that returns a result set; you can use the function in a SQL statement as if it were a table.

Before we proceed to walk through the code for our TVF, let's get our environment ready. By default the CLR is not enabled in SQL Server. Please refer to our earlier tip How to Return a Result Set from a SQL Server 2005 CLR Stored Procedure for the steps you need to perform to enable the CLR on your SQL Server instance.

Now let's talk about what we are going to demonstrate in the code that follows. We need a TVF that returns a result set of the SharePoint lists for a particular site. It's a good idea to refer to a SharePoint list using its Name, which is a GUID. The textual name of a SharePoint list is the Title. Users can change the Title but the GUID remains unchanged. We need a second TVF to retrieve the contents of a particular SharePoint list.

We will walk through the following steps in our code example:

  • Setup the web service proxy class
  • Review the C# code sample for the CLR functions
  • Generate a static serialization assembly
  • Deploy the CLR functions

Web Service Proxy Class

When you write .NET code that uses a web service, you invoke the web service methods via a proxy class. The proxy class has methods that mirror the web service; you call the methods on the proxy class and the proxy class actually communicates with the web service. To create the proxy class you simply need the URL of the web service. In a Visual Studio project you can create the web service proxy by right clicking on the Web References node in the Solution Explorer. Alternatively you can create the web service proxy by running the WSDL utility which is a command line tool. In a Visual Studio project you fill in the following dialog:

Note that the URL for your web service will be different than the one shown above which exists in a Virtual PC. You can specify anything you like for the Web reference name.

The command to create the web service proxy using the WSDL utility is as follows:

WSDL /o:ListsService.cs /n:WSS http://192.168.2.66/_vti_bin/Lists.asmx

The output from WSDL is a class file that you manually add to your Visual Studio project or include in your compile. The /n option specifies the namespace for the generated code. The advantage of using WSDL is that you can fine tune the generation of the proxy code; there are quite a few command line options available. Whether you use Visual Studio or WSDL, you wind up with the proxy class.

You can find WSDL.EXE in the folder C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin (assuming Visual Studio 2005).

CLR Code Sample

The following C# function (and one that follows it) will return a result set of the SharePoint lists in a particular site:

    [SqlFunction(SystemDataAccess = SystemDataAccessKind.Read,
        FillRowMethodName = "GetListInfo")]
    public static IEnumerable GetListCollection(SqlString url)
    {
        DataTable t = new DataTable();
        WindowsImpersonationContext ctx = null;
        WindowsIdentity id = SqlContext.WindowsIdentity;
        try
        {
            // impersonate the caller
            ctx = id.Impersonate();
            // create instance of web service proxy class
            WSS.Lists svc = new WSS.Lists();
            svc.Url = url.ToString();
            svc.Credentials = CredentialCache.DefaultNetworkCredentials;
            // call web service method; load the XML result into a DataTable
            XmlNode node = svc.GetListCollection();
            XmlTextReader rdr = new XmlTextReader(node.OuterXml,
                XmlNodeType.Element, null);
            DataSet ds = new DataSet();
            ds.ReadXml(rdr);
            t = ds.Tables[0];
        }
        finally
        {
            // undo the impersonation
            if (ctx != null)
                ctx.Undo();
        }
        return t.Rows;
    }
The main points in the above code are:

  • The [SqlFunction] which precedes the actual function is a .NET attribute. It is used to specify metadata about the .NET object which it adorns; in this case the function. The [SqlFunction] attribute is used for a CLR function. The SystemDataAccess property is required because we will be impersonating the caller and this requires access to system tables. The FillRowMethodName property is the name of a companion function (to be discussed below); this function actually returns the result set a row at a time.
  • A CLR TVF must return an object that implements the IEnumerable interface. This is a standard interface implemented by .NET collection classes.
  • The parameter to the function is the URL of the Lists service for a site; e.g. http://servername/_vti_bin/Lists.asmx.
  • DataTable is a class in the .NET framework that provides an in-memory representation of a table or result set. The DataRowCollection returned by its Rows property implements the IEnumerable interface required to be returned by the function.
  • The SqlContext object is automatically available in a CLR function; we use it to get the WindowsIdentity of the caller.
  • The default behavior when accessing resources outside of SQL Server is to use the credentials of the SQL Server service; in many cases this isn't appropriate as the SQL Server service's account has the bare minimum permissions required. In many cases you will need to impersonate the caller who then must have the appropriate permissions.
  • After creating the instance of the web service proxy class, we set the URL based on the value passed to the function and specify to use the current credentials which are now the caller's after the impersonation code is executed.
  • We call the web service method which returns an XML node object.
  • We transform the XML into a DataTable via the DataSet object, a standard class in the .NET framework that implements an in-memory copy of a collection of tables or result sets.
  • In the finally block we undo the impersonation, reverting back to the credentials of the SQL Server service account.
  • In the last line of code we return the collection of rows in the DataTable.

The second part of our code sample is the GetListInfo function which was specified in the SqlFunction attribute in the FillRowMethodName property:

    public static void GetListInfo(
        object obj,
        out SqlString name,
        out SqlString title,
        out SqlString url)
    {
        DataRow r = (DataRow)obj;
        name = new SqlString(r["Name"].ToString());
        title = new SqlString(r["Title"].ToString());
        url = new SqlString(r["DefaultViewUrl"].ToString());
    }

The main points in the above code are:

  • This function is called behind the scenes; we will connect the GetListCollection function above to the TVF when we deploy (in the next section). The GetListCollection function retrieves the data to be returned by the TVF; this function is called to return the result set one row at a time.
  • The object parameter value is actually a DataRow; the GetListCollection function returned a collection of DataRow objects.
  • We use the SqlString object instead of the standard .NET String class.
  • Each column in the result set is returned as an out parameter in the function.

The above functions should be implemented in a single source file (e.g. SharePointList.cs) and compiled to create a class library DLL. The sample code is available here. You can use a Visual Studio project to create the class library DLL or compile with the following command:

CSC /target:library /out:MSSQLTipsCLRLib.dll *.cs

You can find CSC.EXE in the folder C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 (assuming v2.0 of the .NET framework).

Generate a Static Serialization Assembly

When writing .NET code that accesses a web service, there are some behind the scenes things that happen to dynamically generate code used to pass data to and from the web service. This is called serialization and just think of it as a special packaging of data to and from the web service. When calling a web service from a CLR function, the SQL Server CLR implementation doesn't allow this dynamic serialization. The work around is to use the SGEN utility to create a static serialization assembly (i.e. DLL). The command line for SGEN is as follows (assuming we compiled our earlier functions into a class library called MSSQLTipsCLRLib):

SGEN /a:MSSQLTipsCLRLib.dll

SGEN creates a DLL; in this case MSSQLTipsCLRLib.XmlSerializers.dll. Both MSSQLTipsCLRLib.dll and MSSQLTipsCLRLib.XmlSerializers.dll must be deployed to SQL Server, which we will do in the next section.

You can find SGEN.EXE in the folder C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin (assuming Visual Studio 2005).

Deploy the CLR Functions to SQL Server

Execute the following T-SQL script to deploy the DLLs to SQL Server and create the CLR function (create the mssqltips database if necessary or substitute an existing database name):

ALTER DATABASE mssqltips
SET TRUSTWORTHY ON
GO
USE mssqltips
GO
CREATE ASSEMBLY MSSQLTipsCLRLib
FROM 'C:\mssqltips\MSSQLTipsCLRLib.dll'
WITH PERMISSION_SET = UNSAFE
GO
CREATE ASSEMBLY [MSSQLTipsCLRLib.XmlSerializers]
FROM 'C:\mssqltips\MSSQLTipsCLRLib.XmlSerializers.dll'
WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION fn_GetSharePointLists(@url nvarchar(256))
RETURNS TABLE
(
name nvarchar(256),
title nvarchar(256),
url nvarchar(256)
)
AS
EXTERNAL NAME MSSQLTipsCLRLib.SharePointList.GetListCollection
GO

The TRUSTWORTHY option is set to ON to allow access to resources outside of SQL Server; the default is OFF. CREATE ASSEMBLY deploys the DLL to SQL Server. The FROM clause must point to the actual path of the DLL. PERMISSION_SET must be set to UNSAFE because of certain .NET serialization attributes associated with the web service. Finally, the EXTERNAL NAME clause of CREATE FUNCTION creates the TVF and associates it with the assembly, class, and function. When you include the TVF in the FROM clause of a SELECT statement, the .NET code runs and returns a result set.

To execute the TVF, execute the following script (substitute the URL to your SharePoint site):

SELECT * FROM dbo.fn_GetSharePointLists('http://192.168.2.66/_vti_bin/Lists.asmx')

You will see output that looks something like this:

The second function that was discussed was a TVF that would retrieve the items from a SharePoint list and return them as a result set. The code is practically the same as above, but calls a different web service method to retrieve the list of items from a Calendar. You can review the code in the download here. You also have to execute a T-SQL command to create the function; it is in the SQL script file in the download.

To execute the TVF, execute the following script:

SELECT *
FROM dbo.fn_GetSharePointCalendar(
'http://192.168.2.66/_vti_bin/Lists.asmx',
'{A67891C7-7690-43F9-B115-DC419888080A}')

The second parameter is the list name which is a GUID; you get this value by executing the fn_GetSharePointLists function. You will see output that looks something like this (only two columns are returned):

Thursday, July 3, 2008

MSSQL: Avoid Untrusted Constraints in SQL Server

Problem
Some time ago, I loaded a large set of data into one of my tables. To speed up the load, I disabled the FOREIGN KEY and CHECK constraints on the table and then re-enabled them after the load was complete. I am now finding that some of the loaded data was referentially invalid. What happened?

Solution
Disabling constraint checking for FOREIGN KEYS and CHECK constraints is accomplished using the ALTER TABLE statement and specifying a NOCHECK on the constraint. However, when re-enabling constraints, I've seen many instances where the DBA would re-enable the constraints by specifying CHECK but forget (or not know) to ask the SQL Server engine to re-verify the relationships by additionally specifying WITH CHECK. By specifying CHECK but not additionally specifying WITH CHECK, the SQL Server engine will enable the constraint but not verify that referential integrity is intact. Furthermore, when a FOREIGN KEY or CHECK constraint is disabled, SQL Server will internally flag the constraint as not being "trustworthy". This can cause the optimizer to not consider the constraint relationship when trying to generate the most optimal query plans.

In a Management Studio connection, run the following script to create a parent table called EMPLOYEE and a child table called TIMECARD. TIMECARD rows may only exist provided that the corresponding EMPLOYEE row exists.


SET NOCOUNT ON
GO

CREATE TABLE DBO.EMPLOYEE
(
EMPLOYEEID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
FIRSTNAME VARCHAR(50) NOT NULL,
LASTNAME VARCHAR(50) NOT NULL
)
GO

CREATE TABLE DBO.TIMECARD
(
TIMECARDID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
EMPLOYEEID INT NOT NULL,
HOURSWORKED TINYINT NOT NULL,
DATEWORKED DATETIME NOT NULL
)
GO

-- Only allow valid employees to have a timecard
ALTER TABLE DBO.TIMECARD
ADD CONSTRAINT FK_TIMECARD_EMPLOYEEID FOREIGN KEY (EMPLOYEEID)
REFERENCES DBO.EMPLOYEE(EMPLOYEEID)
ON DELETE CASCADE
GO

INSERT INTO DBO.EMPLOYEE (FIRSTNAME, LASTNAME)
SELECT 'JOHN', 'DOE'
GO

INSERT INTO DBO.TIMECARD (EMPLOYEEID, HOURSWORKED, DATEWORKED)
SELECT 1, 8, '2008-01-01'
GO

Now run the following query (valid for both SQL Server 2000 and 2005) to check the foreign key constraint's trustworthiness:

SELECT CASE WHEN OBJECTPROPERTY(OBJECT_ID('FK_TIMECARD_EMPLOYEEID'), 'CnstIsNotTrusted') = 1 THEN 'NO' ELSE 'YES' END AS 'IsTrustWorthy?'
GO

You will see that the SQL Server engine considers the foreign key constraint as referentially valid.

Now we'll load some additional data but we'll disable the constraint prior to load. Once the rows are loaded, we'll re-enable the constraint. Typically, a DBA would do this when loading large amounts of data in order to speed up the load. For illustrative purposes, we'll just load 3 new rows.


ALTER TABLE DBO.TIMECARD NOCHECK CONSTRAINT FK_TIMECARD_EMPLOYEEID
GO

INSERT INTO DBO.TIMECARD (EMPLOYEEID, HOURSWORKED, DATEWORKED)
SELECT 1, 8, '2008-01-02'
GO
INSERT INTO DBO.TIMECARD (EMPLOYEEID, HOURSWORKED, DATEWORKED)
SELECT 1, 8, '2008-01-03'
GO
INSERT INTO DBO.TIMECARD (EMPLOYEEID, HOURSWORKED, DATEWORKED)
SELECT 2, 8, '2008-01-04'
GO

ALTER TABLE DBO.TIMECARD CHECK CONSTRAINT FK_TIMECARD_EMPLOYEEID
GO

If we now re-examine the constraint's trustworthiness, we see that the SQL Server engine no longer believes the constraint is trustworthy enough to use in execution plan generation.

Wait a minute! If you didn't notice, I have an error in my script! The 3rd row I added is for EMPLOYEEID = 2. However, this EMPLOYEEID does not exist in my table! This is because the CHECK issued when the constraint was re-enabled simply re-enables the constraint; it does not verify that referential integrity is intact. Running DBCC CHECKCONSTRAINTS(TIMECARD) verifies that referential integrity has been violated:

I knew that there was an error in my script. But what if I didn't? This integrity violation would lurk in my database until TIMECARD totals were generated for month-end payroll, and the employee's paycheck would be shortchanged by 8 working hours. I can just imagine his displeasure with the Payroll Department for this mistake and Payroll's displeasure with me for letting this squeak through.

This is why it's imperative that you re-enable your constraints by additionally specifying the little-known WITH CHECK option. Without specifying WITH CHECK, the SQL Server engine will enable your constraints without checking referential integrity (and, as we saw, will consider the constraints untrustworthy for query optimization). In the example, if we had re-enabled the constraint using the following statement, we would've seen up front that there was an integrity issue with the loaded data.


ALTER TABLE DBO.TIMECARD WITH CHECK CHECK CONSTRAINT FK_TIMECARD_EMPLOYEEID
GO

If we fix the erroneous row and then re-enable the constraint correctly, we see that no referential integrity violation is detected and the constraint can now be trusted by the SQL Server optimizer.

UPDATE DBO.TIMECARD SET EMPLOYEEID = 1 WHERE EMPLOYEEID = 2
GO
ALTER TABLE DBO.TIMECARD WITH CHECK CHECK CONSTRAINT FK_TIMECARD_EMPLOYEEID
GO
SELECT CASE WHEN OBJECTPROPERTY(OBJECT_ID('FK_TIMECARD_EMPLOYEEID'), 'CnstIsNotTrusted') = 1 THEN 'NO' ELSE 'YES' END AS 'IsTrustWorthy?'
GO


If you're worried about having untrusted constraints in your database, you can check using the following query (this can be run on both SQL Server 2000 and 2005 databases)

select table_name, constraint_name
from information_schema.table_constraints
where (constraint_type = 'FOREIGN KEY' or constraint_type = 'CHECK')
and objectproperty(object_id(constraint_name), 'CnstIsNotTrusted') = 1
go
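
On SQL Server 2005 you can also query the catalog views, which expose the trust flag directly; a minimal equivalent sketch:

select name, is_not_trusted
from sys.foreign_keys
where is_not_trusted = 1
union all
select name, is_not_trusted
from sys.check_constraints
where is_not_trusted = 1
go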

Thursday, April 3, 2008

Installing SQL Server 2005 Performance Dashboard Reports

Problem
I am trying to leverage the new SQL Server 2005 DMVs for performance analysis. While they're very useful, it would be nice to be able to see graphical representations of the data collected by the DMVs. Is there a graphical utility within SQL Server I can use as well?

Solution
In SQL Server 2005 Service Pack 2, Microsoft introduced a new feature that affords DBAs the ability to run Custom Reports. This feature allows DBAs to run their own Reporting Services custom reports from within the SQL Server Management Studio. To leverage this feature, Microsoft has released a Performance Dashboard that you can download and use to identify performance issues in your database server. The dashboard abstracts the data that is collected from the SQL Server 2005 dynamic management views (DMVs). However, it should not be used as a substitute for learning the various DMVs and their purpose. Note that the custom reporting feature is only available as of Service Pack 2 of SQL Server 2005. Both your database server and the Management Studio must be at Service Pack level 2 or greater. You don't need Reporting Services installed to run the dashboard.

Open a Management Studio session and set your focus to the Object Explorer pane. Right clicking on any node will display a Reports option which leads to the Custom Reports option.

You can download the Performance Dashboard from the following link

http://www.microsoft.com/downloads/details.aspx?FamilyId=1d3a4a0d-7e0c-4730-8204-e419218c1efc&displaylang=en

The following is from the web-site in the above link:

Common performance problems that the dashboard reports may help to resolve include:

  • CPU bottlenecks (and what queries are consuming the most CPU)
  • IO bottlenecks (and what queries are performing the most IO)
  • Index recommendations generated by the query optimizer (missing indexes)
  • Blocking
  • Latch contention

The name of the download file from the above link is DownloadSQLServer2005_PerformanceDashboard.msi. It is a Windows installer file that, when executed, will create a folder named PerformanceDashboard on your server. If you choose the default location, the installer will place the files in the following default location on the server: C:\Program Files\Microsoft SQL Server\90\Tools\PerformanceDashboard.

Double clicking the installer, we're presented with the Welcome screen. Click Next.

We're presented with the License Agreement screen. Accept the license agreement and click Next.

We're presented with a registration screen. Enter your registration information and click Next.

Accept the default features by clicking Next.

We're now ready to install. Click Install.

Installation is now complete!

Now that the Dashboard has been installed, we need to set our server up to run it. Open a new query window within SQL Server Management Studio. Navigate to the location where you chose to install the dashboard files (the default is C:\Program Files\Microsoft SQL Server\90\Tools\PerformanceDashboard). Execute the setup.sql script. This script will connect to the msdb database and install everything needed to run the custom reports.
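
If you prefer the command line to a query window, the same script can be run with sqlcmd; a minimal sketch assuming Windows authentication and the default install path (substitute your own server name for MyServer):

sqlcmd -S MyServer -E -i "C:\Program Files\Microsoft SQL Server\90\Tools\PerformanceDashboard\setup.sql"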

Once you've run the set up script, you're ready to run the custom report. Right click on any node in the Object Explorer and click on Custom Reports...


A file search dialog box will open. Navigate to the folder where you installed the dashboard files. The dashboard report file is performance_dashboard_main.rdl.

Once you open the .rdl file via the dialog, you're presented with the following warning screen.

Clicking Run in the warning dialog, we're presented with the report!

Saturday, February 9, 2008

SQL Server Performance Tuning Interview Questions

In the latest installment of the SQL Server interview questions, we will outline questions suitable for a DBA or Developer interview to assess the candidate's skills related to SQL Server performance tuning. In this tip, the questions are there to read, but the answers are intentionally hidden to really test your skills. Once you have read the question and determined your answer, highlight the answer to see how you did. Good luck!

Solution

Question Difficulty = Easy

  • Question 1: Name five different tools which can be used for performance tuning and their associated purpose.
    • Performance Monitor\System Monitor - Tool to capture macro level performance metrics.
    • Profiler - Tool to capture micro level performance metrics based on the statements issued by a login, against a database or from a host name.
    • Server Side Trace - System objects to write the detailed statement metrics to a table or file, similar to Profiler.
    • Dynamic Management Views and Functions - SQL Server objects with low overhead that provide insight into a specific portion of SQL Server, i.e. the database engine, query plans, Service Broker, etc.
    • Management Studio's Built-In Performance Reports - Ability to capture point in time metrics as pre-defined by Microsoft.
    • Custom scripts - Custom scripts can be developed to monitor performance, determine IO usage, monitor fragmentation, etc all in an effort to improve performance.
    • Third party applications - Performance monitoring and tuning applications from vendors in the SQL Server community.
  • Question 2: Explain how the hardware running SQL Server can help or hinder performance.
  • Question 3: Why is it important to avoid functions in the WHERE clause?
    • Because SQL Server will scan the index or the table as opposed to seeking the data. A scan is a much more costly operation than a seek (see the sketch after this list).
    • Often a slightly different approach can be used to prevent using the function in the WHERE clause yielding a favorable query plan and high performance.
    • Additional information: Performance Tip: Avoid functions in WHERE clause
  • Question 4: How is it possible to capture the IO and time statistics for your queries?
  • Question 5: True or False - It is possible to correlate the Performance Monitor metrics with Profiler data in a single SQL Server 2005 native product?
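
Regarding Question 3 above, here is a minimal sketch of the rewrite; the Orders table and an index on OrderDate are hypothetical:

-- Non-sargable: the function on the column forces a scan
SELECT OrderID
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2008

-- Sargable rewrite: a range predicate lets the optimizer seek
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '20080101' AND OrderDate < '20090101'
GO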

Question Difficulty = Moderate

  • Question 1: How can I/O statistics be gathered and reviewed for individual database files?
  • Question 2: What is a query plan and what is the value from a performance tuning perspective?
    • A query plan is the physical break down of the code being passed to the SQL Server optimizer.
    • The value from a performance tuning perspective is that each component of the query can be understood and the percentage of resource utilization can be determined at a micro level. As query tuning is being conducted, the detailed metrics can be reviewed to compare the individual coding techniques to determine the best alternative.
    • Additional Information: Maximizing your view into SQL Query Plans and Capturing Graphical Query Plans with Profiler
  • Question 3: True or False - It is beneficial to configure TempDB with an equal number of fixed sized files as the number of CPU cores.
    • True.
  • Question 4: Explain the NOLOCK optimizer hint and some pros\cons of using the hint.
    • The NOLOCK query hint allows SQL Server to ignore the normal locks that are placed and held for a transaction allowing the query to complete without having to wait for the first transaction to finish and therefore release the locks.
    • This is one short term fix to help prevent locking, blocking or deadlocks.
    • However, when the NOLOCK hint is used, dirty data is read, which can compromise the results returned to the user (a usage sketch follows this list).
    • Additional information: Getting rid of some blocking issues with NOLOCK
  • Question 5: Explain three different approaches to capture a query plan.
    • SHOWPLAN_TEXT
    • SHOWPLAN_ALL
    • Graphical Query Plan
    • sys.dm_exec_query_optimizer_info
    • sys.dm_exec_query_plan
    • sys.dm_exec_query_stats
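
Regarding Question 4 above, a minimal usage sketch of the NOLOCK hint (the Orders table is hypothetical):

-- Reads uncommitted (dirty) data; shared locks are not taken
SELECT OrderID, OrderStatus
FROM dbo.Orders WITH (NOLOCK)
GO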

Question Difficulty = Advanced

  • Question 1: True or False - A LEFT OUTER JOIN is always faster than a NOT EXISTS statement.
    • False - With always being the operative word. Depending on the situation the OUTER JOIN may or may not be faster than a NOT EXISTS statement. It is necessary to test the techniques, review the query plans and tune the queries accordingly.
  • Question 2: Name three different options to capture the input (code) for a query in SQL Server 2005.
  • Question 3: Explain why the NORECOMPUTE option of UPDATE STATISTICS is used.
    • This command is used on a per table basis to prevent the table from having statistics automatically updated based on the 'Auto Update Statistics' database configuration.
    • Taking this step will prevent UPDATE STATISTICS from running during an unexpected time of the day and causing performance problems.
    • By setting this configuration it is necessary to manually UPDATE STATISTICS on a regular basis.
    • Additional information: The NORECOMPUTE option of UPDATE STATISTICS
  • Question 4: Explain a SQL Server deadlock, how a deadlock can be identified, how it is a performance problem and some techniques to correct deadlocks.
    • A deadlock is a situation where 2 spids have data locked and cannot release their lock until the opposing spid releases their lock. Depending on the severity of the deadlock, meaning the amount of data that is locked and the number of spids that are trying to access the same data, an entire chain of spids can have locks and cause a number of deadlocks, resulting in a performance issue.
    • Deadlocks can be identified by Profiler in either textual, graphical or XML format.
    • Deadlocks are a performance problem because they can prevent 2 or more processes from being able to process data. A deadlock chain can occur and impact hundreds of spids based on the data access patterns, number of users, object dependencies, etc.
    • Deadlocks could require a database design change, T-SQL coding change to access the objects in the same order, separating reporting and OLTP applications, including NOLOCK statements in SELECT queries that can accept dirty data, etc.
  • Question 5: Please explain why SQL Server does not select the same query plan every time for the same code (with different parameters) and how SQL Server can be forced to use a specific query plan.
    • The query plan is chosen based on the parameters and code being issued to the SQL Server optimizer. Unfortunately, a slightly different query plan can cause the query to execute much longer and use more resources than another query with exactly the same code and only parameter differences.
    • The OPTIMIZE FOR hint can be used to specify what parameter value we want SQL Server to use when creating the execution plan. This is a SQL Server 2005 hint (a sketch follows this list).
    • Additional information: Optimize Parameter Driven Queries with the OPTIMIZE FOR Hint
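
Regarding Question 5 above, a minimal sketch of the OPTIMIZE FOR hint inside a hypothetical procedure:

CREATE PROCEDURE dbo.GetOrdersByCustomer @CustomerID int
AS
-- Compile the plan for a representative parameter value
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR (@CustomerID = 42))
GO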

Tuesday, February 5, 2008

Comodo Firewall 3.0.16.295 - Complete Security Against Internet Attacks

Comodo claims that its firewall is unique in that it passes all known leak tests to ensure the integrity of data entering and exiting your system. Comodo has put its firewall through all kinds of sophisticated tests to ensure the firewall is powerful enough to ward off these attacks with default settings. No other firewall has had to work this hard. It is an award-winning, superb utility.
  • PC Magazine Online's Editor's Choice
  • Secures against internal and external attacks
  • Blocks internet access to malicious Trojan programs
  • Safeguards your Personal data against theft
  • Delivers total end-point security for Personal Computers and Networks
Key features of "Comodo Personal Firewall":

· Application Component Authentication - validates all the components of an application before allowing it internet access.
· Application Behavior Analysis - analyzes each application behavior and detects any suspicious activity before allowing internet access.
· Defense against Trojan Protocols - advanced protocol driver level protection
· Smart Alerts - Every alert includes a Security Consideration section with advice to users.
· Windows Security Center Integration - Windows XP SP2 recognizes Comodo Firewall Pro
· Self Protection against Critical Process Termination - cannot be shut down by Trojans, Spyware or viruses.
· PC Security during PC Start Up - includes the option to secure the host while the operating system is booting.
· Automatic Updater - includes an interactive automatic updater component so that users can check for updates any time.
· Error Reporting Interface - includes an XP style bug reporting interface.
· Firewall Logging - reveals all the activities with detailed descriptions of the events.
· Security Rules Interface - combines with an easy to use GUI.
· Application Activity Control - watches each application in detail by showing addresses, ports and amount of traffic.
· Graphical User Interface - enables or disables any part of the firewall with one click
· Application Recognition Database - recognizes over 10000 applications and determines their security risks.
Change log:
NEW! Anti-Leak Configuration:
- A new default configuration is introduced to make D+ show fewer popup alerts while still remaining leak-proof.

NEW! On-Demand Virus Scanning:
- CFP now provides an option to scan for viruses during the installation and from its graphical user interface.

NEW! A-VSMART Warranty Program:
- CFP now provides users an option to enroll in one of the available A-VSMART Warranty programs.

IMPROVED! Self-Defense:
- There have been various reports that CFP 3.0 is attacked by some malware attempting to disable its protection.
The self-defense has been modified such that an ungraceful termination of CFP will block every unknown action (i.e. it will function as if the "Block all unknown actions if the application is closed" option were selected; this option was not enabled by default).

IMPROVED! Default Configuration:
- Default configuration now protects more registry keys and more COM interfaces.
- Default Web Browser and FTP Client policies are modified to support passive FTP requests

IMPROVED! Handling of known code executing applications:
- Defense+ has been modified such that some known code-executing programs such as rundll32.exe or Windows Scripting Host are not automatically trusted anymore.

IMPROVED! Pending Files:
- Defense+ has been modified such that it is not going to report any pending files if it is not in clean PC mode.

FIXED! Bugs in Defense+ Engine:
- Fixed numerous bugs that could stop Defense+ from properly handling suspicious actions (e.g. bugs in registry and file protection, key logging, etc.).

- Fixed the bug that could prevent CFP from functioning properly in certain types of hardware configurations (e.g. when a USB hard disk is present, etc.).

FIXED! Minor Bugs in the Graphical User Interface
Requirements: Windows XP/2003/Vista [32 bit]
Download Comodo Firewall 3.0.16.295: http://www.sendspace.com/file/dmyfzw