Apr
9
2009

SSIS and clustering: What you should do instead

Lots of customers ask about configuring SQL Server Integration Services in a failover cluster. We recommend that you DON'T configure SSIS as a cluster resource. There are almost no benefits to doing so, and you can gain many of the benefits that you want by simple configuration changes. By editing the configuration file for the SSIS service on each node of a cluster, you can manage the packages on any node from any other node. For more information, please see the Books Online topic, Configuring Integration Services in a Clustered Environment.

Today, Microsoft Senior Premier Field Engineer Steve Howard provided these additional details about the recommendations that he makes to customers who ask about clustering. Thanks, Steve, for permission to share:


I agree that restarting a running package automatically would be neat, but this would be different from other cluster-aware technologies (and other failover technologies) that we have. For example, if a failover happens with SQL Server, the queries do not restart, and other jobs running in Agent do not restart. I suppose it would be possible to write a job to check for jobs that did not complete successfully and restart those jobs, then schedule the that job to run at startup. That sounds feasible, but I have never done that.

What I’m describing is supported out of the box. It is really the same process that you must go through to manage package on a standalone machine with multiple instances (or even just one named instance). I find this question to be the most common question that customers have when I go onsite. Customers usually just do not understand the function of the SSIS service. When I explain it to them, and we work through it together, they are satisfied. I’ll explain here what I go through with customers, and students in the SSIS workshop.

In the installation folder, they will find the configuration file. For SQL 2005, by default, this path is: C:\Program Files\Microsoft SQL Server\90\DTS\Binn and for SQL 2008, this is C:\Program Files\Microsoft SQL Server\100\DTS\Binn. In either case, the name of the file is MsDtsSrvr.ini.xml. When first installed, this file will look like this:

 <?xml version="1.0" encoding="utf-8"?>

<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>

  <TopLevelFolders>

    <Folder xsi:type="SqlServerFolder">

      <Name>MSDB</Name>

      <ServerName>.</ServerName>

    </Folder>

    <Folder xsi:type="FileSystemFolder">

      <Name>File System</Name>

      <StorePath>..\Packages</StorePath>

    </Folder>

  </TopLevelFolders> 

</DtsServiceConfiguration>

(I just use notepad for this task, but some people prefer to use XML notepad or some XML editor like that.)

In the top level folder, the servername is ".", which means it will only connect to the default instance on the local machine (local to where the service is running). So when I connect to that SSIS service, I can only see the default instance on the machine where the SSIS service is running. Everything here is relative to where the service is running. (I tell students that it is the center of management for SSIS). I can connect to this machine with Management Studio on any machine, but with this configuration, I will only see the default instance running on the machine where the SSIS service I connected to is running.

If I have multiple instances on this machine, I need to add top-level folders so I can manage all the instances installed on this machine from this SSIS service. (I’ll get to clusters in a moment). Let’s say that I have both a SQL 2005 instance and a SQL 2008 instance on this machine. Then in the SQL 2008 SSIS MsDtsSrvr.ini.xml, I need to set it up to manage these instances. (I cannot manage SQL 2008 instances from SQL 2005 SSIS, so I must configure the SQL 2008 SSIS to be able to manage both from one service.) In that case, I would add the top-level folders with names that let me distinguish among the servers where I am managing packages:

<?xml version="1.0" encoding="utf-8"?>

<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>

  <TopLevelFolders>

    <Folder xsi:type="SqlServerFolder">

      <Name>SQL 2008 MSDB</Name>

      <ServerName>.\SQL2K8</ServerName>

    </Folder>

    <Folder xsi:type="SqlServerFolder">

      <Name>SQL 2005 MSDB</Name>

      <ServerName>.</ServerName>

    </Folder>

    <Folder xsi:type="FileSystemFolder">

      <Name>File System</Name>

      <StorePath>..\Packages</StorePath>

    </Folder>

  </TopLevelFolders>

</DtsServiceConfiguration>

So, I have added one folder that is named “SQL 2008 MSDB” and that points to the named instance SQL2k8 on the local machine. The other folder is named “SQL 2005 MSDB” and that points to the default instance on the local machine. When I make this edit, restart the SSIS service so it will read the modified configuration file, then connect to this SSIS instance, I can now see both servers and manage packages on both:

So now, I can see running packages on either server, I can import, export, or manually start the packages. But none of this is really necessary to be able to design, install, or run those packages. The Service is just for convenience for managing this.

So now, let’s take this concept to a cluster. For our cluster, let’s have 4 nodes names Node1, Node2, Node3, and Node4. On this, let’s install 4 instances of SQL in 4 separate resource groups. Let’s use the network names net1, net2, net3, and net4, and let’s install instances InstanceA, InstanceB, InstanceC, and InstanceD on those net names respectively so that the full names of our instances will be net1\InstanceA; net2\InstanceB; net3\InstanceC, and net4\InstanceD. Any of the 4 nodes can host any of the instances in our setup.

To be able to manage packages on any of those instances, you are going to have to modify your config file. To be able to manage packages on all 4 instances from any one machine, we would make modifications like I did above so that the config file will now look like this:

<?xml version="1.0" encoding="utf-8"?>

<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>

  <TopLevelFolders>

    <Folder xsi:type="SqlServerFolder">

      <Name>InstanceA MSDB</Name>

      <ServerName>net1\InstanceA</ServerName>

    </Folder>

    <Folder xsi:type="SqlServerFolder">

      <Name>InstanceB MSDB</Name>

      <ServerName>net2\InstanceB</ServerName>

    </Folder>

    <Folder xsi:type="SqlServerFolder">

      <Name>InstanceC MSDB</Name>

      <ServerName>net3\InstanceC</ServerName>

    </Folder>

    <Folder xsi:type="SqlServerFolder">

      <Name>InstanceD MSDB</Name>

      <ServerName>net4\InstanceD</ServerName>

    </Folder>

    <Folder xsi:type="FileSystemFolder">

      <Name>File System</Name>

      <StorePath>..\Packages</StorePath>

    </Folder>

  </TopLevelFolders>

</DtsServiceConfiguration>

So now, whatever machine I put that config file onto will see and be able to manage packages on those 4 machines, just as in the screenshot above, I can see the packages and manage them on those two instances. If I put this on node1, then if I connect to node1, I can manage all of them from that machine. But just having it on one node will be a bit of a pain. So, once I have this configured, and I have tested to make sure it will see all the instances where I want to manage packages, I just copy the MsDtsSrvr.ini.xml file into place on node2, node3, and node4 (if I have installed the SSIS service on those nodes). Now, I can connect to SSIS on any of those nodes.

Most DBAs don’t care what the node names are, but they know the network names of their SQL Server instances very well. In that cluster configuration we described, these network names resolve to IP addresses that move with the SQL Server instance when it fails over. So from Management Studio on the DBA's workstation, he can connect to the SSIS service on net1 and see all 4 instances on his cluster. If it fails over, and he still wants to connect to SSIS to manage packages on any of the 4 nodes on that cluster, he could connect to net1, and it would connect to the SSIS service running on the node where Net1\InstanceA is now hosted, and he will still see the same thing – he doesn’t know or care that he is now connected to the SSIS service on a different node. If he wanted to, he could even specify the cluster name (instead of any of the SQL network names) and still connect to an SSIS service and still see the same set of folders.

In some environments, the DBA has one server that is his/hers where they set up their management tools. The SSIS configuration that we have allows the DBA to be able to configure the XML file on that one machine to see and manage packages on all instances and machines that they manage by connecting to a single SSIS service. He/she just needs to configure the XML file on that one machine.

Where I see confusion/frustration from customers is that they think of Management Studio as the center of their management tools. With SSIS, it is the SSIS service that is the center of the management tools. Customers, before education, think of the SSIS service as running the packages, but this is not the case. The SSIS service is for management. Management Studio gives them a graphical interface into the service, but the center of management for SSIS is the SSIS service.

If I have one complaint about this, it is that we do not really have a front end for customers so that they don’t have to manually edit the XML files. But really, that XML file is so simple that it is not difficult to edit with tools like Notepad or XML Notepad.

 And in that situation, what have we gained if we cluster the SSIS service? 


The preceding information is presented here for exactly what it is: the educated opinion of an experienced Microsoft field engineer.

What many corporate customers are really looking for, presumably, is high availability for ETL processes, especially long-running processes. Except for its support for transactions, and its ability to restart from checkpoints after failure, SSIS out of the box doesn't currently have a complete answer for HA concerns.


Please comment on my blog!

You don't have to sign up or sign in - anonymous comments are enabled. I "moderate" the comments to delete spam before you see it, but otherwise, I publish your comments immediately. You can also email me at dougbert@dougbert.com. Thank you!

-Doug

Comments (10) -

Matthew

Great post!


This is one of those perennial SSIS questions, and one for which I've never seen a better or more comprehensive answer.


dougbert

What does Steve mean when he says that the SSIS service is the "center of management", and not SSMS? Here's his explanation:


On the point about the SSIS Service being the center of management vs. what our marketing documentation says, we’re really quibbling on context. I’m talking about the perspective for setting up to manage, and the purpose of the SSIS service. The DBA will probably still use Management Studio as the front end to manage packages, so once the SSIS service is configured, Management studio becomes the “center” of their attention, and the tool they use to manage. But when I say that SSIS is the center of management, I mean that everything that you see in SSIS in Management Studio is being seen from the perspective of the SSIS service you are connected to, where it is running, and what it is configured to see, and not from the perspective of where Management Studio is installed, or where the packages are installed.


Todd McDermid

Great article/interview Douglas.


Even though I still don't understand clustering - I do now understand enough about it and how it works with SSIS to make some sense of it.  Perhaps even more valuable was the information about the role of the SSIS service.  That is one of the more misunderstood aspects of SSIS - as everyone expects that "if there is a service, it must be running the packages".


FrankI

First let me say this was a great explanation on SSIS.  I do have a question as I have tried your solution where you add multiple instances but when I add SSIS package to a certain folder, the second instance folder shows that same package, even though I orginally imported the SSIS package on the first instance, can you explain to me what I am seeing here?  


Thanks


Frank I


dougbert

I have responded to Frank offline, and encouraged him to post his question in the SSIS forum. I am not an expert in clustering! Meanwhile I'm also checking with a couple friends at Microsoft. I will post the explanation when it's received.


-Doug


Sneha

Very good Post. Saved lot of my time, got better understanding of how it really works. Thanks a bunch!


wabis

Without Changing MsDtsSrvr.ini.xml file i'm able to execute SSIS in both the nodes , did failover too..not sure why didn't i need to change the .ini file..any idea?

dougbert

In reply to the comment from Wabis dated 8/16/2012, I received the following remarks from a colleague who's more knowledgeable about SSIS and clustering:

"Remember that neither the SSIS runtime nor the SSIS service is clustered. What's failing over is most likely the SQL Server Database Engine (unless you've clustered the service, which is discouraged and unlikely).

"The INI file doesn't apply to the SSIS runtime. The INI file only affects the SSIS service, which is there for management purposes.

"Remember too that SQL Server itself isn't required for running SSIS packages. You don't need SQL Server (apart from possible licensing issues) to run packages - you only need the SSIS runtime. And those runtime components, which are all you need to run packages, are present on all the SQL nodes whether that node is currently the active SQL Server node or not."

Hope this helps.

-Doug

bala

Amazing post. Lot of thanks. I still have a doubt on
"The SSIS configuration that we have allows the DBA to be able to configure the XML file on that one machine to see and manage packages on all instances and machines that they manage by connecting to a single SSIS service"

1.Does this mean that i can modify the ini/xml file to manage not only the instances on 1 sql server/machine, but also on other sql servers?
2.By modifying the ini/xml file, we make the SSIS to work even during failover (and xml changes not only intended for managing packages). Is this correct?

dougbert

1. Yes, the service can manage remote instances. Please see this Books Online section; that is, this topic and the topics under it. msdn.microsoft.com/en-us/library/ms137731.aspx

2. It's important to be clear that SSIS does not fail over. If you have configured the SSIS Service on each node to let you see the packages on every other node, then even when a node fails, you can still access the packages on a different node.

Clustering is not my specialty and I also don't work daily with SSIS anymore. If you have more questions, I recommend you ask in the forum to reach a larger body of experts. social.msdn.microsoft.com/.../

Best wishes,

-Doug

Pingbacks and trackbacks (4)+