Tuesday, March 20, 2012

Parallel Development of SSIS packages

I have seen a number of posts regarding parallel development of SSIS packages and need some further information.

So far we have been developing SSIS packages along a single development stream and have therefore managed to avoid parallel development of our packages.

However, due to business pressures we will soon have multiple project streams running in parallel, and therefore multiple code branches. As part of that we will definitely need to develop the same SSIS packages in parallel. Judging from your post above and some testing we have done, this is going to be a nightmare, as we cannot merge the code. We can put processes in place to try to mitigate this, but there are bound to be issues along the way.

Do you know whether this problem is going to be fixed? We are now using Team Foundation Server, but presumably the merge algorithm used is the same as, or similar to, that of VSS and therefore very flaky?

However, it is not only the merging of the XML files that causes us problems. We also use script tasks within the packages, and these are precompiled: the DTSX files contain the binary objects associated with the script source code. If two developers change the same script task in isolated branches, the binary is not recompiled, because the merge software does not recognise this object.
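To illustrate the second problem, here is a rough C# sketch of how one could at least detect the embedded binaries in a merged package and flag it for recompilation. The "BinaryItem" element name is an assumption based on how the designer persists compiled scripts, so it may differ between SSIS versions.

    using System;
    using System.Xml;

    class ScriptBinaryScanner
    {
        static void Main(string[] args)
        {
            // A .dtsx file is just XML on disk, so load the merged package directly.
            var doc = new XmlDocument();
            doc.Load(args[0]);

            // "BinaryItem" is assumed to be the element under which the designer
            // persists the compiled script assembly; adjust for your SSIS version.
            XmlNodeList binaries = doc.GetElementsByTagName("BinaryItem");

            Console.WriteLine(binaries.Count > 0
                ? "Found " + binaries.Count + " embedded script binaries - reopen and recompile the script tasks after merging."
                : "No embedded script binaries found.");
        }
    }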

Do you know whether these issues have been identified and are going to be fixed, to be in line with the rest of Microsoft's Configuration Management principles of parallel development?

Many thanks.

[Microsoft follow-up] A question to be addressed here.

|||

A. As long as no more than one developer works on the same dtsx package at a time there is no problem (lock it in VSS).

B. Try to make the packages light; Microsoft does not recommend heavy packages.

C. If you would like to merge code from packages, just copy all the elements from one package to the other (use a container).

|||

In response to your points, I'm still not sure this is a satisfactory situation for what is supposed to be an enterprise-standard tool.

A - The problem we have is that we will have more than one developer working on the same package at the same time, and that those versions of the package will have to be deployed at different intervals. This is because of business priorities and is the normal development scenario on other platforms such as .NET (C#), where you can have multiple branches with multiple developers working on the code. In those scenarios the configuration management tool will, wherever possible, merge the source code automatically and raise conflicts where the changes overlap, so that the developer/configuration management analyst can determine what course of action to take. The problem here is that neither VSS nor TFS can merge DTSX packages accurately; our testing has shown that this sometimes works and more often doesn't.

Also, say we deploy a package and it is running in production. As soon as it has gone live, work on the next stream of development starts and the development team crack on making changes to the package. However, after a month in live a bug is identified and an emergency fix is required to the DTSX package. The support team make this change to the live version to fulfill the business need, as they cannot wait for the next development implementation for the fix. How do we merge that support change into the version of the package the development team are working on? Is this a manual process of physically opening the development package and making the same changes in two places? Obviously this is very risky and introduces the potential for developers knocking out support changes.

B - Wherever possible we have tried to keep the packages light, but specific ETL processes are always going to be quite intensive and include a lot of code, especially when working with a set of data within a pipeline.

C - When you mention merging the elements from one package to the other, how does this work if both versions of the package also create new variables, add new connection managers, logging, checkpointing or error handlers, use the compiled binary objects from script tasks, or even change the same script task?

I think the answer at the moment is that this cannot be done consistently and accurately, but I wanted to know whether Microsoft recognised this limitation of the product and how/if it would be addressed in future, as the uptake and usage of SSIS increases and this problem becomes more of an issue, because development and support of ETL systems will be done in parallel.

Thanks.

|||

Hi Tom,

The SSIS team is aware of the issue, and your assessment is accurate. Right now SSIS packages essentially have to be treated as blobs, allowing only one person to work on them at a time because merging the XML is difficult. We have something in the works that should ease some of the problem, but we’re not sure if it will make it into the initial Katmai release at this point.

One alternative I’ve seen for projects/packages which change frequently is to switch to building the packages dynamically with code, instead of graphically in the UI. This makes merging changes much easier. However, the downside is that you lose the ease of use of the designer, and there can be a bit of a learning curve for the object model.
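For anyone curious, a minimal sketch of what that looks like with the Microsoft.SqlServer.Dts.Runtime object model is below. The package name, connection string and choice of task are purely illustrative.

    using Microsoft.SqlServer.Dts.Runtime;

    class PackageBuilder
    {
        static void Main()
        {
            // Build the package in code rather than in the designer, so the
            // "source" that gets versioned and merged is this file, not the XML.
            Package pkg = new Package { Name = "LoadCustomers" };  // illustrative name

            // Add an OLE DB connection manager (connection string is a placeholder).
            ConnectionManager conn = pkg.Connections.Add("OLEDB");
            conn.Name = "SourceDb";
            conn.ConnectionString =
                "Provider=SQLNCLI10.1;Data Source=.;Initial Catalog=Staging;Integrated Security=SSPI;";

            // Add an Execute SQL Task by its stock moniker and set its properties.
            TaskHost task = (TaskHost)pkg.Executables.Add("STOCK:SQLTask");
            task.Properties["Connection"].SetValue(task, conn.Name);
            task.Properties["SqlStatementSource"].SetValue(task, "TRUNCATE TABLE dbo.Customers");

            // Regenerate the .dtsx on every build; the XML itself never has to be merged.
            new Application().SaveToXml(@"C:\Packages\LoadCustomers.dtsx", pkg, null);
        }
    }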

Thanks,

~Matt

|||

Nice one Matt. Good to know that you guys are thinking about this. I'm interested to know what your "something in the works" is.

-Jamie
