Tuesday, March 20, 2012

parallel execution

Hi
I have 3 sources as Flat files and the destination is a single table. Is it possible for running a package so that it is executed parallelly. I mean , all the 3 files should run at the same time and load it into the destination table.
If it is possible, please let me know how to do it

Regards
MThe easiest way is probably to have 3 copies of your Data Flow, each using a different File Connection Manager, and three copies of your File Connection Manager, each for a different file. You will also need to uncheck the "Table Lock" option on the destinations. If there are no precedences defined for your Data Flows, they will execute in parallel. Ugly, but simple.
There used to be a parallel execution option on the Loop task that would have allowed you to do this with one Data Flow, but it was removed.
It is also possible to use a master package that executes your current package three times in parallel, thus avoiding the Data Flow duplication above, but you'll face the challenge of having each instance of the package open a separate file. You would need to have three different Package Configuration files to define different values for a parameter that you would use in an Expression to set the ConnectionString property on your File Connection Manager. Complicated, but probably a superior design if you're up for it. You'll still need to turn off table locking.
Note that you'll need multiple processors to make this a worthwhile endeavor. Also, if your files are very large tempdb will suffer without the table locking.|||
Note that you'll need multiple processors to make this a worthwhile endeavor. Also, if your files are very large tempdb will suffer without the table locking.

Hey Jay,
Is it mandatory to have multiple processors for parallel execution for the above mentioned issue.
I have done all the above things that you have suggested, but it stills seems that the loading is happening sequentially. Please note that the my machine is single processor machine.

Regards
Meghana|||It shouldn't be mandatory, but I wouldn't expect you to see any benefits without multiple processors. I ran a test on a single processor machine and did not have any difficulty getting my Data Flows to run concurrently. I also discovered that disabling the table locking is not necessary.
One thing to check is the MaxConcurrentExecutables property. The only way I could force my package to run sequentially is to set this value to 1. The default value is -1, which means number of processors plus two, so it should let you load three files simultaneously.
I also tested performance. Using a 100k line flat file going directly into a local SQL table, I saw a 10% decrease in performance with two parallel files and 34% decrease with three.
|||

mmhaise wrote:

Hi
I have 3 sources as Flat files and the destination is a single table. Is it possible for running a package so that it is executed parallelly. I mean , all the 3 files should run at the same time and load it into the destination table.
If it is possible, please let me know how to do it

Regards
M

Have you tried using a MULTIFLATFILE connection manager? This will enable you to load data from all of the files using just a single data-flow (which is executed just once).
If the metadata of the 3 files is identical this should work very nicely.

If the metadata is different then you can use 2 or 3 Derived Column Transforms to make the pipeline metadata identical and then use a UNION ALL transform to combine the 3 data-paths into one. Again, this is a single data-flow that you execute just the once.
If you are intent on loading the 3 seperately then executing a package 3 times as Jay suggested would be the "cleanest" approach because you can reuse functionality rather than having 3 seperate but identical data-flows.

I would definately look into using the MULTIFLATFILE connection manager though if I were you.

-Jamie|||

Is it possible to make the MULTIFLATFILE connection load in parallel? I have tried using one to load 4 csv file in parallel to the same destination table. The data flow is loading them sequentially.

I have another package with four separate file connection managers and data flows; however, I thought it would be nice to use the MULTIFLATFILE since the files have identical metadata. We require them to load in parallel.

No comments:

Post a Comment