Wednesday, March 21, 2012

Parallel Processing of Partitions

Guys,
I would like to thank Elizabeth for a great document. Appreciate it.
Under "Maximize parallelism during processing", I see the following benchmark:


# of Processors    # of Partitions to be processed in parallel
4                  2 – 4
8                  4 – 8
16                 6 – 16

In our scenario, we have a huge dataset, which means processing about 20 partitions in parallel. The box we have is a beefy HP DL585 with four dual-core CPUs and 32 GB of memory, and we are running the 64-bit architecture.
However, when monitoring the trace I see the following:

Event                                Description                               Start      End
1 - Process                          Processing of Fact 2001 Partition         8:05:43    10:11:04
25 - Execute SQL                     SQL Statement                             9:38:06    9:38:06
17 - Read Data                       Reading Data from Fact 2001 Partition     9:38:06    10:06:20
16 - Write Data                      Writing Data to Fact 2001 Partition       8:05:43    10:10:59
20 - Build Indexes                   Build Indexes                             10:11:00   10:11:03
28 - Build Aggregations and Indexes  Build Aggregations and Indexes            10:11:00   10:11:03

Based on the above, the thread for processing the fact partition kicks off at 8:05 AM, yet the actual SQL against the underlying database does not get kicked off until 9:38 AM.

Does this mean it is waiting for CPUs? When I look at the PerfMon stats on the server side, though, CPU usage is pretty vanilla, and I do not understand how to correlate the two.

Any suggestions are much appreciated.

Rgds

Hari

Running processing of 20 partitions in parallel on a 4-processor box is a bit of overkill. The fact that you have lots of data to process doesn't mean you should start processing that many partitions in parallel.

As the Performance Guide indicates, the number of processors should determine the number of parallel operations. If you try to start too many processing operations, you will get the behavior you observe. Say you told Analysis Server to process 20 partitions in parallel, but there are not enough resources: the operations running in parallel start fighting for resources, and some of them become blocked. Analysis Server also starts many operations in parallel within a single partition-processing job. What you see is that Execute SQL and Read Data started in parallel, but Write Data didn't start until some data became available.
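The throttling idea above can be sketched outside of Analysis Services. This is a minimal Python illustration, not SSAS internals: the partition names and the per-job work are made up, and the point is only that capping the worker count lets extra jobs queue instead of contending for CPU and memory:

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

def process_partition(name):
    # Stand-in for one partition job (execute SQL, read data, write data,
    # build indexes); real work would go here.
    time.sleep(0.01)
    return name

# 20 yearly partitions, as in the scenario above (names are hypothetical).
partitions = [f"Fact {year}" for year in range(1992, 2012)]

# Cap concurrency at the core count instead of launching all 20 at once;
# submissions beyond the cap queue up rather than fight for resources.
max_parallel = os.cpu_count() or 4
with ThreadPoolExecutor(max_workers=max_parallel) as pool:
    completed = list(pool.map(process_partition, partitions))

print(len(completed))
```

All 20 jobs still complete, but no more than `max_parallel` of them run at any one time.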

You should also take a look at optimizing your SQL Server database for the type of queries Analysis Server sends, to speed up delivering data to it.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Ahh... thanks, Edward.
Regarding this: "The fact that you have lots of data to process doesn't mean you should start processing that many partitions in parallel."

At this point, we are doing a full process on the database, though it is partitioned. So by default, it first starts threads for the cubes in parallel, then spawns new threads for the measure groups, and finally for the partitions in parallel, and all of this happens by default. Is there any way I can control this?
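If I'm reading the XMLA docs right, one way to cap this ourselves is to issue the full process as an XMLA Batch and put a MaxParallel hint on the Parallel element, something like the following (the database ID is just a placeholder for ours):

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel MaxParallel="8">
    <Process>
      <Object>
        <DatabaseID>MyOlapDatabase</DatabaseID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>
```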

We are using the biggest box available now; quad-core CPUs do not come out until June of this year.
As you say, we are also working with the DB2 team to optimize the SQL by building indexes on the partition columns.

Just

|||

Hi Edward,

I am the Windows Server SA who supports Hari's machine. We had a similar issue on a 32-bit Windows 2003 OLAP machine that acted the same way when building cubes. Analysis Services was running out of memory, but we weren't seeing any memory or processor spikes. I opened a case, and the engineer explained that this was due to a 2 GB per-process limitation, regardless of how much RAM the server has. We tried adding the /3GB and /USERVA switches; neither worked, and the solution to get past this was to upgrade to 64-bit.

How much memory is allocated to an application in the 64-bit version of Windows 2003? Is there a way to tell how much memory it's utilizing?

In addition, the OS sees all 32 GB of RAM, but I can't tell whether it's utilizing all 32 GB. Are the /PAE, /USERVA, and/or /AWE switches relevant to Win2K3 64-bit? Do I need to add anything to the boot.ini to ensure applications are able to use all available RAM? Do you have any suggestions for optimizing performance in SQL and/or Analysis Services?

Thanks - Gib

|||

On a 64-bit machine there is no worry about a single process being able to access as much memory as the machine can give it. No flags or special options are needed to access more memory.

The amount of memory your Analysis Server will use during processing is dependent on the size of the objects you will be processing and on whether you process them in parallel.

I'd say if you moved to a 64-bit machine with 32 GB of RAM, you should be able to handle quite a few things in parallel, so just go ahead and run your processing and see how high memory utilization goes.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Thanks, Edward. I'm going to shift my question to Hari's problem, specifically in relation to your reply yesterday:

"Running processing of 20 partitions in parallel on a 4-processor box is a bit of overkill. The fact that you have lots of data to process doesn't mean you should start processing that many partitions in parallel.

As the Performance Guide indicates, the number of processors should determine the number of parallel operations. If you try to start too many processing operations, you will get the behavior you observe. Say you told Analysis Server to process 20 partitions in parallel, but there are not enough resources: the operations running in parallel start fighting for resources, and some of them become blocked. Analysis Server also starts many operations in parallel within a single partition-processing job. What you see is that Execute SQL and Read Data started in parallel, but Write Data didn't start until some data became available.

You should also take a look at optimizing your SQL Server database for the type of queries Analysis Server sends, to speed up delivering data to it."

Hari added a chart that correlates the number of processors to the number of partitions processed in parallel.


# of Processors    # of Partitions to be processed in parallel
4                  2 – 4
8                  4 – 8
16                 6 – 16

We are running on an HP DL585; it has 4 dual-core AMD processors. That said, can OLAP utilize them as 8 processors, or will it only see them as 4?

The other thing you mentioned is that 20 partitions is overkill. Does that mean this type of lag is to be expected? Is there any kind of tweaking that can be done, either on the OS or in SQL, to improve performance?

As it stands now, Hari isn't seeing any CPU or memory spikes when the delay occurs. If operations are blocked and fighting for resources, will we see 100% memory and CPU utilization?

Thanks - Gib

|||

Yes, for the purposes of the chart you mention, you should count a single core as a processor. The DL585 is a great machine, and you should count it as having 8 processors.
Therefore the Performance Guide recommends you process 4 to 8 partitions in parallel.

The bottleneck might not be CPU or memory; it could also be your relational database. Take a look at your relational database and see how hard it is working to deliver data to Analysis Server. You might want to run the Database Tuning Advisor to build more indexes supporting the queries sent to it.
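As an illustration of the kind of index such tuning typically produces: the table and column names below are made up, but a partition's source query generally filters the fact table on the partitioning column, and an index on that column speeds up delivering those rows:

```sql
-- Hypothetical fact table; the partition query looks roughly like:
--   SELECT <measure columns> FROM FactSales
--   WHERE DateKey BETWEEN 20010101 AND 20011231
-- An index on the partitioning column helps that range filter:
CREATE INDEX IX_FactSales_DateKey
    ON FactSales (DateKey);
```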

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Edward, do you know if tuning the relational database will enable Hari to process 20 partitions at the same time? Do you know of anyone who is doing this on a quad dual-core server?

According to the chart, it doesn't look like this is possible, even with 16 processors. I'm wondering if there is any kind of tweaking that can be done so that it is.

Thanks - Gib

|||

That is not going to help. The number of partitions you process in parallel depends on how many operations your hardware can perform in parallel. If you tune your system and optimize your relational database and you are still not satisfied with processing performance, you might start looking at newer machines with a higher number of processors.

Edward.
--
This posting is provided "AS IS" with no warranties, and confers no rights.
