Tuesday, April 17, 2007

On Metadata-Driven Analysis Management Objects (Part 2)

Part 2: Code to Create and Process the Partition

In order to take advantage of AMO we naturally need to make use of the classes available in Microsoft.AnalysisServices.dll. Once we have defined the variables (discussed in Part 1) within the SSIS package, and wired them up to the single-row output from our metadata store using an Execute SQL Task, we can reliably retrieve and use them within the package Script Task. Our script task uses three files:
  • ScriptMain: serves as an entry point, connecting variable values with PartitionGeneratorController
  • PartitionGeneratorContorller: class file which governs execution of a PartitionGenerator object
  • PartitionGenerator: class containing all the properties and methods for creating and processing partitions and aggregations
A reference to Microsoft.AnalysisServices is made, ensuring that the latter two class files can use AMO appropriately.

Listing 1 is a snippet within ScriptMain, showing how the variables values are captured from the incoming DTS.Variables collection.

Next, we use the controller class to field the incoming arguments, and supply them to our PartitionGenerator object, which then creates a connection to the server (supplied by OLAPServerName and OLAPConnectionString) and iterates through collections of databases, cubes, measure groups and finally partitions. We use the metadata supplied to help us mine our way from collection to collection. Essentially, at the leaf-partition-level, the incoming partition name will always be new, since we concatentate a new DateKey to the PartitionName prefix. As mentioned in Part 1, this approach follows a "new partition each day" approach for both loading new data into the relational fact table as well as the target cube measure group. Listing 2 shows part of the nested-loop logic.

Once we have determined that the incoming partiton does not exist, we create a new partition with PartitionGenerator's .CreateNewPartition function. In Listing 3, arguments for the function are supplied as current SSAS 2005 objects we identified along the way through the nested For Each loops.

PartitionGenerator uses the QueryBindingString metadata value as an argument to define the QueryBinding property for the new partition. An object argument for the measure group is applied since it contains a reliable .ID property value that is used to define a new partition within an existing collection, within an existing measure group. We are also sure to use the Cube.Update method to ensure that the new partition is saved to the target cube measure group. Listing 4 provides the complete code for the function .CreateNewPartition. Back in the controller, provided .CreateNewPartition returns a new partition object safe and sound, we can go ahead and process it (shown in Listing 5, this follows on from the controller code shown in Listing 3).

Using the method .ProcessNewPartition, the PartitionGenerator object accepts the newly created partition, along with the measure group and cube objects This method uses the ProcessType.ProcessData enumeration option, to optimise "fast-loading" of the new partition. Using ProcessData means that we only process the fact data for the partition, and does not handle any dependent objects. A separate step (not discussed in this series of posts) handles dimensions separately. Listing 6 provides the complete code for .ProcessNewPartition. In our testing, we found that we needed to separate partition and aggregation processing logic into discrete operations, delineated by Cube.Update operations, in order for the SSIS Script Task to work effectively. The beneficial by-product of using ProcessData to load our partition is that we can quickly process aggregations with the ProcessType.ProcessIndex option.

In order to create and process aggregations, PartitionGenerator exposes another function .CreateNewAggregationDesign, which uses a dynamically-generated partition count document stored in the variable OLAPPartitionKeyCounts. This document describes the alignment between a fact table key column and its corresponding cube dimension attribute found in the partition. The PartitionCount attribute defines the distinct counts of dimension key members from the newly-generated relational fact table partition (this is also handled by a separate process, and is not covered here). Listing 7 shows what an example PartitionCount document looks like.

Further in the controller, provided the new partition is created and "fast-loaded" successfully, .CreateNewAggregationDesign is called (see Listing 8), returning a new AggregationDesign object driven by PartitionCount. The AggregationDesign is resident within a measure group, and is applied to the new partition; the ProcessType.ProcessIndexes enumeration is used to build the aggregation data and bitmap indexes for the new partition.

Inside .CreateNewAggregationDesign, the following steps are taken:

1. Determine whether an AggregationDesign already exists within the measure group (if it does, this represents a condition where the process itself is being re-run, thus the existing AggregationDesign object is overwritten by a new one with the same name)

2. Specify the EstimatedPerformanceGain property value for the AggregationDesign object (used for annotation purposes, this accomplished via the OptimisationLimit metadata value)

3. Specify the EstimatedRowCount property value from the PartitionCount document, in order to the specify the total number of rows for the partition

4. Add AggregationDesignDimension objects to the .Dimensions collection of the AggregationDesign object

5. From the PartitionCount XML document, set the EstimatedCount property value for each specified AggregationDesignDimension attribute

6. Design Aggregations for the AggregationDesign object, using the OptimisationLimit value, and return the object.

The full listing for .CreateNewAggregationDesign is shown in Listing 9. Note that we only use the referenced (ByRef) variables isFinished and optimisation in the .DesignAggregations method (of the AggregationDesign object) to indicate the limits on designing aggregations. The final step is analagous to working with the Set Aggregations Options dialog in the Aggregation Design Wizard, prior to clicking the Start button.

That's pretty much all there is to it. We now have a package in place which, with the right metadata, can process any partition on any cube measure group. On examining the completed partition (either via SSMS, or directly in XML), all the annotations, aggregation designs (visible at the measure group level), and aggregations are present. It should come as no suprise that our client queries tested significantly faster with the aggregations over no aggregations at all.

For those of you eager to rip the Listings to bits, bear in mind that a few alternates were considered before this final approach. One of the alternatives to applying the .NET code in an SSIS Script Task would be to create a stored procedure that calls a PartitionGenerator assembly (wrapping both the core class and its controller). Both Microsoft.AnalysisServices.dll and PartitionGenerator.dll would need to be registered, and, you would need to assign the permission set UNSAFE. On paper this is fine, but when you try to register Microsoft.AnalysisServices in the Database Engine, you may receive the following message in Management Studio.

Warning: The SQL Server client assembly 'microsoft.analysisservices, version=9.0.242.0, culture=neutral, publickeytoken=89845dcd8080cc91, processorarchitecture=msil.' you are registering is not fully tested in SQL Server hosted environment.

The risk conveyed by the warning was sufficient for us to explore alternatives. By the way, trying to register version 9.0.3042.0 of the .dll from SQL Server 2005 Service Pack 2 returned the same message in our environment.

Other serialisation issues also surfaced as we tried to iterate through collections of AggregationDesign objects. Such issues may be addressed by decorating one or more of the classes and methods with the Serializable() attribute in PartitionGenerator, although, since AMO is leveraged behind the scenes, it follows that they would likely need to expose classes that are serializable as well. Unfortunately, we did not have the time to research this any deeper, but I would like to hear if anyone else has found success taking this approach.

Other solutions, such as creating an SSAS assembly, or passing ASSL (XML/A) statements (either via an assembly or through a script task), did not satisfy our 'ease of support' requirements as mentioned in Part 1. I'm quite certain, however, that dynamic creation and execution of ASSL would work nicely.

While it does not profess to be the magic bullet solution, I hope it proves helpful to anyone coming to grips with AMO. As always, I welcome your feedback; if you have found a better or more efficient way around some of the issues raised, then please feel free to share!

Update (14-MAY-2008):
All Listings described above may now be accessed at the link below...




- Adrian Downes

5 comments:

Anonymous said...

FYI..The listing links are broken.

Adrian Downes said...

Hi anon... thanks for the FYI.

I have verified that all the listing links are working through this post. I have also checked with a few others that they can access the listing links as well.

One reason you may not be able to see the listings is that you have a popup blocker active. If this is the case, you may wish to temporarily disable this while viewing the links.

AD

Sergey Shenblum said...

This is nice but I have more extended functionality for Cubes Partition processing and cubes dimension processing for ProcessAdd and processupdate. I have used AMO and XMLA scripts (was not able to do everything with AMO as AMO is still in beta...whatever number...) For example, binding the table different then Fact table (in .dsv) to be able to process New Partition - does not work with AMO. I had to do it preparing XMLA script for new partition based on existing XMLA script for current partition...some of the code (one class) is at http://www.geocities.com/websyspro/blog/blog.html
started new blog at http://biandmssql.spaces.live.com
Cheers,
Sergey

Adrian Downes said...

Thanks for your comments, Sergey!

I always like to see new and diverse ways to attack certain problems.

Your approach of splitting the logic into C# and XMLA appears similar to something Eugene Asahara did back in February 2006:
(http://blogs.msdn.com/eugenea/archive/2006/02/22/536858.aspx)

I'm also curious to know which build number of SSAS 2005 you have installed as well as the exceptions you encountered, leading you to the conclusion that AMO does not support specifying a different table for your new partition.

I can see where this may be a problem if you did not have access to the source data itself, or, if the table column metadata (in your other table source) differ from those used by other partitions in the partition collection (of the measure group).

Microsoft documentation for the .Source property of the Partition class states:

"Source defines the binding for the partition by specifying one or more data sources, tables, and filters. This allows overrides to the defaults specified on the measure group."

(http://msdn2.microsoft.com/en-us/library/microsoft.analysisservices.partition.source.aspx)

To test things, I created a new version of the source view I used as a backbone for my QueryBinding variable (which I use with the Cube.DataSource.ID value) to specify a new partition's .Source property (using the QueryBinding type of TabularBinding). I ensured that data was visible via this new view (which has the same column names and data types as the old one), and, that this new view was also visible in my DSV. Next, I modified my metadata, ensuring that the QueryBinding string reflected the new view.

When I ran the SSIS package (containing the Script Task & AMO code), it ran successfully. Without trying to guess the details of the problem(s) you've encountered, I imagine the keys to success here are consistent column metadata among all partitions in a common PartitionCollection (for a given measure group), and, reliable
access to the appropriate tables/views.

If you believe this to be a bug, Sergey, then I'd recommend registering it with Microsoft.

Once again, thanks for your feedback!

Cheers,

AD

ps. good luck with your new blog, too

Nat Van Gulck said...

Hi Adrian,

This is an interesting post but the listings are no longer accessible as apparently "This user account has exceeded their daily bandwidth limit." Are you considering hosting those listings elsewhere? :)

Thanks!