In order to take advantage of AMO we naturally need to make use of the classes available in Microsoft.AnalysisServices.dll. Once we have defined the variables (discussed in Part 1) within the SSIS package, and wired them up to the single-row output from our metadata store using an Execute SQL Task, we can reliably retrieve and use them within the package Script Task. Our script task uses three files:
- ScriptMain: serves as an entry point, connecting variable values with PartitionGeneratorController
- PartitionGeneratorContorller: class file which governs execution of a PartitionGenerator object
- PartitionGenerator: class containing all the properties and methods for creating and processing partitions and aggregations
Listing 1 is a snippet within ScriptMain, showing how the variables values are captured from the incoming DTS.Variables collection.
Next, we use the controller class to field the incoming arguments, and supply them to our PartitionGenerator object, which then creates a connection to the server (supplied by OLAPServerName and OLAPConnectionString) and iterates through collections of databases, cubes, measure groups and finally partitions. We use the metadata supplied to help us mine our way from collection to collection. Essentially, at the leaf-partition-level, the incoming partition name will always be new, since we concatentate a new DateKey to the PartitionName prefix. As mentioned in Part 1, this approach follows a "new partition each day" approach for both loading new data into the relational fact table as well as the target cube measure group. Listing 2 shows part of the nested-loop logic.
Once we have determined that the incoming partiton does not exist, we create a new partition with PartitionGenerator's .CreateNewPartition function. In Listing 3, arguments for the function are supplied as current SSAS 2005 objects we identified along the way through the nested For Each loops.
PartitionGenerator uses the QueryBindingString metadata value as an argument to define the QueryBinding property for the new partition. An object argument for the measure group is applied since it contains a reliable .ID property value that is used to define a new partition within an existing collection, within an existing measure group. We are also sure to use the Cube.Update method to ensure that the new partition is saved to the target cube measure group. Listing 4 provides the complete code for the function .CreateNewPartition. Back in the controller, provided .CreateNewPartition returns a new partition object safe and sound, we can go ahead and process it (shown in Listing 5, this follows on from the controller code shown in Listing 3).
Using the method .ProcessNewPartition, the PartitionGenerator object accepts the newly created partition, along with the measure group and cube objects This method uses the ProcessType.ProcessData enumeration option, to optimise "fast-loading" of the new partition. Using ProcessData means that we only process the fact data for the partition, and does not handle any dependent objects. A separate step (not discussed in this series of posts) handles dimensions separately. Listing 6 provides the complete code for .ProcessNewPartition. In our testing, we found that we needed to separate partition and aggregation processing logic into discrete operations, delineated by Cube.Update operations, in order for the SSIS Script Task to work effectively. The beneficial by-product of using ProcessData to load our partition is that we can quickly process aggregations with the ProcessType.ProcessIndex option.
In order to create and process aggregations, PartitionGenerator exposes another function .CreateNewAggregationDesign, which uses a dynamically-generated partition count document stored in the variable OLAPPartitionKeyCounts. This document describes the alignment between a fact table key column and its corresponding cube dimension attribute found in the partition. The PartitionCount attribute defines the distinct counts of dimension key members from the newly-generated relational fact table partition (this is also handled by a separate process, and is not covered here). Listing 7 shows what an example PartitionCount document looks like.
Further in the controller, provided the new partition is created and "fast-loaded" successfully, .CreateNewAggregationDesign is called (see Listing 8), returning a new AggregationDesign object driven by PartitionCount. The AggregationDesign is resident within a measure group, and is applied to the new partition; the ProcessType.ProcessIndexes enumeration is used to build the aggregation data and bitmap indexes for the new partition.
Inside .CreateNewAggregationDesign, the following steps are taken:
1. Determine whether an AggregationDesign already exists within the measure group (if it does, this represents a condition where the process itself is being re-run, thus the existing AggregationDesign object is overwritten by a new one with the same name)
2. Specify the EstimatedPerformanceGain property value for the AggregationDesign object (used for annotation purposes, this accomplished via the OptimisationLimit metadata value)
3. Specify the EstimatedRowCount property value from the PartitionCount document, in order to the specify the total number of rows for the partition
4. Add AggregationDesignDimension objects to the .Dimensions collection of the AggregationDesign object
5. From the PartitionCount XML document, set the EstimatedCount property value for each specified AggregationDesignDimension attribute
6. Design Aggregations for the AggregationDesign object, using the OptimisationLimit value, and return the object.
That's pretty much all there is to it. We now have a package in place which, with the right metadata, can process any partition on any cube measure group. On examining the completed partition (either via SSMS, or directly in XML), all the annotations, aggregation designs (visible at the measure group level), and aggregations are present. It should come as no suprise that our client queries tested significantly faster with the aggregations over no aggregations at all.
For those of you eager to rip the Listings to bits, bear in mind that a few alternates were considered before this final approach. One of the alternatives to applying the .NET code in an SSIS Script Task would be to create a stored procedure that calls a PartitionGenerator assembly (wrapping both the core class and its controller). Both Microsoft.AnalysisServices.dll and PartitionGenerator.dll would need to be registered, and, you would need to assign the permission set UNSAFE. On paper this is fine, but when you try to register Microsoft.AnalysisServices in the Database Engine, you may receive the following message in Management Studio.
Warning: The SQL Server client assembly 'microsoft.analysisservices, version=22.214.171.124, culture=neutral, publickeytoken=89845dcd8080cc91, processorarchitecture=msil.' you are registering is not fully tested in SQL Server hosted environment.
The risk conveyed by the warning was sufficient for us to explore alternatives. By the way, trying to register version 9.0.3042.0 of the .dll from SQL Server 2005 Service Pack 2 returned the same message in our environment.
Other serialisation issues also surfaced as we tried to iterate through collections of AggregationDesign objects. Such issues may be addressed by decorating one or more of the classes and methods with the Serializable() attribute in PartitionGenerator, although, since AMO is leveraged behind the scenes, it follows that they would likely need to expose classes that are serializable as well. Unfortunately, we did not have the time to research this any deeper, but I would like to hear if anyone else has found success taking this approach.
Other solutions, such as creating an SSAS assembly, or passing ASSL (XML/A) statements (either via an assembly or through a script task), did not satisfy our 'ease of support' requirements as mentioned in Part 1. I'm quite certain, however, that dynamic creation and execution of ASSL would work nicely.
While it does not profess to be the magic bullet solution, I hope it proves helpful to anyone coming to grips with AMO. As always, I welcome your feedback; if you have found a better or more efficient way around some of the issues raised, then please feel free to share!
All Listings described above may now be accessed at the link below...
- Adrian Downes