Aggregating Data using Weighted Averages in Analytic Views

  • Module 2

    Sample Data Used In This Tutorial

    This tutorial uses data in the AV schema, which contains a star schema with sales data varying by time, product and geography. A copy of the fact table was created in the current schema by the setup script of this tutorial. This copy of the SALES_FACT table includes a new column PERCENT_MARGIN, which is the profit margin on for that particular sale.

    
    SELECT * FROM sales_fact WHERE rownum <= 10;
    

    The setup script also created three attribute dimensions and three hierarchies.

    
    SELECT * FROM user_hier_levels ORDER BY hier_name, order_num;
    

  • Module 3

    Calculating Averages Using SQL From Tables

    By looking at a small slice of the fact table it can be seen that a simple average of PERCENT_MARGIN is not very meaningful. Category -532 has a high margin at 36% but relatively few sales at 675 units. Category -529 has a low margin at 7% and the most sales at 7,336 units. Category -532 should have a greater weighting in the average margin calculation.

     
    SELECT * FROM sales_fact WHERE rownum <= 50 AND month_id = 'Apr-15' AND state_province_id = 'TEXAS_US'; 
    

    The following query returns simple average of PERCENT_MARGIN. This might be very misleading.

    SELECT AVG(percent_margin) FROM sales_fact WHERE rownum <= 50 AND month_id = 'Apr-15' AND state_province_id = 'TEXAS_US';

    It is probably more meaningful to weight the average of PERCENT_MARGIN using the number of units sold. The weighted average is calculated as the sum of (UNITS * PERCENT_MARGIN) divided by the sum of UNITS.

    
    SELECT
       SUM(units * percent_margin) / SUM(units)
    FROM
      sales_fact
    WHERE
      rownum <= 50
      AND month_id = 'Apr-15'
      AND state_province_id = 'TEXAS_US';
    

    Using this formula, categories that have a higher number of units sold are given a higher weight than categories with a lower number of units sold.

    The following query shows that there are varying numbers of units sold for each product category in South America.

    
    SELECT
      t.year_name,
      p.category_name,
      g.region_name,
      SUM(f.units) AS units
    FROM
      av.time_dim t,
      av.product_dim p,
      av.geography_dim g,
      sales_fact f
    WHERE
      t.month_id = f.month_id
      AND p.category_id = f.category_id
      AND g.state_province_id = f.state_province_id
      AND t.year_name = 'CY2015'
      AND g.region_name = 'South America'
    GROUP BY
      t.year_name,
      p.category_name,
      g.region_name;
    

    The next query calculates the simple and weighted average of percent margin. Note that in this query, PERCENT_MARGIN * UNITS is first calculated for each row and then summed.

    
    SELECT
      t.year_name,
      g.region_name,
      SUM(f.units) AS units,
      SUM(f.sales) AS sales,
      ROUND(AVG(f.percent_margin),3) AS simple_average_percent_margin,
      ROUND(SUM(f.percent_margin * f.units) / SUM(f.units),3) AS units_weighted_average_percent_margin
    FROM
      av.time_dim t,
      av.geography_dim g,
      sales_fact f
    WHERE
      t.month_id = f.month_id
      AND g.state_province_id = f.state_province_id
      AND t.year_name = 'CY2015'
      AND g.region_name = 'South America'
    GROUP BY
      t.year_name,
      g.region_name; 
    

  • Module 4

    Calculating Weighted Average in the Analytic View

    To calculate the weighted average in the analytic view, the same calculation steps are required as with the SELECT from tables.

    1. Calculate the weighting factor for each detail row.
    2. Sum the weighting factor and the number of units sold
    3. Divide the summed weighting factor by the summed units.

    To do this:

    1. Add a virtual column (PERCENT_MARGIN_WEIGHT) to the table to calculate the weighting factor for each detail row (or add a column to a view.)
    2. Add the measure PERCENT_MARGIN_WEIGHT aggregated by SUM to the analytic view.
    3. Add the WEIGHTED_AVERAGE_PERCENT_MARGIN calculated measure to the analytic view.

    Add PERCENT_MARGIN_WEIGHT to the SALES_FACT table:

    
    ALTER TABLE sales_fact ADD percent_margin_weight AS (units * percent_margin);

    Query the fact table to view the new column:

    
    SELECT *
    FROM sales_fact
    WHERE
      rownum <= 50
      AND month_id = 'Apr-15'
      AND state_province_id = 'TEXAS_US';
    

    Create the analytic view (note the weighting factor and weighted average measures):

    
    CREATE OR REPLACE ANALYTIC VIEW sales_av
      CLASSIFICATION caption VALUE 'Sales AV'
      CLASSIFICATION description VALUE 'Sales Analytic View'
      CLASSIFICATION created_by VALUE 'George Jones'
    USING sales_fact
    DIMENSION BY
      (time_attr_dim
        KEY month_id REFERENCES month_id
        HIERARCHIES (
          time_hier DEFAULT),
       product_attr_dim
        KEY category_id REFERENCES category_id
        HIERARCHIES (
          product_hier DEFAULT),
       geography_attr_dim
        KEY state_province_id 
        REFERENCES state_province_id
        HIERARCHIES (
          geography_hier DEFAULT)
       )
    MEASURES
     (sales FACT sales AGGREGATE BY SUM,
      units FACT units AGGREGATE BY SUM,
      -- Simple average.
      avg_percent_margin FACT percent_margin AGGREGATE BY AVG,
      -- Weighting factor.
      percent_margin_weight FACT percent_margin_weight AGGREGATE BY SUM,
      -- Weighted average.
      weighted_average_percent_margin AS (percent_margin_weight / units)
      )
    DEFAULT MEASURE SALES;
    

    Query the analytic view:

    
    SELECT
      time_hier.member_name AS time,
      geography_hier.member_name AS geography,
      sales,
      units,
      ROUND(avg_percent_margin,3) AS average_price_per_unit,
      ROUND(weighted_average_percent_margin,3) AS weighted_average_percent_margin
    FROM sales_av
      HIERARCHIES (
        time_hier,
        geography_hier)
    WHERE
      time_hier.level_name = 'YEAR'
      AND geography_hier.level_name = 'REGION'
      AND time_hier.member_name = 'CY2015'
      AND geography_hier.member_name = 'South America'
    ORDER BY
      time_hier.member_name,
      geography_hier.member_name;
    

  • Module 5

    Handling NULL Data

    In the SALES_FACT table PERCENT_MARGIN is dense relative to UNITS. That is, for every row where there is UNITS data there is also PERCENT_MARGIN data. Many times it will be the case that one of the measure will be NULL. These cases require addition handling.

    Consider the case where a row exists with a value for UNITS and there are NULL values for PERCENT_MARGIN. In this case the sum of UNITS cannot be used at the denominator to the weighted average calculation because weight of those rows is unknown.

    Run the following query and note that for every value of UNITS there is a value for PERCENT_MARGIN.

    
    SELECT *
    FROM
      sales_fact
    WHERE
       state_province_id like '%_MX'
       AND month_id = 'Dec-15' 
    ORDER BY
       state_province_id,
       category_id;
    

    Run the following query to select UNITS, PERCENT_MARGIN_WEIGHT and WEIGHTED_AVERAGE_PERCENT_MARGIN from the tables.

    
    SELECT
        f.month_id,
        g.country_id,
        SUM(f.units),
        SUM(f.percent_margin_weight) AS percent_margin_weight,
        ROUND(SUM(f.percent_margin_weight) / SUM(f.units),3) AS weighted_average_percent_margin
      FROM
        av.geography_dim g,
        sales_fact f
      WHERE
        g.state_province_id = f.state_province_id
        AND g.country_id = 'MX'
        AND f.month_id = 'Dec-15'
      GROUP BY 
        f.month_id,
        g.country_id;
    

    The weighted average percent margin is 12316.52 percent_margin_weight / 95778 units = .129.

    Next, set PERCENT_MARGIN = NULL for Baja California.

    
    UPDATE sales_fact
    SET percent_margin = null
    WHERE
      month_id = 'Dec-15'
      AND state_province_id = 'BAJA_CALIFORNIA_MX';
    
    COMMIT;
    

    View the updated data.

    
    SELECT * 
    FROM sales_fact
    WHERE
      state_province_id LIKE '%_MX'
      AND month_id = 'Dec-15'
    ORDER BY
      state_province_id,
      category_id;
    

    Note that PERCENT_MARGIN_WEIGHT is also NULL.

    Run the following query to select UNITS, PERCENT_MARGIN_WEIGHT and WEIGHTED_AVERAGE_PERCENT_MARGIN again.

    
    SELECT
        f.month_id,
        g.country_id,
        SUM(f.units),
        SUM(f.percent_margin_weight) AS percent_margin_weight,
        ROUND(SUM(f.percent_margin_weight) / SUM(f.units),3) AS weighted_average_percent_margin
      FROM
        av.geography_dim g,
        sales_fact f
      WHERE
        g.state_province_id = f.state_province_id
        AND g.country_id = 'MX'
        AND f.month_id = 'Dec-15'
      GROUP BY 
        f.month_id,
        g.country_id;
    

    Note that the WEIGHTED_AVERAGE_PERCENT_MARGIN is .118 rather than .129. This is because the sum of PERCENT_MARGIN weight has decreased while the sum of UNITS has remained the same.

    Because the weight of rows with NULL values is unknown those rows should not be considered in the weighted average calculation. To account for this, create a new column in the fact table that returns NULL when PERCENT_MARGIN is NULL.

    
    ALTER TABLE sales_fact
    ADD percent_margin_units
      AS (CASE
            WHEN percent_margin IS NULL THEN NULL
            ELSE units
          END);

    View the data.

    SELECT * FROM sales_fact WHERE state_province_id like '%_MX' and month_id = 'Dec-15' ORDER BY state_province_id, category_id;

    Run the following query to view UNITS, PERCENT_MARGIN_UNITS, PERCENT_MARGIN_WEIGHT and WEIGHTED_AVERAGE_PERCENT_MARGIN again.

    
    SELECT
        f.month_id,
        g.country_id,
        SUM(f.units),
        SUM(f.percent_margin_units),
        SUM(f.percent_margin_weight) AS percent_margin_weight,
        ROUND(SUM(f.percent_margin_weight) / SUM(f.percent_margin_units),3) AS weighted_average_percent_margin
      FROM
        av.geography_dim g,
        sales_fact f
      WHERE
        g.state_province_id = f.state_province_id
        AND g.country_id = 'MX'
        AND f.month_id = 'Dec-15'
      GROUP BY 
        f.month_id,
        g.country_id;
    

    The rows where PERCENT_MARGIN are NULL are not longer considered in the calculation and the weighted average percent margin is .129.

    To apply this the analytic view add the PERCENT_MARGIN_UNITS measure and use it in the denominator of the WEIGHTED_AVERAGE_PERCENT_MARGIN measure.

    
    CREATE OR REPLACE ANALYTIC VIEW sales_av
      CLASSIFICATION caption VALUE 'Sales AV'
      CLASSIFICATION description VALUE 'Sales Analytic View'
      CLASSIFICATION created_by VALUE 'George Jones'
    USING sales_fact
    DIMENSION BY
      (time_attr_dim
        KEY month_id REFERENCES month_id
        HIERARCHIES (
          time_hier DEFAULT),
       product_attr_dim
        KEY category_id REFERENCES category_id
        HIERARCHIES (
          product_hier DEFAULT),
       geography_attr_dim
        KEY state_province_id 
        REFERENCES state_province_id
        HIERARCHIES (
          geography_hier DEFAULT)
       )
    MEASURES
     (sales FACT sales AGGREGATE BY SUM,
      units FACT units AGGREGATE BY SUM,
      -- Weighted Average Units.
      percent_margin_units FACT percent_margin_units AGGREGATE BY SUM,
      -- Simple average.
      avg_percent_margin FACT percent_margin AGGREGATE BY AVG,
      -- Weighting factor.
      percent_margin_weight FACT percent_margin_weight AGGREGATE BY SUM,
      -- Weighted average.
      weighted_average_percent_margin AS (percent_margin_weight / percent_margin_units)
      )
    DEFAULT MEASURE SALES;
    

    Run the following query to select UNITS, PERCENT_MARGIN_UNITS, PERCENT_MARGIN_WEIGHT and WEIGHTED_AVERAGE_PERCENT_MARGIN from the analytic view.

    
    SELECT
      time_hier.member_name AS time,
      geography_hier.member_name AS geography,
      units,
      percent_margin_units,
      percent_margin_weight,
      ROUND(weighted_average_percent_margin,3) AS weighted_average_percent_margin
    FROM sales_av
      HIERARCHIES (
        time_hier,
        geography_hier)
    WHERE
      time_hier.level_name = 'MONTH'
      AND geography_hier.level_name = 'COUNTRY'
      AND time_hier.member_name = 'Dec-15'
      AND geography_hier.member_name = 'Mexico'
    ORDER BY
      time_hier.member_name,
      geography_hier.member_name;
    

  • Additional Information