Aggregation Browsing Backends¶

Backends for browsing aggregates of various data sources

SQL¶

SQL backend uses SQLAlchemy for generating queries. It supports all databases that the SQLAlchemy supports such as:

Drizzle
Firebird
Informix
Microsoft SQL Server
MySQL
Oracle
PostgreSQL
SQLite
Sybase

Browser¶

class cubes.backends.sql.browser.SnowflakeBrowser(cube, store, locale=None, metadata=None, debug=False, **options)¶

SnowflakeBrowser is a SQL-based AggregationBrowser implementation that can aggregate star and snowflake schemas without need of having explicit view or physical denormalized table.

Attributes:

cube - browsed cube
locale - locale used for browsing
metadata - SQLAlchemy MetaData object
debug - output SQL to the logger at INFO level
options - passed to the mapper and context (see their respective documentation)

Tuning:

include_summary - it True then summary is included in aggregation result. Turned on by default.
include_cell_count – if True then total cell count is included in aggregation result. Turned on by default. performance reasons

Limitations:

only one locale can be used for browsing at a time
locale is implemented as denormalized: one column for each language

aggregate(cell=None, measures=None, drilldown=None, split=None, attributes=None, page=None, page_size=None, order=None, include_summary=None, include_cell_count=None, aggregates=None, **options)¶

Return aggregated result.

Arguments:

cell: cell to be aggregated
measures: aggregates of these measures will be considered
aggregates: aggregates to be considered
drilldown: list of dimensions or list of tuples: (dimension, hierarchy, level)
split: an optional cell that becomes an extra drilldown segmenting the data into those within split cell and those not within
attributes: list of attributes from drilled-down dimensions to be returned in the result

Query tuning:

include_cell_count: if True (True is default) then result.total_cell_count is computed as well, otherwise it will be None.
include_summary: if True (default) then summary is computed, otherwise it will be None

Result is paginated by page_size and ordered by order.

Number of database queries:

without drill-down: 1 – summary
with drill-down (default): 3 – summary, drilldown, total drill-down record count

Notes:

measures can be only in the fact table

builtin_function(name, aggregate)¶: Returns a built-in function for aggregate

execute_statement(statement, label=None)¶: Execute the statement, optionally log it. Returns the result cursor.

fact(key_value, fields=None)¶

Get a single fact with key key_value from cube.

Number of SQL queries: 1.

facts(cell=None, fields=None, order=None, page=None, page_size=None)¶

Return all facts from cell, might be ordered and paginated.

Number of SQL queries: 1.

features()¶: Return SQL features. Currently they are all the same for every cube, however in the future they might depend on the SQL engine or other factors.

members(cell, dimension, depth=None, hierarchy=None, page=None, page_size=None, order=None)¶

Return values for dimension with level depth depth. If depth is None, all levels are returned.

Number of database queries: 1.

path_details(dimension, path, hierarchy=None)¶

Returns details for path in dimension. Can be used for multi-dimensional “breadcrumbs” in a used interface.

Number of SQL queries: 1.

set_locale(locale)¶: Change the browser’s locale

validate()¶

Validate physical representation of model. Returns a list of dictionaries with keys: type, issue, object.

Types might be: join or attribute.

The join issues are:

no_table - there is no table for join
duplicity - either table or alias is specified more than once

The attribute issues are:

no_table - there is no table for attribute
no_column - there is no column for attribute
duplicity - attribute is found more than once

class cubes.backends.sql.query.QueryBuilder(browser)¶

Creates a SQL query statement builder object – a controller-like object that incrementally constructs the statement.

Result attributes:

statement – SQL query statement
labels – logical labels for the statement selection

aggregate_expression(aggregate, coalesce_measure=False)¶

Returns an expression that performs the aggregation of measure aggregate. The result’s label is the aggregate’s name. aggregate has to be MeasureAggregate instance.

If aggregate function is post-aggregation calculation, then None is returned.

Aggregation function names are case in-sensitive.

If coalesce_measure is True then selected measure column is wrapped in COALESCE(column, 0).

aggregation_statement(cell, drilldown=None, aggregates=None, split=None, attributes=None, summary_only=False)¶

Builds a statement to aggregate the cell.

cell – Cell to aggregate
drilldown – a Drilldown object
aggregates – list of aggregates to consider
split – split cell for split condition
summary_only – do not perform GROUP BY for the drilldown. The
drilldown is used only for choosing tables to join and affects outer detail joins in the result

Algorithm description:

All the tables have one of the two relationship to the fact: master/match or detail. Every table connected to a table that has “detail” relationship is considered also in the “detail” relationship towards the fact. Therefore we have two join zones: all master or detail tables from the core, directly connected to the fact table and rest of the table connected to the core through outer detail relationship.

Depending on the query it is decided whether we are fine with just joining everything together into single join or we need to separate the fact master core from the outer details:

            +------+           +-----+
            | fact |--(match)--| dim +
            +------+           +-----+
Master Fact    |
===============|========================
Outer Details  |               +-----+
               +------(detail)-| dim |
                               +-----+

The outer details part is RIGHT OUTER JOINed to the fact. Since there are no tables any more, the original table keys for joins to the outer details were exposed and specially labeled as __masterkeyXX where XX is a sequence number of the key. The join_expression JOIN constructing method receives the map of the keys and replaces the original tables with connections to the columns already selected in the master fact.

Note

Limitation: we can not have a Cut (condition) where keys (path elements) are from both join zones. Whole cut should be within one zone: either the master fact or outer details.

append_condition(condition)¶: Appends condition to the generated statement.

attributes_for_cell(cell)¶: Returns a set of attributes included in the cell.

attributes_for_cell_cuts(cell)¶: Returns a list of tuples (cut, attributes) where attributes is list of attributes involved in the cut.

builtin_aggregate_expressions(aggregates, coalesce_measures=False)¶: Returns list of expressions for aggregates from aggregates that are computed using the SQL statement.

column(attribute, locale=None)¶: Returns either a physical column for the attribute or a reference to a column from the master fact if it exists.

condition_for_cell(cell)¶: Returns a SQL condition for the cell.

condition_for_point(dim, path, hierarchy=None, invert=False)¶: Returns a Condition tuple (attributes, conditions, group_by) dimension dim point at path. It is a compound condition - one equality condition for each path element in form: level[i].key = path[i]

conditions_for_cuts(cuts)¶: Constructs conditions for all cuts in the cell. Returns a list of SQL conditional expressions.

denormalized_statement(cell=None, attributes=None, expand_locales=False, include_fact_key=True)¶

Builds a statement for denormalized view. whereclause is same as SQLAlchemy whereclause for sqlalchemy.sql.expression.select(). attributes is list of logical references to attributes to be selected. If it is None then all attributes are used. condition_attributes contains list of attributes that are not going to be selected, but are required for WHERE condition.

Set expand_locales to True to expand all localized attributes.

fact(id_)¶: Selects only fact with given id

fact_key_column()¶: Returns a column that represents the fact key.

get_semiadditive_attribute(drilldown)¶: Returns an attribute from a semi-additive dimension, if defined for the cube. Cubes allows one semi-additive dimension.

measures_for_aggregates(aggregates)¶: Returns a list of measures for aggregates. This method is used in constructing the master fact.

members_statement(cell, attributes=None)¶: Prepares dimension members statement.

order(order)¶

Returns a SQL statement which is ordered according to the order. If the statement contains attributes that have natural order specified, then the natural order is used, if not overriden in the order.

order sohuld be prepared using AggregationBrowser.prepare_order().

dimension_levels is list of considered dimension levels in form of tuples (dimension, hierarchy, levels). For each level it’s sort key is used.

paginate(page, page_size)¶: Returns paginated statement if page is provided, otherwise returns the same statement.

range_condition(dim, hierarchy, from_path, to_path, invert=False)¶: Return a condition for a hierarchical range (from_path, to_path). Return value is a Condition tuple.

class cubes.backends.sql.mapper.SnowflakeMapper(cube, mappings=None, locale=None, schema=None, fact_name=None, dimension_prefix=None, dimension_suffix=None, joins=None, dimension_schema=None, **options)¶

A snowflake schema mapper for a cube. The mapper creates required joins, resolves table names and maps logical references to tables and respective columns.

Attributes:

cube - mapped cube
mappings – dictionary containing mappings
simplify_dimension_references – references for flat dimensions (with one level and no details) will be just dimension names, no attribute name. Might be useful when using single-table schema, for example, with couple of one-column dimensions.
dimension_prefix – default prefix of dimension tables, if default table name is used in physical reference construction
dimension_suffix – default suffix of dimension tables, if default table name is used in physical reference construction
fact_name – fact name, if not specified then cube.name is used
schema – default database schema
dimension_schema – schema whre dimension tables are stored (if different than fact table schema)

mappings is a dictionary where keys are logical attribute references and values are table column references. The keys are mostly in the form:

attribute for measures and fact details
attribute.locale for localized fact details
dimension.attribute for dimension attributes
dimension.attribute.locale for localized dimension attributes

The values might be specified as strings in the form table.column (covering most of the cases) or as a dictionary with keys schema, table and column for more customized references.

physical(attribute, locale=None)¶

Returns physical reference as tuple for attribute, which should be an instance of cubes.model.Attribute. If there is no dimension specified in attribute, then fact table is assumed. The returned tuple has structure: (schema, table, column).

The algorithm to find physicl reference is as follows:

IF localization is requested:
    IF is attribute is localizable:
        IF requested locale is one of attribute locales
            USE requested locale
        ELSE
            USE default attribute locale
    ELSE
        do not localize

IF mappings exist:
    GET string for logical reference
    IF locale:
        append '.' and locale to the logical reference

    IF mapping value exists for localized logical reference
        USE value as reference

IF no mappings OR no mapping was found:
    column name is attribute name

    IF locale:
        append '_' and locale to the column name

    IF dimension specified:
        # Example: 'date.year' -> 'date.year'
        table name is dimension name

        IF there is dimension table prefix
            use the prefix for table name

    ELSE (if no dimension is specified):
        # Example: 'date' -> 'fact.date'
        table name is fact table name

physical_references(attributes, expand_locales=False)¶: Convert attributes to physical attributes. If expand_locales is True then physical reference for every attribute locale is returned.

relevant_joins(attributes, expand_locales=False)¶: Get relevant joins to the attributes - list of joins that are required to be able to acces specified attributes. attributes is a list of three element tuples: (schema, table, attribute).

table_map()¶

Return list of references to all tables. Keys are aliased tables: (schema, aliased_table_name) and values are real tables: (schema, table_name). Included is the fact table and all tables mentioned in joins.

To get list of all physical tables where aliased tablesare included only once:

finder = JoinFinder(cube, joins, fact_name)
tables = set(finder.table_map().keys())

tables_for_attributes(attributes, expand_locales=False)¶: Returns a list of tables – tuples (schema, table) that contain attributes.

Slicer¶

class cubes.backends.slicer.SlicerBrowser(cube, store, locale=None, **options)¶: Browser for another Slicer server.

Mixpanel¶

class cubes.backends.mixpanel.browser.MixpanelBrowser(cube, store, locale=None, **options)¶

Creates a Mixpanel aggregation browser.

Requirements and limitations:

time dimension should always be present in the drilldown
only one other dimension is allowd for drilldown
range cuts assume numeric dimensions
unable to drill-down on year level, will default to month

features()¶: Return SQL features. Currently they are all the same for every cube, however in the future they might depend on the SQL engine or other factors.

Mongo DB¶

class cubes.backends.mongo2.browser.Mongo2Browser(cube, store, locale=None, metadata={}, url=None, **options)¶

facts(cell=None, fields=None, order=None, page=None, page_size=None, **options)¶: Return facts iterator.

features()¶: Return SQL features.

class cubes.backends.mongo2.mapper.MongoCollectionMapper(cube, database, collection, mappings=None, **options)¶

A mongo collection mapper for a cube. The mapper creates required fields, project and match expressions, and encodes/decodes using provided python lambdas.

Attributes:

cube - mapped cube
mappings – dictionary containing mappings

mappings is a dictionary where keys are logical attribute references and values are mongo document field references. The keys are mostly in the form:

attribute for measures and fact details
dimension.attribute for dimension attributes

The values might be specified as strings in the form field (covering most of the cases) or as a dictionary with keys database, collection and field for more customized references.

physical(attribute, locale=None)¶: Returns physical reference as tuple for attribute, which should be an instance of cubes.model.Attribute. If there is no dimension specified in attribute, then fact table is assumed. The returned object is a MongoDocumentField object.

Aggregation Browsing Backends¶

SQL¶

Browser¶

Slicer¶

Mixpanel¶

Mongo DB¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

Aggregation Browsing Backends¶

SQL¶

Browser¶

Slicer¶

Mixpanel¶

Mongo DB¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation