[Zope-CVS] CVS: Packages/pypes/doc - query-engine.txt:1.1
Casey Duncan
casey at zope.com
Wed Apr 14 01:25:59 EDT 2004
Update of /cvs-repository/Packages/pypes/doc
In directory cvs.zope.org:/tmp/cvs-serv20463/doc
Added Files:
query-engine.txt
Log Message:
Add initial prose describing the query engine
=== Added File Packages/pypes/doc/query-engine.txt ===
==================
Pypes Query Engine
==================
This document attempts to explain the pypes query engine and its query
execution strategy. It should be noted that pypes is a young system and its
query engine although entirely general, is infantile in its ability to generate
an optimal query plan when compared to other more mature and scientifically
rigorous database engines. This initial implementation should be regarded as a
"jumping off point" for future enhancement, assuming a continued and hopefully
growing interest in its ongoing development.
It is possible and probably likely that if you have made it this far, that
you know more about query engines than I do. Look at this document as a
plea for help as much as a description of the operation of the system ;^)
Query Constituents
------------------
Query definitions consist of the following parts:
- `Inputs`: one or more collections of objects (usually sets) which contain
the object being queried.
- `Criteria`: an executable expression which selects and joins the objects
from the inputs for the query output.
- `Order`: A declaration that determines the order of the query output.
Supported Features
------------------
- Inputs that are arbitrary iterables. Pypes extents and query result inputs
are handled most efficiently.
- Criteria consisting of a Python expression of arbitrary complexity.
- Capture of names from the calling context into the criteria expression.
- Optimization of criteria expressions to bind globals and builtins to
constants, pre-evaluate subexpressions lacking free variables and
boolean operations that can be reduced early.
- Joins of paired inputs with arbitrary operators, efficient support for
equi-joins, greater, lesser and "in" joins.
- Sort declarations consisting of multiple arbitrary Python expressions
with multiple simultaneous ordering.
- Result limits for efficient partial result evaluation.
- Flexible output selection.
- Analysis of the query execution plan.
Unsupported Features
--------------------
- Grouping
- Outer joins
- Efficient n-ary joins <shudder>
- Arbitrary projection expressions (only whole objects supported).
- Query hints
Query Execution
---------------
At the highest level, query execution proceeds as follows.
- Constituent definition (i.e., inputs, criteria, order)
- Query object instantiation
- Result selection
These stages can be executed at once or at different times, or even different
processes if the query itself is a persistent object.
Definition
~~~~~~~~~~
During definition, the following occurs:
- Inputs are named
- Criteria and sort expressions are parsed, optimized and compiled. Names
from the calling namespace and built-ins are bound.
Query Instantiation
~~~~~~~~~~~~~~~~~~~
- During instantiation the arguments are collected and sorted and any invariants
asserted.
Selection
~~~~~~~~~
Selection is where most of the heavy lifting occurs. During selection the
following occurs:
- Free names in criteria and sort expressions are gathered and unioned with
the names selected for output. These are the names used by the query. Any
inputs not used are ignored for this query run.
- Used inputs are sorted in order by their length to determine an order of
evaluation, where it is possible to compute it. Un-len-able inputs (i.e.,
iterators) are automatically last.
Criteria Evaluation
+++++++++++++++++++
- The product of the input lengths for inputs used by the expression is
computed. If sufficiently small, a scan of the cartesian product is performed
and the result is returned.
- If the criteria uses one input then request the registered index for the
input and expression. If one is found return the result derived from the
index.
- Split criteria on boolean OR operators. Criteria selection is performed for
each criteria sub-expression. The union of these results is returned.
- Split criteria on AND operators. The criteria list is sorted based on
estimated complexity calculated based on the number of inputs used for each
criteria, availability of indexes, the inputs' type and length product.
Criteria selection is performed for each sub-expression in-turn with each
result fed as the input for the next sub-expression evaluation. The
resulting intersection is returned.
- If the criteria is an equal, greater than, less than or in comparison joining
two inputs with one input in each operand then perform the join operation
using the appropriate join function and return the result.
- Scan the cartesian product of the inputs used and return the result.
More information about the Zope-CVS
mailing list