[Zope-CVS] CVS: Packages/pypes/doc - query-engine.txt:1.2
Casey Duncan
casey at zope.com
Mon Jun 7 00:14:26 EDT 2004
Update of /cvs-repository/Packages/pypes/doc
In directory cvs.zope.org:/tmp/cvs-serv29747/doc
Modified Files:
query-engine.txt
Log Message:
Bring the prose more in phase with reality and gain some self-confidence
=== Packages/pypes/doc/query-engine.txt 1.1 => 1.2 ===
--- Packages/pypes/doc/query-engine.txt:1.1 Wed Apr 14 01:25:58 2004
+++ Packages/pypes/doc/query-engine.txt Mon Jun 7 00:14:24 2004
@@ -10,10 +10,6 @@
"jumping off point" for future enhancement, assuming a continued and hopefully
growing interest in its ongoing development.
-It is possible and probably likely that if you have made it this far, that
-you know more about query engines than I do. Look at this document as a
-plea for help as much as a description of the operation of the system ;^)
-
Query Constituents
------------------
@@ -30,8 +26,9 @@
Supported Features
------------------
-- Inputs that are arbitrary iterables. Pypes extents and query result inputs
- are handled most efficiently.
+- Inputs are objects supporting the ISet interface (they do not need to
+ assert this interface, so builtin Python sets will work) or pypes
+ query result objects.
- Criteria consisting of a Python expression of arbitrary complexity.
@@ -73,7 +70,9 @@
- Constituent definition (i.e., inputs, criteria, order)
-- Query object instantiation
+- Criteria evaluation
+
+- Sorting
- Result selection
@@ -90,51 +89,67 @@
- Criteria and sort expressions are parsed, optimized and compiled. Names
from the calling namespace and built-ins are bound.
-Query Instantiation
-~~~~~~~~~~~~~~~~~~~
-
-- During instantiation the arguments are collected and sorted and any invariants
-asserted.
-
-Selection
-~~~~~~~~~
-
-Selection is where most of the heavy lifting occurs. During selection the
-following occurs:
-
-- Free names in criteria and sort expressions are gathered and unioned with
- the names selected for output. These are the names used by the query. Any
- inputs not used are ignored for this query run.
-
-- Used inputs are sorted in order by their length to determine an order of
- evaluation, where it is possible to compute it. Un-len-able inputs (i.e.,
- iterators) are automatically last.
-
Criteria Evaluation
-+++++++++++++++++++
-
-- The product of the input lengths for inputs used by the expression is
- computed. If sufficiently small, a scan of the cartesian product is performed
- and the result is returned.
-
-- If the criteria uses one input then request the registered index for the
- input and expression. If one is found return the result derived from the
- index.
-
-- Split criteria on boolean OR operators. Criteria selection is performed for
- each criteria sub-expression. The union of these results is returned.
-
-- Split criteria on AND operators. The criteria list is sorted based on
- estimated complexity calculated based on the number of inputs used for each
- criteria, availability of indexes, the inputs' type and length product.
- Criteria selection is performed for each sub-expression in-turn with each
- result fed as the input for the next sub-expression evaluation. The
- resulting intersection is returned.
-
-- If the criteria is an equal, greater than, less than or in comparison joining
- two inputs with one input in each operand then perform the join operation
- using the appropriate join function and return the result.
-
-- Scan the cartesian product of the inputs used and return the result.
-
+~~~~~~~~~~~~~~~~~~~
+Criteria is processed using partial result objects. If necessary, the criteria
+is divided into pieces and handled by multiple partial results which are
+ultimately combined to determine the final result. There are three kinds of
+partial results used by pypes:
+
+Cartesian Products -- These are the generalist brutes that can handle any query
+regardless of complexity, so long as you are patient enough. Cart. products are
+used with very small inputs where optimizations are not helpful, or when the
+complexity of the query requires them. They are often used in combination with
+the other partial results when selecting multiple outputs.
+
+Joins -- Joins handle efficient relational joining of pairs of objects. They
+are invoked by comparing objects from multiple inputs with one another.
+
+Actualized Results -- These results are processed as simple filters that
+each act on a single input. They are used to filter and reduce the input sets
+so there is less to process in later query stages.
+
+Queries use partial results in the following manner to process query criteria:
+
+- If the calculated magnitude, the maximum size of the query output, is below
+ a designated threshold, then a cart. product is returned as the result.
+
+- If possible, an actualized result is created and is returned as the result.
+
+- If possible, a join is created and returned as the result.
+
+- The criteria is split on boolean OR operators. Each sub-expression is
+ processed separately and the union of the partial results is returned.
+
+- The criteria is split on boolean AND operators. Each sub-expression is
+ processed in sequence and the intersection of the partial results is
+ returned.
+
+- If no more division of the criteria is possible, then a cart. product
+ is returned as the result.
+
+Sorting
+~~~~~~~
+
+If an order is specified by the query then it is evaluated using the results of
+criteria evaluation. The first order expression specified is fully evaluated to
+create an ordered result iterator. The second sort order expression (if any) is
+processed only when duplicate sort keys of the first order expression are
+encountered during result iteration. Remaining order expressions are evaluated
+similarly. Consequently, if the first order expression generates unique sort
+key values then no further expression are evaluated for the query run.
+
+A limit may be specified on the result size which is evaluated during sorting.
+Alternate sorting algorithms may be employed to more efficiently evaluate
+a limited query result.
+
+Result selection
+~~~~~~~~~~~~~~~~
+
+Selection determines the output of the query run. The output of a pypes query
+is an iterable of objects or tuples of joined objects. The output "columns"
+coorespond to the query inputs. When a single output is selected, the results
+consist of single objects. If multiple output columns are selected then the
+results consist of tuple "rows" of objects selected from the inputs by the
+criteria.
More information about the Zope-CVS
mailing list