[ZODB-Dev] RFE: Spec for ZODB Indexing
Thomas Förster
t.foerster@biologie.hu-berlin.de
Fri, 7 Jun 2002 11:10:33 +0200
On Thursday, 6 June 2002 23:24, Christian Reis wrote:
> For retrieval to work, there must be a way to specify which
> object(s) we want. There are a couple of scenarios here:
>
> 1. We want all objects in the database
> 2. We want a subset of objects in the database, which possess
> attributes according to a specified condition.
> 3. We want a specific object, identified by a reference number
> which would be something like an oid. XXX: does such a thing
> exist for OO programming?
>
> The first is easily provided by a dump()-like call. The last can be
> provided by the existing mechanisms for ZODB retrieval. However, the
> most common form of request is the 2nd, and for that an API for
> specifying the instances desired must be created.
>
> One way to implement the API for retrievals is to provide a
> simplified query language, where the attributes and conditions could
> be specified. OQL is quite probably overkill for this, but a
> simplified, pythonesque language could be devised.
Don't think so. OQL basically consists of only one statement, thus providing a
common frontend to all three of these cases. This makes life much easier for
application developers.
> 2.1. Common conditions
>
> When retrieving objects, we'd like to be able to specify the
> attributes that identify the object in a certain way. The basic
> Python conditional operators provide a mechanism we could use:
>
> - Equality, and the "is" operator
> - Greater/Less than comparisons
> - The in operator for lists
> - XXX: anything else?
Fuzzy/substring matching, soundex, expressions, function calls (e.g.
obj.enddate < now()+14*days ...)
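Conditions like these map onto plain Python predicates. A minimal sketch, assuming a hypothetical Task class and a brute-force select() helper (neither is part of ZODB), with a fixed date standing in for now():

```python
from datetime import datetime, timedelta


class Task:
    """Hypothetical stand-in for a persistent object with an end date."""

    def __init__(self, name, enddate):
        self.name = name
        self.enddate = enddate


def select(objects, predicate):
    """Brute-force scan: return the objects for which predicate(obj) is true."""
    return [obj for obj in objects if predicate(obj)]


now = datetime(2002, 6, 7)   # fixed "now" so the example is reproducible
days = timedelta(days=1)

tasks = [
    Task("report", now + 3 * days),
    Task("review", now + 30 * days),
]

# Arithmetic inside the condition: obj.enddate < now + 14*days
due_soon = select(tasks, lambda obj: obj.enddate < now + 14 * days)

# Substring matching as another kind of condition
matching = select(tasks, lambda obj: "rev" in obj.name)
```

A query language would have to parse such expressions from a string; the predicates here only illustrate what kinds of conditions it needs to cover.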
> 2.2. Joins and totals
> ...
> However, in the case of making summaries of information stored in
> the ZODB, there is still need for the suggestion of a good solution.
> For a simple example, we'll have a database with a collection of
> Product instances, each with individual stock quantities. How do we
> provide a means to discover the total stock in the collection?
> Should the application handle this by caching this information based
> on updates to individual stock?
How about:
select sum(p.stock) from Product p
which is correct OQL syntax (if I remember correctly, I don't have the spec at
hand right now). Well, this form relies on the concept of class extents,
meaning you can ask the DB for all objects of a given class. At the moment
ZODB only provides named entry points, so this has to be reformulated to
select sum(p.stock) from ProductCollection.allProducts() p
with ProductCollection being the name (i.e. db.root['ProductCollection']) of
an instance of a product collection class.
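For comparison, this is roughly what the second query computes when written by hand in Python. It is only a sketch: Product and ProductCollection are hypothetical classes, and a plain dict stands in for db.root.

```python
class Product:
    """Hypothetical product with an individual stock quantity."""

    def __init__(self, name, stock):
        self.name = name
        self.stock = stock


class ProductCollection:
    """Hypothetical collection class reachable from a named entry point."""

    def __init__(self):
        self._products = []

    def add(self, product):
        self._products.append(product)

    def allProducts(self):
        return list(self._products)


# A plain dict stands in for db.root in this sketch.
root = {"ProductCollection": ProductCollection()}
root["ProductCollection"].add(Product("bolt", 5))
root["ProductCollection"].add(Product("nut", 12))

# select sum(p.stock) from ProductCollection.allProducts() p
total_stock = sum(p.stock for p in root["ProductCollection"].allProducts())
```

The query form says the same thing in one line and leaves the database free to answer it from an index or a cached total instead of a full scan.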
> 2.3 Query Language
> ...
A query string is just fine. ZODB should provide a facade for all this
"complicated" querying stuff. So running a query should just be
result = db.query('select ....' % (some, params))
> 3. Implementation
> ....
> A concept for instance aggregation has to be provided too. This
> aggregator would hold a collection of instance references and the
> indexes, and would provide both query interface and index
> maintenance functionality. This aggregator would also be a ZODB
> persistent object.
I don't get the point here. I thought this is the reason for querying a
database, to aggregate instances based on constraints.
> 4. API example
>
> class Catalog(Persistent):
>     XXX: Entity?
>     def __init__(self, XXX):
>         XXX: define what kind of instance it stores
>         XXX: instance meta-data acquisition?
>         XXX: index auto-creation?
>     def dump(self):
>         XXX: return list of instances
>     def query(self, q):
>         XXX: process query
>     def init_index(self, instance_attr):
>         XXX: specify only indexes you want to avoid bloat?
This should be provided by the database. I don't want to implement these
functions for every class separately. I don't want to do more than giving
metadata and calling a base class __init__ explicitly, like:
class Catalog(Persistent):
    _zodb_indexes = (.....)
    def __init__(self, ...):
        # has to be called for automatic indexing,
        # adding to extents...
        Persistent.__init__(self)
        # custom __init__ code
Use it like this:
all_catalogs = db.extent('Catalog') # similar to your dump method
all_items = db.query('select c.items() from Catalog c')
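How the base class __init__ could add instances to the extents can be sketched without ZODB. Everything here is an assumption for illustration: Database, Indexed and register() are hypothetical names, and the extents live in memory only.

```python
from collections import defaultdict


class Database:
    """Hypothetical in-memory stand-in that keeps one extent per class name."""

    def __init__(self):
        self._extents = defaultdict(list)

    def register(self, obj):
        # File the object under its own class and every base class,
        # so that an extent also covers instances of subclasses.
        for cls in type(obj).__mro__:
            if cls is not object:
                self._extents[cls.__name__].append(obj)

    def extent(self, class_name):
        return list(self._extents[class_name])


db = Database()


class Indexed:
    """Hypothetical base class; its __init__ adds the instance to the extents."""

    def __init__(self):
        db.register(self)


class Catalog(Indexed):
    def __init__(self, title):
        Indexed.__init__(self)   # explicit base-class call, as described above
        self.title = title


class SpecialCatalog(Catalog):
    pass


Catalog("tools")
SpecialCatalog("rare tools")

all_catalogs = db.extent("Catalog")          # both instances
special_only = db.extent("SpecialCatalog")   # only the subclass instance
```

A real implementation would also have to remove deleted objects from the extents and store the extents persistently; both are left out here.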
> 5. Development steps
>
> - Stabilize requirements for conditionals, summaries and joins
> - Think up Catalog API
> - Design query language
=> Already done: just take the spec and implement an OQL interface for ZODB,
making it more ODMG 'compliant'.
Writing a parser shouldn't be that hard, as the BNF is given in the ODMG spec.
The API then consists of only three functions, two of them being just
shorthands:
def query(querystring)  # performs an OQL query on the database
def entry(objectName)   # returns the named object
def extent(className)   # returns a collection of all instances of the given
                        # class and its subclasses
The latter two may also be provided via __getattr__, for a more Pythonic way.
The className may as well be exchanged with a class object.
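A rough sketch of the two shorthands, including the __getattr__ variant. The Database class and its private dicts are hypothetical, query() is left out because it needs the OQL parser, and subclass handling in extent() is omitted for brevity.

```python
class Database:
    """Hypothetical facade; only entry() and extent() are sketched here."""

    def __init__(self):
        self._entries = {}   # object name -> named root object
        self._extents = {}   # class name -> list of instances

    def entry(self, object_name):
        return self._entries[object_name]

    def extent(self, class_or_name):
        # Accept either a class name or a class object.
        if isinstance(class_or_name, str):
            name = class_or_name
        else:
            name = class_or_name.__name__
        return list(self._extents.get(name, []))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Named entries take precedence over extents in this sketch.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._entries:
            return self.entry(name)
        if name in self._extents:
            return self.extent(name)
        raise AttributeError(name)


class Product:
    def __init__(self, stock):
        self.stock = stock


db = Database()
widgets = [Product(5), Product(12)]
db._extents["Product"] = widgets             # filled by hand for the example
db._entries["ProductCollection"] = widgets

by_attribute = db.Product                    # same as db.extent("Product")
by_class = db.extent(Product)                # class object instead of a name
named = db.ProductCollection                 # same as db.entry("ProductCollection")
```

One open design question with the attribute form is what happens when an entry and a class share a name; the sketch simply lets the entry win.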
> - Implement basic, brute-force queries
> - Implement simple indexing
> - Link queries into indexes
> - Implement complete indexes for more data types
I think these are the crucial points here. Having indexes implemented in ZODB
is a good thing in any case, whether or not there is a query language.
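To make the difference between a brute-force query and an indexed one concrete, here is a minimal sketch of an equality index on a single attribute. AttributeIndex and Product are hypothetical; a real ZODB index would more likely sit on a persistent BTree than on a dict, and would need updating whenever an indexed attribute changes.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical equality index on one attribute: value -> list of objects."""

    def __init__(self, attr):
        self.attr = attr
        self._by_value = defaultdict(list)

    def add(self, obj):
        self._by_value[getattr(obj, self.attr)].append(obj)

    def lookup(self, value):
        # .get() avoids creating empty entries for values never indexed.
        return list(self._by_value.get(value, []))


class Product:
    def __init__(self, name, category):
        self.name = name
        self.category = category


products = [
    Product("hammer", "tools"),
    Product("apple", "food"),
    Product("saw", "tools"),
]

index = AttributeIndex("category")
for p in products:
    index.add(p)

# Brute force: look at every object.
brute = [p for p in products if p.category == "tools"]

# Indexed: a single dictionary lookup.
indexed = index.lookup("tools")
```

Both give the same result; the index trades extra storage and maintenance work on every update for faster lookups.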
Kind Regards
--
Thomas Förster
Dept. of Animal Physiology
Humboldt University of Berlin, Germany