[ZODB-Dev] RFE: Spec for ZODB Indexing

Thomas Förster t.foerster@biologie.hu-berlin.de
Fri, 7 Jun 2002 11:10:33 +0200


On Thursday, 6 June 2002 23:24, Christian Reis wrote:
>     For retrieval to work, there must be a way to specify which
>     object(s) we want. There are a couple of scenarios here:
>
>         1. We want all objects in the database
>         2. We want a subset of objects in the database, which possess
>            attributes according to a specified condition.
>         3. We want a specific object, identified by a reference number
>            which would be something like an oid. XXX: does such a thing
>            exist for OO programming?
>
>     The first is easily provided by a dump()-like call. The last can be
>     provided by the existing mechanisms for ZODB retrieval. However, the
>     most common form of request is the 2nd, and for that an API for
>     specifying the instances desired must be created.
>
>     One way to implement the API for retrievals is to provide a
>     simplified query language, where the attributes and conditions could
>     be specified. OQL is quite probably overkill for this, but a
>     simplified, pythonesque language could be devised.

I don't think so. OQL basically consists of a single statement, which provides 
a common frontend to all three of these cases. This makes life much easier for 
application developers.

> 2.1. Common conditions
>
>     When retrieving objects, we'd like to be able to specify the
>     attributes that identify the object in a certain way. The basic
>     Python conditional operators provide a mechanism we could use:
>
>     - Equality, and the "is" operator
>     - Greater/Less than comparisons
>     - The in operator for lists
>     - XXX: anything else?

Fuzzy/substring matching, soundex, arbitrary expressions and function calls 
(e.g. obj.enddate < now()+14*days ...).
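To make that list concrete, here is a brute-force sketch in which each condition is an ordinary Python predicate evaluated against every object. The Task class, select() and the fixed now() are hypothetical illustrations, not ZODB API; soundex() follows the standard American Soundex rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Task:  # hypothetical example class; any object with these attributes works
    name: str
    enddate: date


_SOUNDEX_CODES = {ch: digit
                  for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"),
                                         ("3", "DT"), ("4", "L"),
                                         ("5", "MN"), ("6", "R"))
                  for ch in letters}


def soundex(name):
    """American Soundex: first letter plus three digits, zero-padded."""
    name = name.upper()
    result = name[0]
    prev = _SOUNDEX_CODES.get(name[0], "")
    for ch in name[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset prev (so a repeated code after a vowel is kept);
        # H and W do not separate letters with the same code.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]


def select(objects, condition):
    """Brute-force evaluation: apply a Python predicate to every object."""
    return [obj for obj in objects if condition(obj)]


def now():
    return date(2002, 6, 7)  # fixed "current date" so the results are reproducible


days = timedelta(days=1)

tasks = [Task("Robert", date(2002, 6, 10)),
         Task("Rupert", date(2002, 7, 30)),
         Task("Ruben", date(2002, 6, 1))]

due_soon = select(tasks, lambda t: t.enddate < now() + 14 * days)
substring = select(tasks, lambda t: "bert" in t.name)
sounds_alike = select(tasks, lambda t: soundex(t.name) == soundex("Robert"))
```

An index could later answer some of these predicates (equality, ranges) without scanning; function calls like now() would still be evaluated once per query.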

> 2.2. Joins and totals
> ...
>     However, in the case of making summaries of information stored in
>     the ZODB, there is still need for the suggestion of a good solution.
>     For a simple example, we'll have a database with a collection of
>     Product instances, each with individual stock quantities. How do we
>     provide a means to discover the total stock in the collection?
>     Should the application handle this by caching this information based
>     on updates to individual stock?

How about:

select sum(p.stock) from Product p

which is, if I remember correctly, valid OQL syntax (I don't have the spec at 
hand right now). This form relies on the concept of class extents, meaning you 
can ask the database for all objects of a given class. At the moment ZODB only 
provides named entry points, so this has to be reformulated as

select sum(p.stock) from ProductCollection.allProducts() p

where ProductCollection is the name (i.e. db.root['ProductCollection']) of an 
instance of a product collection class.
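For comparison, both options from the question can be sketched in plain Python: computing the sum on demand (what the OQL query above would do) versus caching a running total in the collection. Product, ProductCollection, set_stock() and sum_stock() are hypothetical names; in a real database the classes would subclass persistent.Persistent.

```python
class Product:
    """Hypothetical product class; in ZODB it would subclass persistent.Persistent."""

    def __init__(self, name, stock=0):
        self.name = name
        self.stock = stock


class ProductCollection:
    """Hypothetical named entry point, e.g. stored under root['ProductCollection']."""

    def __init__(self):
        self._products = []
        self.total_stock = 0  # cached aggregate, updated on every stock change

    def add(self, product):
        self._products.append(product)
        self.total_stock += product.stock

    def set_stock(self, product, quantity):
        # Adjust the cached total by the difference, then store the new level.
        self.total_stock += quantity - product.stock
        product.stock = quantity

    def allProducts(self):
        return list(self._products)


def sum_stock(products):
    """Brute-force counterpart of: select sum(p.stock) from ProductCollection.allProducts() p"""
    return sum(p.stock for p in products)


collection = ProductCollection()
bolt, nut = Product("bolt", 100), Product("nut", 250)
collection.add(bolt)
collection.add(nut)
collection.set_stock(bolt, 80)
```

The cached total is cheap to read but only stays correct if every stock change goes through set_stock(); the on-demand sum is always consistent but has to touch every product.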

> 2.3 Query Language
> ...

A query string is just fine. ZODB should provide a facade for all this 
"complicated" querying machinery, so that running a query is simply

result = db.query('select ....' % (some, params))

> 3. Implementation
> ....
>     A concept for instance aggregation has to be provided too. This
>     aggregator would hold a collection of instance references and the
>     indexes, and would provide both query interface and index
>     maintenance functionality. This aggregator would also be a ZODB
>     persistent object.

I don't get the point here. I thought aggregating instances based on 
constraints was the whole reason for querying a database.

> 4. API example
>
>         class Catalog(Persistent):
>             XXX: Entity?
>             def __init__(self, XXX):
>                 XXX: define what kind of instance it stores
>                 XXX: instance meta-data acquisition?
>                 XXX: index auto-creation?
>             def dump(self):
>                 XXX: return list of instances
>             def query(self, q):
>                 XXX: process query
>             def init_index(self, instance_attr):
>                 XXX: specify only indexes you want to avoid bloat?

This should be provided by the database. I don't want to implement these 
functions separately for every class. All I want to do is supply metadata and 
explicitly call a base class __init__, like this:

class Catalog(Persistent):
    _zodb_indexes = (.....)

    def __init__(self, ...):
        # has to be called for automatic indexing,
        # adding to extents...
        Persistent.__init__(self)

        # custom __init__ code goes here

Use it like this:

all_catalogs = db.extent('Catalog') # similar to your dump method
all_items = db.query('select c.items() from Catalog c')
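A rough sketch of what that base class __init__ could do behind the scenes, assuming in-memory registries in place of database-side storage. IndexedBase, extent() and lookup() are hypothetical names, not existing ZODB API; _zodb_indexes is the metadata attribute proposed above.

```python
from collections import defaultdict

# Hypothetical in-memory registries standing in for database-side storage.
_EXTENTS = defaultdict(list)  # class -> instances created with exactly that class
_INDEXES = defaultdict(lambda: defaultdict(list))  # (class, attr) -> value -> instances


class IndexedBase:
    """Hypothetical base class playing the role proposed for Persistent.__init__."""

    _zodb_indexes = ()  # names of attributes to index automatically

    def __init__(self):
        _EXTENTS[type(self)].append(self)  # add the new instance to its class extent

    def __setattr__(self, name, value):
        if name in self._zodb_indexes:
            index = _INDEXES[(type(self), name)]
            if name in self.__dict__:  # drop the entry for the old value
                index[self.__dict__[name]].remove(self)
            index[value].append(self)
        object.__setattr__(self, name, value)


def extent(cls):
    """All instances of cls (class object or class name) and of its subclasses."""
    return [obj for klass, instances in _EXTENTS.items()
            if any(c is cls or c.__name__ == cls for c in klass.__mro__)
            for obj in instances]


def lookup(cls, attr, value):
    """Index lookup: instances of cls or its subclasses whose attr equals value."""
    return [obj for (klass, name), index in _INDEXES.items()
            if name == attr and issubclass(klass, cls)
            for obj in index.get(value, [])]


class Catalog(IndexedBase):
    _zodb_indexes = ("title",)

    def __init__(self, title, items):
        IndexedBase.__init__(self)  # registers the instance in its class extent
        self.title = title          # indexed automatically via __setattr__
        self._items = list(items)   # not indexed

    def items(self):
        return list(self._items)


class SpecialCatalog(Catalog):
    pass


c1 = Catalog("books", ["a"])
c2 = SpecialCatalog("music", ["b", "c"])
c1.title = "novels"  # the title index is updated automatically
# brute-force equivalent of: select c.items() from Catalog c
all_items = [c.items() for c in extent("Catalog")]
```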

> 5. Development steps
>
>     - Stabilize requirements for conditionals, summaries and joins
>     - Think up Catalog API
>     - Design query language

=> Already done: just take the ODMG spec and implement an OQL interface for 
ZODB, making it more ODMG-compliant.

Writing a parser shouldn't be that hard, as the ODMG spec gives the BNF grammar.

The API then consists of only three functions, two of them being just 
shorthands:

def query(querystring)  # performs an OQL query on the database
def entry(objectName)   # returns the named object
def extent(className)   # returns the collection of all instances of the given
                        # class and its subclasses

The latter two may also be provided via __getattr__, which is more Pythonic. 
A class object could also be passed instead of className.
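A minimal sketch of such a facade, assuming an in-memory store. Database and store() are hypothetical names, not ZODB API; query() is left as a stub because it needs the OQL parser. Only entry() is routed through __getattr__ here, since an attribute name alone cannot say whether the caller wants a named object or an extent.

```python
class Database:
    """Hypothetical facade offering the three proposed calls; not an existing ZODB class."""

    def __init__(self):
        self._root = {}       # named entry points, like the ZODB root mapping
        self._instances = []  # every stored object, used to compute extents

    def store(self, obj, name=None):
        self._instances.append(obj)
        if name is not None:
            self._root[name] = obj

    def query(self, querystring):
        # Would parse querystring with a parser built from the ODMG OQL grammar.
        raise NotImplementedError("OQL parser not written yet")

    def entry(self, object_name):
        """Return the named object."""
        return self._root[object_name]

    def extent(self, cls):
        """Instances of cls and its subclasses; cls may be a class object or a class name."""
        return [obj for obj in self._instances
                if any(c is cls or c.__name__ == cls for c in type(obj).__mro__)]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: db.Foo means db.entry('Foo').
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self.entry(name)
        except KeyError:
            raise AttributeError(name) from None


class Product:
    pass


class SpecialProduct(Product):
    pass


db = Database()
p, s = Product(), SpecialProduct()
db.store(p)
db.store(s)
product_collection = [p, s]
db.store(product_collection, name="ProductCollection")
```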

>     - Implement basic, brute-force queries
>     - Implement simple indexing
>     - Link queries into indexes
>     - Implement complete indexes for more data types

I think these are the crucial points. Having indexes implemented in ZODB is a 
good thing in any case, whether or not there is a query language.
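A minimal sketch of the "simple indexing" step and of linking a query to it: a single-attribute index answering equality and range queries, next to the brute-force scan it would replace. AttributeIndex and brute_force_range() are hypothetical names; a real ZODB implementation would more likely keep its buckets in one of ZODB's BTrees than in a dict plus a sorted list.

```python
import bisect
from collections import defaultdict, namedtuple

Product = namedtuple("Product", "name stock")  # hypothetical indexed objects


class AttributeIndex:
    """Hypothetical single-attribute index: value -> objects, plus sorted keys for ranges."""

    def __init__(self, attr):
        self.attr = attr
        self._buckets = defaultdict(list)  # value -> objects having that value
        self._keys = []                    # sorted list of distinct indexed values

    def add(self, obj):
        value = getattr(obj, self.attr)
        if value not in self._buckets:
            bisect.insort(self._keys, value)
        self._buckets[value].append(obj)

    def remove(self, obj):
        value = getattr(obj, self.attr)
        bucket = self._buckets.get(value, [])
        bucket.remove(obj)  # ValueError if obj was never indexed under this value
        if not bucket:
            del self._buckets[value]
            self._keys.pop(bisect.bisect_left(self._keys, value))

    def equal(self, value):
        return list(self._buckets.get(value, []))

    def between(self, low, high):
        """Objects whose value v satisfies low <= v < high, in value order."""
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_left(self._keys, high)
        return [obj for key in self._keys[start:stop] for obj in self._buckets[key]]


def brute_force_range(objects, attr, low, high):
    """The same range query without an index: scan every object."""
    return [obj for obj in objects if low <= getattr(obj, attr) < high]


products = [Product("bolt", 100), Product("nut", 250),
            Product("washer", 100), Product("screw", 40)]
stock_index = AttributeIndex("stock")
for product in products:
    stock_index.add(product)
```

Once queries are linked to indexes, a condition such as 50 <= p.stock < 200 can be answered by stock_index.between(50, 200) instead of the full scan, with the same result.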

Kind Regards
-- 
Thomas Förster
Dept. of Animal Physiology
Humboldt University of Berlin, Germany