[ZODB-Dev] New to ZODB, how to make a db efficently?

Christian Theune ct at gocept.com
Tue Aug 19 03:00:21 EDT 2008


Hi,

On Mon, 2008-08-18 at 20:17 +0300, Markus wrote:
> I'm new here, so hi! ;-)
> 
> I'm looking to create a database of persons and events, later to
> search persons by names, events by dates and locations (participants
> of events are already in an attribute of the event and instances of
> Person, which inherits from Persistent)
> 
> At first I made a PersistentList of all the events and a
> PersistentMapping of all the people by an id, but later found out,
> that searching through a list with a for-loop is very slow (there are
> about 200 000 people and 100 000 events). And so as I've looked around
> here a bit (the docs and
> the wikis are mostly outdated or empty -- there's also talk about the
> bad documentation in this mailinglist) I've found, that I should be
> using OOBTree for making the indexes.

Yes, the documentation situation is less than desirable for
beginners. :/

> So what I'm asking is, is it reasonable to create the db like this:
> persons in root['persons'],
> which is an OOBTree, mapping names to Person-objects and events in
> root['events'], which is also an OOBTree, mapping dates to Event-objects?
> And if I want to map locations to events, I should do it at the same
> time, when creating the events, so I don't have to loop through all of
> them again?

Here's what I do:

Create a physical structure that models your data in a 'natural' way.
This can e.g. be:

- A root object representing the application, in case you may want to
  hold multiple instances of your application within a single database.

- BTrees for storing large lists of objects, like you do. But mainly
  with a single lookup direction, e.g. for you the name-to-person
  mapping.

  Sometimes those lists just use arbitrary IDs as keys for the objects,
  much like primary keys in tables.

  Alternatively, if you have a VFS-like structure, you might want to use
  the folder/item metaphor for the main structure of your database.

- Add an indexing/searching framework for orthogonal queries. This is
  called `cataloging` in the Zope/ZODB universe. Some (more or less)
  standalone solutions are found in the proximity of `zope.catalog`.

  Use those to create tabular views on your data (independent of the
  physical structure) that are queryable by indexed arguments. Those are
  fast.

> If I have a OOBTree-mapping of dates to events, what should the values
> of it be? PersistentLists? I've read something about Buckets or Sets,
> but I'm not sure what they are good
> for, Bucket seems to behave like the equivalent BTree (OO, or IO or OI
> or IF or ....), but Set seems to be a set... Is that true?

I'd go with a flat structure. See my note on 'arbitrary' IDs above.

> What's the difference between a PersistentMapping and an OOBTree or
> OOBucket? Only the "back-end", because on the front they all seem like
> dictionaries? Should I be using OOBTrees and OOBuckets for what I'm
> doing, because strings and dates are "O"s and not "I"s or "F"s or...

A PersistentMapping (PM) is a persistent dictionary that loads all of
its data at once.

A bucket is a leaf node of a BTree: it holds the actual key/value pairs
and is loaded as a single unit.

A BTree is a (key-)sorted(!) data structure that provides a key/value
interface like dictionaries do. Due to that, the lookup of items in a
BTree is fast and also memory efficient, as only individual buckets of
the BTree need to be activated for a lookup (optimally only O(log n)
buckets).

Christian

-- 
Christian Theune · ct at gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 7 · fax +49 345 1229889 1
Zope and Plone consulting and development