[Zope] How do I efficiently join catalog results with other data?

J. Cone jcone@g8labs.com
Sat, 28 Jul 2001 12:29:06 +0100


Hello All,

I'm not sure you have a problem you can solve lazily here.  Simplifying the
problem immensely, if you have two one-to-one mappings, involving three
completely disjoint orderable datatypes:
  A <-> B    B <-> C
and you want to produce a list of C in the order suggested by A, there are
two reasonable ways of doing it:
  - merge then sort
      - get both lists sorted in B order (assumed free)
      - walk down both lists finding B's that match; you can tell what to
        discard because it's less than the head of the other one (linear)
      - sort on A (log-linear; not lazy)
  - sort then lookup
      - get the first list in A order (assumed free)
      - get the second list as a dictionary of B->C (assumed free)
      - walk the first list looking up C values in the second
        (log per lookup; log-linear in total; not lazy on B->C)

If you want to do this in python, I think sort-then-lookup might be more
natural, and that the small-code way of doing it might be:
  - make the lookup into a lambda function (see www.python.org)
  - map the lambda function down the A<->B list

The key step in merge-then-sort is the merge.  It would need about 10 lines
of horrible ksh.  I don't believe there's any natural way of doing it in
python.
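
For comparison, the merge step described above can at least be written down
in Python as a two-index walk over both lists.  This is only a rough sketch
under the one-to-one assumption; merge_join and the sample data are
hypothetical names and values, not anything from Zope or the catalog.

```python
def merge_join(ab_by_b, bc_by_b):
    """Join (A, B) pairs with (B, C) pairs; both inputs must be sorted on B.

    Returns the (A, C) pairs whose B values match. Assumes B is unique in
    each list (one-to-one mappings).
    """
    i = j = 0
    out = []
    while i < len(ab_by_b) and j < len(bc_by_b):
        a, b_left = ab_by_b[i]
        b_right, c = bc_by_b[j]
        if b_left < b_right:
            i += 1          # discard: smaller than the head of the other list
        elif b_right < b_left:
            j += 1          # discard: smaller than the head of the other list
        else:
            out.append((a, c))
            i += 1
            j += 1
    return out


# Hypothetical sample data, both lists already in B order.
ab = [(3, "x"), (1, "y"), (2, "z")]
bc = [("w", 5.0), ("x", 10.0), ("z", 30.0)]

joined = merge_join(ab, bc)            # linear merge on B
joined.sort(key=lambda pair: pair[0])  # log-linear sort on A; not lazy
c_in_a_order = [c for a, c in joined]

print(c_in_a_order)  # [30.0, 10.0]
```

The sort on A needs the whole merged list in memory, which is why this
variant is not lazy.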

HTH,
J. Cone

At 15:17 28/07/01 +1000, Jay, Dylan wrote:
>> -----Original Message-----
>> From: Casey Duncan [mailto:cduncan@kaivo.com]
>> Sent: Saturday, 28 July 2001 8:14 AM
>> To: Jay, Dylan
>> Cc: 'zope@zope.org'
>> Subject: Re: [Zope] How do I efficiently join catalog results with other data?
>> 
>> 
>> "Jay, Dylan" wrote:
>> > 
>> > Here is my problem:
>> > 
>> > I have a dynamic per user map of data about various objects. I want to do a
>> > catalog search, combine the results with my other dynamic data and then sort
>> > on the dynamic data.
>> 
>> What type of "dynamic data"? Integers, dates, strings, etc?
>
>a tuple of mostly integers and floats.
> 
>> > 
>> > Has anyone tried doing this?
>> > How do I do it efficiently, still getting the benefits of low memory usage
>> > of lazy result sets?
>> 
>> You should make the dynamic data returned from a script and add a field
>> index to the catalog that calls the script. Then you can use the
>> "sort_on" argument to the catalog to sort on the dynamic data.
>
>The set of data is different for every user, so this won't work unless I was
>willing to have a separate catalog for every user. Having a separate catalog
>for each user would be fine if I could combine two searches of two
>different catalogs together. And by combine I don't mean the _plus_ operator
>you can use for search results. What I need is a relational join, i.e. only the
>results from both sets where a certain field has the same value on both.
>
>Someone must have tried to do something like this before?
>