[Zope-dev] __record_schema__ of Brains (Was: Record.pyd)

Casey Duncan casey@zope.com
Sat, 10 Aug 2002 21:28:57 -0400


On Saturday 10 August 2002 11:25 am, Johan Carlsson [Torped] wrote:
> At 08:59 2002-08-09 -0400, Casey Duncan said:
> >__record_schema__ is simply a dictionary which maps field names to column
> >positions (ints) so that the record knows the index of each field in the
> >record tuples.
> >
> >See line 154 of Catalog.py to see how it is initialized to the Metadata
> >schema plus a few extra columns for catalog rid and scores.
>
>
> Hi Casey (and zope-dev),
> Thanks!
> After some experimenting I realized that :-)
>
> One of the reasons I asked was that I am thinking about
> how to implement a "SELECT col1 as 'name', ..." type
> of feature for ZCatalogs.
>
> I'm not entirely sure it's a good idea to start with, but
> I'm thinking along the lines of large ZCatalogs (by large I mean
> a lot of columns in the self.data structure).
> If all columns are copied the brains would grow larger as well
> and by selecting explicitly which columns should be copied to
> the brain they would be lighter.
>
> Now that I understand how the data tuples are copied to the brain
> I'm not at all sure adding a filter when copying the tuple will optimize
> things, because of the overhead in the filter process.

This occurs lazily, so the savings would be heavily dependent on the
application. For most web apps presenting small batches of records, the
savings from limiting the columns returned would be pretty minimal.
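(For anyone following along, here is a rough, simplified sketch of the
mechanism being discussed; it is not the actual Catalog.py code, and the
class name and data layout are made up for illustration.)

    # __record_schema__ maps metadata column names to positions in the
    # stored data tuple, plus extras such as the catalog rid.  Brains
    # are only built as the results are actually iterated.
    class Brain:
        __record_schema__ = {'Title': 0, 'modified': 1, 'data_record_id_': 2}

        def __init__(self, data):
            self._data = data

        def __getattr__(self, name):
            schema = self.__record_schema__
            if name in schema:
                return self._data[schema[name]]
            raise AttributeError(name)

    def lazy_brains(rids, data):
        # Each row is wrapped on demand, so rows that are never looked
        # at never pay the cost of brain construction.
        for rid in rids:
            yield Brain(data[rid] + (rid,))

    data = {7: ('Front page', '2002-08-10')}
    for brain in lazy_brains([7], data):
        print(brain.Title, brain.data_record_id_)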

The general usage is to put a minimal set of columns in metadata, only enough
to create a results page, and to load the objects themselves in cases where
large, dynamic, or otherwise arbitrary data elements are needed.
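(A hedged example of that pattern, assuming a standard ZCatalog-style
catalog whose brains provide Title, getURL() and getObject(); the
with_body switch and the .body attribute are hypothetical application
details.)

    def results_page(catalog, query, batch_size=20, with_body=False):
        # Build one page of results from brain metadata alone; only wake
        # the real persistent object when the caller needs data that is
        # too large or dynamic to keep in the metadata columns.
        rows = []
        for brain in catalog(**query)[:batch_size]:
            row = {'title': brain.Title,        # from catalog metadata
                   'url': brain.getURL()}       # derived from the path
            if with_body:
                # Expensive path: loads the indexed object itself.
                row['body'] = brain.getObject().body
            rows.append(row)
        return rows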

> (Given the way that I "solved" the group/calc part of my "project", I don't
> think it will lead to memory bloat. I'm going to implement a LazyGroupMap
> which takes an extra parameter (a list of IISets). Each brain created
> in the LazyMap will have methods for calculations directly on the self.data
> in the Catalog. The data itself will not be stored.
> There will most probably be a pre-calculate method that calculates all
> applicable variables and caches the results.)

Sounds like a pretty good solution. However, I would be hesitant to create
direct dependencies on the internal Catalog data structures if you can help
it (sometimes you can't though).

> One way to reduce memory consumption in wide Catalogs would be
> to have LazyBrains (vertical laziness; there might be reasons
> why that would be a bad idea which I'm not aware of).

That would pretty much require a rewrite of the Catalog, as the data structures
would need to be completely different. It would introduce significant
database overhead, since each metadata field would need to be loaded
individually. I think that would negate whatever performance benefit metadata
might have over simply loading the objects.

> Another way would be to have multiple data attributes in the Catalog, like
> tables, and to join the tuples from them with a "from table1, table2"
> statement.
> In this way it would be possible to control the width of the brains.
> It would also be possible for the object being indexed to tell the catalog
> in which "tables" it should store metadata.

Yes, this would be better. You could have different sets of metadata for each
catalog record. You would select which one you wanted at query time.
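(Purely as a sketch of that idea, and definitely not an existing ZCatalog
API: a catalog keeping several named metadata "tables", with the caller
choosing one per query so the brains only carry that table's columns.
All names below are invented.)

    class MultiTableCatalog:
        # Hypothetical: one catalog, several metadata "tables".
        def __init__(self):
            self.schemas = {}   # table name -> ordered list of columns
            self.tables = {}    # table name -> {rid: data tuple}

        def addTable(self, name, columns):
            self.schemas[name] = list(columns)
            self.tables[name] = {}

        def catalogObject(self, rid, obj, table='default'):
            cols = self.schemas[table]
            self.tables[table][rid] = tuple(getattr(obj, c, None) for c in cols)

        def searchResults(self, rids, table='default'):
            # The caller picks which metadata table backs the result rows,
            # so a "wide" catalog can still hand back narrow records.
            cols = self.schemas[table]
            return [dict(zip(cols, self.tables[table][r])) for r in rids]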
> There have been some proposals (ObjectHub et al.) which I read some
> time ago. I didn't feel then that they were what I was looking for.
> Please tell me if there have been any proposals or discussions regarding this.

I don't think so. If you feel strongly about this, write up a proposal and
provide some use cases for discussion.

>
> Regards,
> Johan Carlsson

-Casey