[Zope] ZCatalog Update Benchmark: external method vs direct to zodb

Jonathan Hobbs toolkit at magma.ca
Tue Jul 13 08:27:38 EDT 2004


For those who asked...


Our objective was to determine the fastest way to upload a large number of
records into a zcatalog.

The test data to be uploaded consisted of 832,923 records and required 2.9 GB
of disk space (for the input data files). Each input data record becomes
a single zclass instance and a single entry in the zcatalog.  ZClass
instances are stored in a BTreeFolder2 folder.

The update process is split into 120 update batches.  The update batches
vary in size from 244 records to 24,360 records.
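The batch structure, together with the pack after every 10 batches mentioned below, can be sketched as a small driver loop. This is a minimal sketch, not the script used in the test. The callables `load_record`, `commit` and `pack` are hypothetical hooks standing in for "create one instance plus catalog entry", "commit the transaction" and "pack the zodb". The code is written for Python 3, not the Python 2.1 on the test server.

```python
from typing import Callable, List, Sequence


def run_batches(
    batches: Sequence[Sequence[dict]],
    load_record: Callable[[dict], None],  # hypothetical: create one instance + catalog entry
    commit: Callable[[], None],           # hypothetical: commit the current transaction
    pack: Callable[[], None],             # hypothetical: pack the database
    pack_every: int = 10,
) -> List[int]:
    """Load each batch, commit once per batch, and pack after every
    `pack_every` batches. Returns the 1-based batch numbers after
    which a pack was run."""
    packed_after = []
    for number, batch in enumerate(batches, start=1):
        for record in batch:
            load_record(record)
        commit()  # one commit per batch keeps each transaction bounded
        if number % pack_every == 0:
            pack()
            packed_after.append(number)
    return packed_after
```

With 120 batches and `pack_every=10`, the loop packs 12 times, after batches 10, 20, ..., 120.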


Approach #1 - External Method

In this approach a standard zope external method is used to read the input
data files, create zclass instances and update the zcatalog.  The zclass
instance is 'catalog-aware', therefore the zcatalog is automatically updated
when the zclass instance is created.  After the field data is added to the
zclass instance, a second zcatalog update (done via reindex_object) is
required so that the newly added fields are indexed.

The total time required to process the 120 update batches was 400,438
seconds.  This does not include the time needed to pack the zodb, which
occurred automatically after every 10 update batches.


Approach #2 - Direct to zodb

In this approach a stand-alone python routine was used to access the
DBData.fs file (mounted via DBTab) directly, i.e. zope was not running during
the update process.  The ZClass instances in this approach are NOT
'catalog-aware': catalog_object was called once, after the ZClass instance
was created and its field data added.  The reason was to eliminate the
initial automatic zcatalog update.
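The difference between the two approaches can be modelled by counting indexing passes. This is a toy model, not Zope code: `CountingCatalog` and `Record` are hypothetical stand-ins (the method name only mirrors ZCatalog's `catalog_object`), and the model ignores the real per-pass cost. The post does not say how much of the measured saving comes from skipping the first index pass and how much comes from running outside zope.

```python
class CountingCatalog:
    """Hypothetical stand-in for a ZCatalog that only counts indexing passes."""

    def __init__(self):
        self.index_calls = 0
        self.entries = {}

    def catalog_object(self, obj, uid):
        self.index_calls += 1
        self.entries[uid] = dict(obj.fields)  # snapshot of what gets indexed


class Record:
    """Hypothetical stand-in for a ZClass instance."""

    def __init__(self, uid):
        self.uid = uid
        self.fields = {}


def load_catalog_aware(catalog, rows):
    """Approach #1 pattern: index on creation, then reindex after fields are set."""
    for uid, fields in rows:
        obj = Record(uid)
        catalog.catalog_object(obj, uid)  # automatic index at creation (fields still empty)
        obj.fields.update(fields)
        catalog.catalog_object(obj, uid)  # models the reindex_object() step
    return catalog.index_calls


def load_then_catalog(catalog, rows):
    """Approach #2 pattern: set all fields first, then catalog exactly once."""
    for uid, fields in rows:
        obj = Record(uid)
        obj.fields.update(fields)
        catalog.catalog_object(obj, uid)
    return catalog.index_calls
```

Both functions end with identical catalog entries, but the catalog-aware pattern performs two indexing passes per record instead of one.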

The total time required to process the 120 update batches was 270,903
seconds.  This represents a time saving of 32.35%.  Additionally, the
automated zodb pack process appears to run faster than in Approach #1: the
longest pack time in Approach #2 was about 2.5 hours, vs. over 4 hours in
Approach #1.
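The 32.35% figure follows directly from the two totals. The short calculation below reproduces it and adds per-record figures derived only from the numbers in this post (pack time excluded, as stated above). The constant and function names are illustrative.

```python
RECORDS = 832_923
APPROACH_1_SECONDS = 400_438
APPROACH_2_SECONDS = 270_903


def summarize(records, seconds):
    """Derived throughput figures for one approach (pack time excluded)."""
    return {
        "hours": seconds / 3600,
        "ms_per_record": 1000 * seconds / records,
        "records_per_second": records / seconds,
    }


saving = (APPROACH_1_SECONDS - APPROACH_2_SECONDS) / APPROACH_1_SECONDS
print(f"time saving: {saving:.2%}")  # prints: time saving: 32.35%
```

Approach #1 works out to roughly 111 hours (about 481 ms per record) and Approach #2 to roughly 75 hours (about 325 ms per record).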



Here are the specs of the server dedicated to the updates (the same server
was used for both tests):

Zope 2.6.1
Python 2.1.3
Linux 2.4.20-28.8 (Red Hat 8.0 3.2-7)
Processor: 1 GHz (VIA PIII clone, 64k cache, 1998.84 bogomips)
RAM: 1.25 GB
Disk: RAID 1 (3 spindles, 5ms, 10000rpm)



Jonathan



