[Zope] Folder with one million Documents?

sean.upton@uniontrib.com
Fri, 25 Jan 2002 13:53:15 -0800


This will be taxing on Zope, so you need to be willing to be patient enough
to optimize your application a bit.  BTreeFolder works well for this,
provided you are willing to consider bypassing use of the ObjectManager APIs
and read/write to BTreeFolder._tree directly or use BTreeFolder._setOb() and
BTreeFolder._getOb() instead of ObjectManager._getObject()...
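
A minimal sketch of the low-level access pattern described above.
SketchFolder is a hypothetical stand-in, not the real BTreeFolder class: it
only mirrors the _tree, _setOb() and _getOb() names from the text, and
high_level_add() is an invented placeholder for whatever extra per-object
work a higher-level add path does. OOBTree comes from the BTrees package if
it is installed; otherwise a plain dict is used so the sketch runs anywhere.

```python
# Illustration only: SketchFolder is a stand-in, NOT the real BTreeFolder.
try:
    from BTrees.OOBTree import OOBTree  # present when ZODB/Zope is installed
except ImportError:
    OOBTree = dict  # fallback so the sketch still runs without ZODB

_marker = object()


class SketchFolder:
    def __init__(self):
        self._tree = OOBTree()   # id -> object mapping
        self.hook_calls = 0      # counts the simulated per-object extra work

    def _setOb(self, id, ob):
        # Low-level store: write straight into the tree, nothing else.
        self._tree[id] = ob

    def _getOb(self, id, default=_marker):
        # Low-level fetch: raise KeyError unless a default was supplied.
        ob = self._tree.get(id, default)
        if ob is _marker:
            raise KeyError(id)
        return ob

    def high_level_add(self, id, ob):
        # Hypothetical placeholder for a higher-level add path:
        # an id check plus some per-object bookkeeping.
        if id in self._tree:
            raise ValueError("duplicate id: %r" % (id,))
        self._setOb(id, ob)
        self.hook_calls += 1
        return id


def bulk_load(folder, items):
    # Bulk population through the low-level path only.
    for doc_id, doc in items:
        folder._setOb(doc_id, doc)
    return len(folder._tree)


folder = SketchFolder()
count = bulk_load(folder, [("doc%d" % i, {"n": i}) for i in range(1000)])
```

Any per-object work skipped this way (cataloging, for instance) presumably
still has to happen somewhere, such as in one batch step after the load.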

You also will REALLY need some nice hardware.  I would suggest the fastest
box you can get with LOTS of RAM.  I would look at something along the lines
of a Dual Athlon 2000+ (P-rated, not MHz) box with 3-4 GB RAM, and a striped
RAID volume of fast disks.  

I have a BTreeFolder-derived folder populated with about a third of a
million cataloged objects. Each object uses an underlying relational
datastore, and the Catalog has about 8 indexes, mostly field indexes that
index the result of a relational query. Bulk-adding these objects from an
RDB data source, with cataloging, takes about 2-3 hours on a P4 1.4GHz.
Right now, with my application, the Catalog is broken after the load until
I run a bulk reindex from the Advanced tab of the Catalog, which takes
another 2 hours.

I don't think BTreeFolder is the problem, but I suspect that reindexing a
Catalog of 3 million documents with full-text search set up would take you
more than 10-15 hours on a fast computer, and longer if a lot of complex
document-format filtering is involved.
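
As a rough cross-check, the figures quoted above can be scaled up to 3
million documents. This is only back-of-envelope arithmetic: it assumes
throughput stays constant as the Catalog grows (an assumption, and probably
an optimistic one), and it reuses timings measured with field indexes on a
P4 1.4GHz, not full-text indexes. The function names are made up for the
example.

```python
def rate_per_second(objects, hours):
    # Average throughput in objects per second.
    return objects / (hours * 3600.0)


def hours_at_rate(objects, rate):
    # Time in hours to process `objects` at `rate` objects per second.
    return objects / rate / 3600.0


observed = 1000000 / 3.0   # "about a third of a million" objects
target = 3000000           # planned document count from the question

# Bulk reindex: about 2 hours for the observed object count.
reindex_rate = rate_per_second(observed, 2)
reindex_hours = hours_at_rate(target, reindex_rate)

# Bulk add with cataloging: about 2-3 hours for the observed object count.
add_hours_low = hours_at_rate(target, rate_per_second(observed, 2))
add_hours_high = hours_at_rate(target, rate_per_second(observed, 3))

print("reindex rate: %.1f objects/s" % reindex_rate)
print("reindex of 3M objects: about %.0f hours" % reindex_hours)
print("bulk add of 3M objects: about %.0f-%.0f hours"
      % (add_hours_low, add_hours_high))
```

Under those assumptions the linear estimate lands in the same range as the
guess above; faster hardware would pull it down, and full-text indexing or
format filtering would push it up.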

Sean

-----Original Message-----
From: Thomas Guettler [mailto:zopestoller@thomas-guettler.de]
Sent: Friday, January 25, 2002 6:56 AM
To: zope@zope.org
Subject: [Zope] Folder with one million Documents?


  Hi!

I am developing a simple DMS. So far I use a Python product with a
BTreeFolder which contains all the documents. Every document gets an ID
from DateTime().millis(). There will be up to 50 users working at the same
time, and in the end I will have up to 3 million documents.

Is there a better class than BTreeFolder for such mass storage?

For the curious, here is one result of the benchmarks:

I benchmarked it with httperf:
 +5 requests per second
 +10MBit connection between client and server
 +every request creates a document

Number of documents: 2159
httperf.exe: warning: open file limit > FD_SETSIZE; limiting max. # of 
open files to FD_SETSIZE
httperf.exe --timeout=5 --client=0/1 --server=prophet --port=8080 
--uri=/a/benchmarks/create_new_doc --rate=5 --send-buffer=4096 
--recv-buffer=16384 --add-header='Authorization: Basic em9wZTp6b3Bl\n' 
--num-conns=1000 --num-calls=1
Maximum connect burst length: 1

Total: connections 1000 requests 1000 replies 571 test-duration 204.814 s

Connection rate: 4.9 conn/s (204.8 ms/conn, <=26 concurrent connections)
Connection time [ms]: min 110.0 avg 977.2 max 5257.0 median 289.5 stddev 1315.9
Connection time [ms]: connect 0.5
Connection length [replies/conn]: 1.000

Request rate: 4.9 req/s (204.8 ms/req)
Request size [B]: 120.0

Reply rate [replies/s]: min 0.0 avg 2.9 max 5.2 stddev 2.2 (40 samples)
Reply time [ms]: response 976.4 transfer 0.0
Reply size [B]: header 216.0 content 79.0 footer 0.0 (total 295.0)
Reply status: 1xx=0 2xx=562 3xx=0 4xx=0 5xx=9

CPU time [s]: user 83.96 system 117.15 (user 41.0% system 57.2% total 98.2%)
Net I/O: 1.4 KB/s (0.0*10^6 bps)

Errors: total 429 client-timo 429 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
Number of documents: 3075
--end:  Fri Jan 25 14:25:34 2002
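
A small sketch that pulls a few derived figures out of the httperf output
above. All inputs are copied from the report; the variable names are made
up for the example. It assumes the two document counts were taken
immediately before and after the run and that nothing else was creating
documents in between, which the output does not confirm.

```python
# Values copied from the httperf report above.
requests = 1000
replies = 571
client_timeouts = 429
status_2xx = 562
status_5xx = 9
duration_s = 204.814
docs_before = 2159
docs_after = 3075

reply_ratio = replies / float(requests)       # share of requests answered
unanswered = requests - replies               # requests with no reply
docs_created = docs_after - docs_before       # growth of the folder
created_per_s = docs_created / duration_s     # average creation rate

print("answered within the 5 s timeout: %.1f%%" % (100 * reply_ratio))
print("unanswered requests: %d" % unanswered)
print("documents created: %d" % docs_created)
print("documents per second: %.2f" % created_per_s)
print("created minus 2xx replies: %d" % (docs_created - status_2xx))
```

If the assumption holds, more documents were created than there were 2xx
replies, which suggests that some requests timed out on the client side
while the server still completed them.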


