need advice on mass data processing

I have a data file with over 110,000 entries of 3-column data (string, float, float). Currently my program does entry-by-entry processing with Zope. The operation looks like this:

1. read data (from the data file)
2. create product (a Python product that stores three fields: one string and two floats)
3. update product (update the three field entries)

When I first tried it out with the first 1000 entries, it took about 30 seconds. That means it will take 50-60 minutes for all 110,000 entries. It's not every day that you have to process over 110,000 data entries, but over 60 minutes of processing is still rather long. So I was wondering if anyone could propose a different method of doing this. Would love to hear any replies.

Allen
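As a point of reference for the "read data" step, here is a minimal parsing sketch. It assumes the file is whitespace-separated with exactly three columns (string, float, float); the function name and column layout are assumptions, not taken from the original post.

```python
def read_entries(lines):
    """Parse an iterable of text lines into (name, x, y) tuples.

    Assumes each non-blank line has three whitespace-separated
    columns: a string followed by two floats.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, a, b = line.split()
        entries.append((name, float(a), float(b)))
    return entries
```

In practice this would be called as `read_entries(open("data.txt"))` (hypothetical filename).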
--On 8 January 2007 19:28:32 -0800 Allen Huang <swapp0@yahoo.com> wrote:
> I have a data file that has over 110,000 entries of 3-column data
> (string, float, float). Currently I have written my program so it will
> do entry-by-entry processing with Zope. The operation is like this:
> 1. read data (the data file)
> 2. create product (a Python product that stores three fields: one string and two floats)
> 3. update product (update the three field entries)
Please name things the right way. A "Product" is basically a Zope/Python package that contains definitions of classes, scripts, templates etc. You mean instances of a particular class?
when I first tried it out with the first 1000 entries it took about 30 seconds. That means its going to take 50 ~ 60 minutes for 110000 entries.
You're creating 110k instances for storing a string and two floats? If so, that's a bad idea. You can persist large amounts of data within a single instance by using Zope BTrees.
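A minimal sketch of the BTree approach suggested here: many small records stored under a single persistent object, keyed by the string column. The `BTrees` package ships with Zope; the `dict` fallback is only so this sketch runs outside a Zope environment, and the keys and values shown are made up for illustration.

```python
try:
    # OOBTree maps arbitrary objects to arbitrary objects and is
    # persistence-aware when stored in the ZODB.
    from BTrees.OOBTree import OOBTree
except ImportError:
    OOBTree = dict  # stand-in so the sketch runs without Zope installed

tree = OOBTree()
# Key each record by its string column; store the two floats as the value.
tree["sample-key"] = (1.25, 3.5)
tree["another-key"] = (0.5, 9.0)

print(len(tree), tree["sample-key"])
```

Stored this way, 110k records live inside one ZODB object graph instead of 110k separate content objects, which avoids the per-instance overhead the original approach pays.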
> It's not every day that you have to process over 110,000 data entries,
> but over 60 minutes of processing is still rather long.
What kind of processing? -aj
Allen Huang wrote at 2007-1-8 19:28 -0800:
currently I have written my program so it will do an entry by entry processing with zope. This operation is like this
> 1. read data (the data file)
> 2. create product (a Python product that stores three fields: one string and two floats)
> 3. update product (update the three field entries)
when I first tried it out with the first 1000 entries it took about 30 seconds. That means its going to take 50 ~ 60 minutes for 110000 entries.
You need to be especially careful about which container you dump your entries into. You *MUST NOT* use a standard folder for this: it is not fit for larger numbers of entries. Use a "BTreeFolder2" instead. Another way to speed up the creation process is to batch the creation and commit a transaction only every X entries. Using these techniques, I expect that you will be able to create about 100 entries per second. -- Dieter
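The batching advice above can be sketched as follows. `transaction` is Zope's transaction manager; the stub class is only so the sketch runs outside Zope. The container, the batch size of 1000, and the `load_entries` helper are all illustrative assumptions, not code from the thread.

```python
try:
    import transaction  # Zope's transaction manager
except ImportError:
    class _FakeTransaction:
        """Stand-in so the sketch runs outside a Zope environment."""
        def commit(self):
            pass
    transaction = _FakeTransaction()

BATCH_SIZE = 1000  # commit every 1000 entries (a tuning assumption)

def load_entries(container, entries):
    """Store (key, x, y) entries in a container, committing in batches.

    The container would be a BTreeFolder2 (or a BTree) in Zope; any
    mapping works for illustration.
    """
    for i, (key, x, y) in enumerate(entries, start=1):
        container[key] = (x, y)
        if i % BATCH_SIZE == 0:
            transaction.commit()  # flush a full batch to the ZODB
    transaction.commit()  # commit the final, possibly partial, batch
```

Committing per batch rather than per entry amortizes the transaction overhead, which is where much of the 30-seconds-per-1000-entries figure likely goes.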
participants (3)
- Allen Huang
- Andreas Jung
- Dieter Maurer