Hi. I am working with the CMFBTreeFolder for the first time. I thought it would make sense since I am wanting to use this type for an uploads folder. There may be thousands of objects. Once files have been uploaded to the folder, I am running a script to pull out images using a for statement like this:
# Iterate over folder contents
for name, obj in start_dir.objectItems(['Portal Image']):
and then executing some logic on the images afterwards. I am planning to cron the script unless I can figure a way to execute the logic as files are coming into the folder (which would be the best - so any ideas here would be great).
In any case, what happens is that about half the objects get processed with each run of the script. For example if I had 1030 objects about half get processed, and then half of that and so on. It may take 4 runs to get everything processed in the folder. On a normal Folder or CMF Portal Folder, all objects get processed in one run.
Maybe this is what it is supposed to do and I should loop my logic and test each time or something. Can someone give me some insight on this? Is this normal? Or should I be using anything else to process all objects of a certain type in my BTreeFolder (so I know I am getting them all) in one pass?
Regards,
David
David Pratt wrote:
Hi. I am working with the CMFBTreeFolder for the first time. I thought it would make sense since I am wanting to use this type for an uploads folder. There may be thousands of objects. Once files have been uploaded to the folder, I am running a script to pull out images using a for statement like this:
# Iterate over folder contents
for name, obj in start_dir.objectItems(['Portal Image']):
and then executing some logic on the images afterwards. I am planning to cron the script unless I can figure a way to execute the logic as files are coming into the folder (which would be the best - so any ideas here would be great).
First off: maybe your script runs into an error and stops. It's hard to tell what the problem is without seeing the script. It's most likely not a BTreeFolder issue.
If you need to do processing on an image, you should subclass the CMF Image class and override the methods necessary for postprocessing your images. That is a better solution than a cron job.
--
hilsen/regards Max M, Denmark
http://www.mxm.dk/ IT's Mad Science
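The subclass-and-override idea from Max's reply could look roughly like the sketch below. It is an illustration only: `BaseImage`, `update_data` and `postprocess` are hypothetical stand-ins, not the actual CMF Image API, and the real class and the methods worth overriding would have to be looked up in the CMF source.

```python
class BaseImage:
    """Hypothetical stand-in for a content class such as the CMF Image."""

    def __init__(self, id):
        self.id = id
        self.data = b""

    def update_data(self, data):
        # Hypothetical method name: stores the uploaded bytes.
        self.data = data


class PostprocessedImage(BaseImage):
    """Runs post-processing every time new data is stored."""

    def __init__(self, id):
        super().__init__(id)
        self.processed_sizes = []

    def update_data(self, data):
        # Let the base class store the upload first, then hook in.
        super().update_data(data)
        self.postprocess()

    def postprocess(self):
        # Placeholder for the real work (for example resizing with PIL).
        self.processed_sizes = ["small", "medium", "large"]


image = PostprocessedImage("photo.jpg")
image.update_data(b"raw image bytes")
print(image.processed_sizes)
```

The point of the pattern is that processing happens at upload time, per object, so no periodic batch job has to scan the folder.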
Hi Max. I am getting no errors. I am asking the question because I have no experience with CMFBTreeFolder and the script works fine in a regular folder or a CMF Portal folder.
When images come into the uploads folder, I am resizing each into 3 sizes using PIL and moving this data into an external file type to store file on the filesystem and discarding the original file. I have this working except images are not processed immediately when they come into uploads folder - so I was looking to cronning to see if anything is in the uploads folder each 15 or 30 min.
Clients will upload images from a form, FTP, or WebDAV. I did not want to give people server accounts to upload their data using scp but in the end I am going to have to get SFTPGateway product working so there is some security with this in any case.
I just received Paul's message. I think he may be right on what is happening when getting the items from the folder. I think the workflow idea could be the solution to solve the immediate processing issue so will also look into this.
Regards,
David
_______________________________________________ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
On Feb 28, 2005, at 17:15, David Pratt wrote:
When images come into the uploads folder, I am resizing each into 3 sizes using PIL and moving this data into an external file type to store file on the filesystem and discarding the original file. I have this working except images are not processed immediately when they come into uploads folder - so I was looking to cronning to see if anything is in the uploads folder each 15 or 30 min.
The behavior you see is simple to explain. Your problem is that you are mutating the list of items as you go through it - always a bad idea. Try something like this:
ids = list(folder.objectIds())  # This creates a *copy* of all IDs
for id in ids:
    ob = folder._getOb(id)
    # <do stuff with ob>
jens
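The effect Jens describes can be reproduced outside Zope. The following is a minimal sketch in plain Python that uses an ordinary list as a stand-in for the folder; `process_while_iterating` and `process_from_copy` are hypothetical names, and a BTreeFolder's internals differ from a list, so only the general pattern carries over.

```python
def process_while_iterating(names):
    """Remove each item from the very list that is being iterated over."""
    processed = []
    for name in names:
        processed.append(name)
        names.remove(name)  # mutates the sequence the loop is walking
    return processed


def process_from_copy(names):
    """Iterate over a snapshot, so removals cannot disturb the loop."""
    processed = []
    for name in list(names):  # list() makes a copy first
        processed.append(name)
        names.remove(name)
    return processed


folder_a = ["img%04d" % i for i in range(1030)]
first_pass = process_while_iterating(folder_a)
print(len(first_pass), len(folder_a))

folder_b = ["img%04d" % i for i in range(1030)]
single_pass = process_from_copy(folder_b)
print(len(single_pass), len(folder_b))
```

With 1030 names, the first function handles 515 and leaves 515 behind, because every removal shifts the remaining items under the loop's position. The second function handles all 1030 in one pass.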
Many thanks Jens for this advice. It is much appreciated.
Regards,
David
On Mon, Feb 28, 2005 at 10:07:48AM -0400, David Pratt wrote:
Hi. I am working with the CMFBTreeFolder for the first time. I thought it would make sense since I am wanting to use this type for an uploads folder. There may be thousands of objects. Once files have been uploaded to the folder, I am running a script to pull out images using a for statement like this:
# Iterate over folder contents
for name, obj in start_dir.objectItems(['Portal Image']):
and then executing some logic on the images afterwards. I am planning to cron the script unless I can figure a way to execute the logic as files are coming into the folder (which would be the best - so any ideas here would be great).
In any case, what happens is that about half the objects get processed with each run of the script. For example if I had 1030 objects about half get processed, and then half of that and so on. It may take 4 runs to get everything processed in the folder. On a normal Folder or CMF Portal Folder, all objects get processed in one run.
I'm just guessing, but it may be that in CMFBTreeFolder, objectItems() returns not a list but some kind of iterator that lazily gets the next batch of images. I don't know what you're *doing* with those images you find, but if the result is to change the result of what objectItems() would return, you might be getting into undefined behavior territory.
Easy ways to find out would be:
1) read the source of CMFBTreeFolder and maybe base classes.
2) try looping over list(start_dir.objectItems(['Portal Image'])) instead and see if that just works. But performance may be unacceptable if the list is really really huge.
FWIW, I'd take a different approach (I think... hard to be sure since I don't know what you want to DO with these images): Use workflow.
DCWorkflow provides for "automatic" transitions, where an object in one state can automatically transition to another state whenever some guard condition is met.
So, write a script like "postprocessImage" that does what you want to *one* Image, add it to your Images workflow as a script in an automatic transition that always occurs from the workflow's initial state. The result is that this script will get called on every image exactly once just after it's created.
There's a good doc on DCWorkflow somewhere, PDF I think. Google should find it.
--
Paul Winkler http://www.slinkp.com
Hi Paul. Thank you for your reply. I think you are right on what is happening with BTree and will try what you have suggested. I was just going to try looping over the data but thought of this as strange and should consult the list.
I think the workflow approach to the problem is likely the better one for sure since I already have the logic for what I want to do with each image. Can this approach work on images on the uploads folder only without affecting the behavior of images across the entire portal? For example, if I set a boolean property on the uploads folder and tested for it, can something like this be used to determine whether the postprocessing script is run with workflow?
I will look for the DCWorkflow docs in the meantime. Many thanks.
David.
On Mon, Feb 28, 2005 at 12:28:15PM -0400, David Pratt wrote:
Hi Paul. Thank you for your reply. I think you are right on what is happening with BTree and will try what you have suggested. I was just going to try looping over the data but thought of this as strange and should consult the list. I think the workflow approach to the problem is likely the better one for sure since I already have the logic for what I want to do with each image. Can this approach work on images on the uploads folder only without affecting the behavior of images across the entire portal?
The workflow will apply to all images, but:
For example, if I set a boolean property on the uploads folder and tested for it
That should do the trick. Use that as your guard condition for the automatic transition.
It occurs to me that the workflow will basically keep trying that guard condition for all Portal Images in your site and won't stop until they leave their initial state. There might be performance implications there :-) There are ways around this, assuming that performance testing reveals it is a problem (which should not be assumed).
One way would be to add another state to your workflow, let's call it NewImageState or something. Set this state to be the default initial state. Have only one transition from it - the automatic transition we discussed. If the image is in the Uploads folder, the automatic transition calls your processing script and deletes the image, aborting the transition. Otherwise, the transition's target state is whatever your old initial state was (e.g. "private"). This way, the transition is called exactly once.
Note I have not tested this; deleting the object in the middle of the transition strikes me as a bit funky - it might "just work" or not. Maybe this should be done as the transition's "before" script.
I will look for the DCWorkflow docs in the meantime. Many thanks.
First hit on google: http://www.zope.org/Members/hathawsh/DCWorkflow_docs They're old but AFAIK still accurate.
There's also a good chapter in Andy McKay's Plone book, but IIRC it doesn't cover automatic transitions.
--
Paul Winkler http://www.slinkp.com
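The intermediate-state idea Paul outlines can be illustrated with a toy state machine. This is a sketch of the concept only and does not use the DCWorkflow API: `Folder`, `Image`, `is_uploads_folder`, `postprocess_image` and `run_automatic_transition` are hypothetical names, and deleting the image is modelled as a plain list removal plus a "deleted" marker state, which is a simplification of "aborting the transition".

```python
class Folder:
    def __init__(self, name, is_uploads_folder=False):
        self.name = name
        # Boolean property tested by the guard (hypothetical name).
        self.is_uploads_folder = is_uploads_folder
        self.contents = []


class Image:
    def __init__(self, name, folder):
        self.name = name
        self.folder = folder
        self.state = "new_image"  # dedicated initial state
        folder.contents.append(self)


processed = []


def postprocess_image(image):
    # Placeholder for the real per-image logic (resize, store, ...).
    processed.append(image.name)


def run_automatic_transition(image):
    """Leave 'new_image' exactly once, choosing a branch via the guard."""
    if image.state != "new_image":
        return  # already handled, so the guard is not evaluated again
    if image.folder.is_uploads_folder:
        postprocess_image(image)
        image.folder.contents.remove(image)  # the original is discarded
        image.state = "deleted"
    else:
        image.state = "private"  # the former initial state


uploads = Folder("uploads", is_uploads_folder=True)
gallery = Folder("gallery")
a = Image("a.jpg", uploads)
b = Image("b.jpg", gallery)
for img in (a, b):
    run_automatic_transition(img)
    run_automatic_transition(img)  # second call is a no-op
print(processed, a.state, b.state)
```

Because every image leaves the dedicated initial state on its first transition, images elsewhere in the portal pay for the guard check only once, which is the performance concern raised above.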
This is very helpful. Thank you Paul. I will give this a go and let you know how it works out. Overall, if I can get this to work for me, it will also reduce RAM consumption on the server - a much better solution than trying to handle everything in a batch process. :-)
Regards
David
participants (4)
- David Pratt
- Jens Vagelpohl
- Max M
- Paul Winkler