importFromURL not working?
Hi everyone,

I feel like I'm spamming the list, but I'm just about ready to roll out my brand-new, shiny, Zopified Web site for my school. A couple of students and I have been working hard, and this is the last thing on our "To-Do" list before saving our version and sitting back to enjoy the "oohs" and "ahhs" that will undoubtedly come. :-)

I'm using Edd's SiteSummary product to import news to our site. So far, so good. It seems to work like a champ. I've created a shell script to update the news periodically via cron. The script looks like:

    #!/bin/sh
    lynx -source http://localhost/headlines/freshmeat.net/importFromURL?url=\
    http://freshmeat.net/backend/fm.rdf
    lynx -source http://localhost/headlines/xmlhack.com/importFromURL?url=\
    http://xmlhack.com/rss.php
    lynx -source http://localhost/headlines/Zope.org/importFromURL?url=\
    http://www.zope.org/SiteIndex/news.rss

Now I'm trying to get some news from Moreover.com, but the import fails. I'm wondering if it has to do with the filename of the XML source (I'm getting desperate). Does anyone know of a reason that importFromURL would choke on a URL like

    http://www.moreover.com/cgi-local/page?index_environment+rss

I know the file itself isn't the problem because I saved it to a local disk with a different name and it imported fine.

Ideas?

-Tim

--
Timothy Wilson       | "The faster you  | Check out:
Henry Sibley H.S.    | go, the shorter  | http://slashdot.org/
W. St. Paul, MN, USA | you are."        | http://linux.com/
wilson@visi.com      | -Einstein        | http://www.mn-linux.org/
Timothy Wilson wrote:
lynx -source http://localhost/headlines/Zope.org/importFromURL?url=\ http://www.zope.org/SiteIndex/news.rss
[snip]
http://www.moreover.com/cgi-local/page?index_environment+rss
I know the file itself isn't the problem because I saved it to a local disk with a different name and it imported fine. Ideas?
How are you specifying it in your cron file? Looking at the two lines above, I can see a potential problem if you are writing:

    lynx -source http://localhost/headlines/Zope.org/importFromURL?url=\
    http://www.moreover.com/cgi-local/page?index_environment+rss

...since this will give you a URL with two '?' in it, and that'll break one way or another. You can try URL-quoting the address, but passing one URL as a query argument to another is inherently fragile.

Have you considered packing up all of your fetches into a single DTML Method '/headlines/getAll' along the lines of:

    <dtml-with Zope.org><dtml-call expr="importFromURL(this(), REQUEST, url='http://www.zope.org/SiteIndex/news.rss')"></dtml-with>
    <dtml-with moreover><dtml-call expr="importFromURL(this(), REQUEST, url='http://www.moreover.com/cgi-local/page?index_environment+rss')"></dtml-with>
    ... etc

Then, in your cron file, just do:

    lynx -source http://localhost/headlines/getAll

Cheers,

Evan @ 4-am
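On the "URL-quoting the address" option mentioned above, here is a minimal sketch of what that could look like. It uses modern Python's standard library as a stand-in parser, so it only illustrates the general problem; Zope's own request handling may behave differently. The helper names build_import_url and recovered_url are made up for this example, and the /headlines/moreover/ path is an assumption based on the folder name used in the DTML above.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_import_url(base, feed_url):
    """Return base?url=<feed_url> with the feed URL percent-encoded.

    Hypothetical helper, not part of SiteSummary or Zope.
    """
    # safe="" also encodes "/", ":", "?" and "+", so nothing in the
    # inner URL can be mistaken for part of the outer query string.
    return base + "?url=" + quote(feed_url, safe="")


def recovered_url(full_url):
    """Decode the url= argument the way a standard form parser would."""
    return parse_qs(urlsplit(full_url).query)["url"][0]


# Assumed folder layout; the feed address is the one from the thread.
BASE = "http://localhost/headlines/moreover/importFromURL"
FEED = "http://www.moreover.com/cgi-local/page?index_environment+rss"

naive = BASE + "?url=" + FEED           # feed URL pasted in unquoted
quoted = build_import_url(BASE, FEED)   # feed URL percent-encoded first

print("naive  ->", recovered_url(naive))
print("quoted ->", recovered_url(quoted))
print("fetch  ->", quoted)
```

With this parser the unquoted form does not round-trip, because a form-style decoder treats the '+' as an encoded space, while the quoted form comes back unchanged. The last printed line is the address that would go after "lynx -source" in the cron script, ideally inside single quotes so the shell leaves it alone.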
participants (2): Evan Simpson, Timothy Wilson