Hi Toby,
I'm terribly sorry to bother you like this. I tried the mailing lists but no one seems to have the answer :-)
CC zope@zope.org, in case anyone else is interested.
I already posted this days ago on zope@zope.org, but no answer yet ;-)
Have you had any trouble like this?
Yes, I have seen that, but I am surprised it applies to you.
You are using ICP, right?
Yes.
A dead peer means that the ICP replies are getting lost. Lost, not just delayed. Is squid really saying "dead peer" in cache.log?
Yes, 'Detected Dead Parent.' All three clients are pronounced dead (immediately, I guess, after packing starts). Every 10 minutes, a little script checks cache.log to keep this state from persisting, and once Squid gets restarted, everything goes back to normal.
Do you have network packet loss?
No. All machines are hooked up in a dedicated 10M LAN using a 100M switch. The ZEO server comes with dual 1000bit ethernets. No packet loss at all.
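One way to check directly whether ICP replies come back from a peer is to send it a single ICP query by hand. The following is only a rough sketch, assuming the standard ICP version 2 layout (a 20-byte header followed by the requester address and a null-terminated URL); the function names, the placeholder URL, and the default timeout are made up for illustration and are not part of Squid or Zope.

```python
import socket
import struct

ICP_OP_QUERY = 1   # ICP opcode for a query
ICP_VERSION = 2
# opcode, version, message length, request number, options, option data, sender address
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url, request_number=1):
    """Build an ICP v2 query packet for the given URL."""
    # Query payload: 4-byte requester address (left as zeros) + null-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             request_number, 0, 0, b"\x00" * 4)
    return header + payload


def parse_icp_header(packet):
    """Decode the fixed 20-byte ICP header of a packet."""
    opcode, version, length, reqnum, _opts, _optdata, _sender = \
        ICP_HEADER.unpack_from(packet)
    return {"opcode": opcode, "version": version,
            "length": length, "request_number": reqnum}


def probe_peer(host, port=3130, url="http://example.invalid/", timeout=2.0):
    """Send one ICP query; return the reply opcode, or None if nothing came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_icp_query(url), (host, port))
        try:
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return None  # reply lost or peer not answering in time
    return parse_icp_header(data)["opcode"]
```

Calling probe_peer("192.168.250.3") in a loop while the pack is running would show whether replies are missing altogether or merely slow, if the timeout is raised.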
Or maybe squid isn't actually declaring the peers as dead... you would get a similar effect if you neglected to put the "round-robin" directive on your cache_peer lines, which controls how it responds to delayed ICP responses. (If this is the case then you may want to tweak icp_query_timeout for performance, but that's a different question.)
cache_peer 192.168.250.3 parent 8080 3130 no-digest no-netdb-exchange round-robin
cache_peer 192.168.250.4 parent 8080 3130 no-digest no-netdb-exchange round-robin
cache_peer 127.0.0.1 parent 8080 3130 no-digest no-netdb-exchange round-robin

'round-robin' is right there :-) I tried 1 to 20 seconds for icp_query_timeout; none makes any difference.
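For anyone wanting to rule out the missing-directive case mechanically, a small sketch along these lines could list the cache_peer entries that lack a given option. It assumes the usual "cache_peer host type http_port icp_port [options]" field order; the function name is invented for this example.

```python
def peers_missing_option(conf_text, option="round-robin"):
    """Return the hosts of cache_peer lines that do not carry the given option."""
    missing = []
    for line in conf_text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        # fields: cache_peer, host, type, http_port, icp_port, options...
        if len(fields) >= 5 and fields[0] == "cache_peer":
            if option not in fields[5:]:
                missing.append(fields[1])
    return missing
```

Fed the three lines above, it returns an empty list, which matches what the configuration shows.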
I am still surprised that this applies to you at all, because your ZEO server is on its own machine, right?
Sure. No bandwidth or horsepower problem. It's the baddest machine in the whole setup: 2way P4 Xeon with 4G RAM and stuff.
Is there anything of interest in cache.log?
Nothing unusual. It pronounces all three clients dead and keeps complaining that it failed to select source for objects.

2003/05/09 01:02:32| Detected DEAD Parent: 192.168.250.4/8080/3130
2003/05/09 01:02:32| Detected DEAD Parent: 192.168.250.3/8080/3130
2003/05/09 01:02:32| Detected DEAD Parent: 127.0.0.1/8080/3130
2003/05/09 01:02:32| Failed to select source for XXXX
2003/05/09 01:02:32| always_direct = 0
2003/05/09 01:02:32| never_direct = 1
2003/05/09 01:02:32| timedout = 0
2003/05/09 01:02:32| Failed to select source for XXXX

The ZODB packer script runs exactly at 1 AM. Clients are deemed dead at 1:02. This is from last night's cache.log.

From what I understood, I thought Squid would keep polling the clients and redeclare them alive and kicking, but that never happens. Before I installed that little cron script, service came to a full stop all night until I realized something had gone wrong.
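The kind of check the cron script performs could look roughly like the sketch below. This is not the actual script, only an illustration: it scans the log text for 'Detected DEAD Parent' lines and reports which parents are still marked dead. It assumes Squid writes a matching 'Detected REVIVED Parent' line when a peer comes back, which does not appear in the excerpt above; the function name is made up, and the restart step is left out.

```python
import re

# "REVIVED" is an assumption about Squid's recovery message; only "DEAD"
# is visible in the cache.log excerpt from the thread.
PEER_EVENT = re.compile(r"Detected (DEAD|REVIVED) Parent: (\S+)")


def dead_parents(log_text):
    """Return the parents whose most recent logged state is DEAD, sorted."""
    state = {}
    for line in log_text.splitlines():
        match = PEER_EVENT.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last event wins.
            state[match.group(2)] = match.group(1)
    return sorted(peer for peer, status in state.items() if status == "DEAD")
```

A cron job could read cache.log, call dead_parents() on its contents, and restart Squid only when the returned list is non-empty.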
What about if you turn on full debugging in squid.conf?
Haven't tried it yet. I'll give it a try.
Can you post a before and after of the "Peer Cache Statistics" page in the squid cachemanager CGI?
Haven't tried it either. Thought it was a **simple** misconfiguration or something. It might not be the case after all. I'll look into this more thoroughly then. Thanks again for your input.

Cheers,
Wankyu Choi
---------------------------------------------------------------
Wankyu Choi
CEO/President
NeoQuest Communications, Inc.
http://www.zoper.net
http://www.neoboard.net
---------------------------------------------------------------