sean.upton@uniontrib.com wrote:
I'm deploying a new (eventually to be very heavily loaded) ZEO storage server on a DP Xeon/Prestonia box (running Linux 2.4.18 compiled from a RedHat Errata kernel source RPM with low-latency scheduling enabled); the box has hardware multi-threading capabilities. ZEO will be serving what I anticipate will be a FileStorage of several gigabytes sitting on an 8-spindle RAID 10. All ZEO traffic will be coming from networked ZEO clients on a Copper-GB network.
One of my thoughts was that it seemed at least theoretically possible that, if the box is very loaded, HMT would help by using spare CPU cycles to handle interrupts from the GB NIC and RAID controller, but I worry that enabling HMT will compromise Python performance. Any thoughts on whether HMT/HyperThreading will help or hinder ZEO performance given Python's GIL? Could I get an accurate picture of any degradation by simply running pystone tests with HyperThreading disabled and enabled?
Well, if I had to make a guess, I would guess your performance will be worse with HyperThreading enabled. The reason is that HyperThreading is most efficient when it can dispatch to different functional units within the CPU; this basically translates to being able to run an integer and a floating point context simultaneously. It's not QUITE that limited, but it is nowhere near the same as having two separate CPUs.

Python, however, is only going to be able to function down the 'systems programming' pipe (lots of integer work, pointer manipulation, and branching). This means that the second hyperthread CPU is effectively useless for Python. Thus, the problem of Python latency is compounded because, again, you schedule work for a CPU that cannot perform it. In this case, the CPU has to switch contexts, and THEN fail to acquire the GIL.

Remember, that's only a guess. I don't *know* that to be how it works, but it's an opinion formed on how I think hyperthreading works.

Running the pystone bench will NOT tell you how bad things get, because it is a serial benchmark. You could try to finesse it by tweaking the pystone source to dispatch multiple pystone threads, and then adding the results together. You can't run multiple pystones (well, you COULD, but it won't be the same) because each pystone process will hold its own GIL.
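To make the "multiple threads, add the results together" idea concrete, here is a rough sketch. It does not use pystone itself (the module is not shipped with recent Python releases); integer_workload() is a hypothetical stand-in that does only integer arithmetic and branching, and run_threaded() is likewise a name made up for illustration. All threads live in one process, so they contend for a single GIL.

```python
import threading
import time


def integer_workload(iterations):
    """Hypothetical stand-in for pystone: integer math and branching only."""
    total = 0
    for i in range(iterations):
        if i % 3 == 0:
            total += i
        else:
            total ^= i
    return total


def run_threaded(num_threads, iterations):
    """Run the workload in num_threads threads of ONE process (one GIL).

    Returns (wall_seconds, per_thread_rates), where each rate is
    iterations per second as measured inside that thread.
    """
    rates = [0.0] * num_threads

    def worker(index):
        start = time.perf_counter()
        integer_workload(iterations)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates[index] = iterations / elapsed

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = max(time.perf_counter() - wall_start, 1e-9)
    return wall, rates


if __name__ == "__main__":
    iterations = 200000
    for n in (1, 2, 4):
        wall, rates = run_threaded(n, iterations)
        # Total work done divided by wall time is the aggregate rate.
        aggregate = n * iterations / wall
        print("threads=%d wall=%.3fs aggregate=%.0f iter/s" % (n, wall, aggregate))
```

Running this once with HyperThreading enabled and once with it disabled, and comparing the aggregate rates at the same thread count, would give a rough idea of the effect. It says nothing about the interrupt-handling side of the question.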
In comparing kernel source, it looks like, unlike the stock kernel, RedHat's 2.4.18-4 kernel source has a reworked kernel/sched.c with a set_cpus_allowed() call, which should, in theory, prevent migration of a task from a CPU. Has anyone tried this with Python?
I think this is a patch that has been around for a while to allow you to set the 32-bit CPU dispatch mask (one bit for each CPU in the system). It may or may not also be coupled with a /proc filesystem patch.

--
Matt Kromer
Zope Corporation
http://www.zope.com/
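For illustration only, here is how a "one bit per CPU" mask maps onto a set of CPU numbers, and how a process could be pinned from Python. This is not the 2.4.18 patch discussed above: it assumes a current Linux and a Python that provides os.sched_setaffinity() and os.sched_getaffinity(). The helper names mask_to_cpus(), cpus_to_mask(), and pin_current_process() are made up for this sketch.

```python
import os


def mask_to_cpus(mask):
    """Convert a CPU bitmask (bit N set means CPU N allowed) to a set of ints."""
    cpus = set()
    bit = 0
    while mask:
        if mask & 1:
            cpus.add(bit)
        mask >>= 1
        bit += 1
    return cpus


def cpus_to_mask(cpus):
    """Convert an iterable of CPU numbers back to a bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(mask):
    """Restrict the calling process to the CPUs named in mask.

    Returns the resulting affinity set, or None if the platform lacks
    os.sched_setaffinity or none of the requested CPUs is available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Only ask for CPUs the process is currently allowed to use.
    wanted = mask_to_cpus(mask) & os.sched_getaffinity(0)
    if not wanted:
        return None
    os.sched_setaffinity(0, wanted)  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # 0x1 allows CPU 0 only.
    print(pin_current_process(0x1))
```

With a mask of 0x1, the process is confined to CPU 0, so the scheduler will not migrate it to a sibling hyperthread or a second physical CPU.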