Erik,
[Jason Spisak]
| I am running on a big machine though. If anyone wants those changes
| they're really easy. Just mail me directly, since it's a long file
| to post.
Hi. I would be interested in the file :-).
Okay, here's the diff. It truly is nothing more than cutting out the two
parts that eliminate single-letter words and numbers:

*** Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c
--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c
***************
*** 169,192 ****
  len = PyString_Size(word) - 1;
  len = PyString_Size(word);
- /*if(len < 2) Single-letter words are stop words!
- {
-   Py_INCREF(Py_None);
-   return Py_None;
- } */
-
- /*************************************************************
-   Test whether a word has any letters. */
  for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
- /*if (len < 0)
- {
-   Py_INCREF(Py_None);
-   return Py_None;
- }
-
- * If no letters, treat it as a stop word.
- *************************************************************/
  Py_INCREF(word);
--- 169,176 ----
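For anyone who would rather read the idea than the C, here is a rough Python sketch of the two checks the diff touches. It only approximates them: the function names are made up for illustration, Python's str.isalpha is not identical to C's isalpha, and the rest of the Splitter's stop-word and synonym handling is left out.

```python
def is_rejected_stock(word):
    """Approximate the two stock checks: reject single-letter words
    and words that contain no letters at all (e.g. pure numbers)."""
    if len(word) < 2:
        return True
    return not any(ch.isalpha() for ch in word)


def is_rejected_patched(word):
    """With both checks cut out, neither rule rejects anything."""
    return False


# Hypothetical terms, chosen only to exercise both rules.
terms = ["los", "angeles", "c", "mfc", "310"]

kept_stock = [t for t in terms if not is_rejected_stock(t)]
kept_patched = [t for t in terms if not is_rejected_patched(t)]

print(kept_stock)    # single letters and numbers are gone
print(kept_patched)  # everything survives
```

The trade-off is a larger index, since every single letter and every number now becomes a searchable term.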
Would you also be willing to share some statistics on how many objects you have in how many indexes, and how much time "complex" searches take? I understand if this is not possible, but it would be appreciated. :-)
Thanks.
Well, here's some of the output of the "Status" tab in the Catalog.

Subtransactions are Disabled

Subtransactions
---------------------------------------------------------
Index Status

  * 48205 object are indexed in bobobase_modification_time
  * 48205 object are indexed in calendar_date
  * 48205 object are indexed in calendar_day
  * 48205 object are indexed in call_date
  * 48205 object are indexed in curators
  * 48205 object are indexed in data
  * 48205 object are indexed in id
  * 48205 object are indexed in meta_type
  * 48205 object are indexed in resume_in
  * 48205 object are indexed in status
  * 48205 object are indexed in users_calendar

The only TextIndex is the 'data' index, though. It is the one that gets
hammered.

Let's see... time stats... hmmm. I put a REQUEST.set with the ZopeTime at
the top of the search page and again at the bottom, after the 'in' tag for
the Catalog.

Search terms are:

  los and angeles and C++ and MFC and 310

Subtracting the float of the two times I get 1.85400104523. I'm not sure
what that comes out to; I think it's part of a day, though, because of
DateTime.

The server stats:

  Dual Intel 400MHz Xeon w/ 1MB cache each
  LVD RAID 5 7200 RPM disk array
  1GB RAM
  RedHat Linux 6.1 with some kernel updates...

And the best piece of open source software I know:

  Zope 2.2.4 binary release

Hope that helps.

All my best,

Jason Spisak
CIO

   __ ___ ______ __
  / // (_)_____/_ __/__ ____/ / ___ _______ __ _
 / _ / / __/ -_) / / -_) __/ _ \(_-<_/ __/ _ \/ ' \
/_//_/_/_/ \__/_/ \__/\__/_//_/___(_)__/\___/_/_/_/

6151 West Century Boulevard
Suite 900
Los Angeles, CA 90045
P. 310.665.3444
F. 310.665.3544

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
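P.S. On the 1.85400104523 figure: a quick way to sanity-check the unit is to work out what it would mean as days. The sketch below is plain Python arithmetic, not Zope code, and the names are made up for illustration. It assumes nothing about what float() of a DateTime returns in Zope 2.2.4; it only shows the consequence of each reading.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def days_to_seconds(days):
    """Convert a duration expressed in days to seconds."""
    return days * SECONDS_PER_DAY

measured = 1.85400104523  # difference reported for the search above

# Reading 1: the value is a number of days.
as_seconds_if_days = days_to_seconds(measured)
as_hours_if_days = as_seconds_if_days / 3600.0

# Reading 2: the value is already in seconds.
as_seconds_if_seconds = measured

print("if days:    %.2f s (%.1f hours)" % (as_seconds_if_days, as_hours_if_days))
print("if seconds: %.2f s" % as_seconds_if_seconds)
```

Read as days, the figure would be roughly 160,186 seconds, about 44.5 hours, which cannot be the time for a single page request. So the value is more plausibly in seconds, though that is an inference from the arithmetic rather than something confirmed against the DateTime source.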