[Zope-dev] Personalization (was RE: [Zope-CMF] List of subject/metadata sets?)

Bjorn Stabell bjorn@exoweb.net
Wed, 6 Jun 2001 10:43:15 +0800


Ok, time to throw in my RMB 0.166 (US$ 0.02).


SITE VOCABULARY

In other CMSs, e.g., SiteServer, Spectra (Cold Fusion), and Vignette,
the main use of the subject/keywords is for user profiling and
personalization, although they call it Vocabulary, Site Categories, and
Category:Keywords respectively.  I see the restriction of
subject-keywords in CMF as the beginning of standard personalization in
Zope... something we sorely lack.  Vignette's category:keywords are only
two-level, whereas the others are usually multi-level, i.e., a tree.
Have a look at http://www.siteserver101.com/book/chapter7/pm16.asp and
related pages for info on how SiteServer does it.

Basically, you're right in that each site's keywords will be different,
depending on its use (intranet, vertical of some sort, etc).  An example
of this is the DocumentLibrary product that has a legal hierarchical
index.  Over time we should collect example indexes to ease
personalization for websites.


USER PROFILING

Obviously, we know quite a lot about the visitor before they do anything
on the website; we know which languages they prefer (or we think we do
from Accept-Language), we know what time they accessed the website, if
they have accessed our website before (if we put a cookie on their
browser), we know where they accessed from (not for sure, of course)
etc.  This is good, but it doesn't really give us any indication of what
the interests of this visitor are, so we can't make our site a good shop
keeper recommending things.  We need to create a user profile.

One way to do user profiling is to tag the visitor with the keywords of
all the pages he or she views, probably also tracking how many times a
visitor has seen pages with this or that keyword.  This kind of data
gathered from the visitor's behavior is called implicit data.  It is very
useful, especially since the visitor doesn't have to log in for it to be
effective (you can just use cookies to give the user a 'pseudo user').
Later, when the visitor logs in, we can merge the profile of the "pseudo
user" with the real user.
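
To make the tracking idea concrete, here is a minimal Python sketch.  The
class and method names (BehaviorProfile, record_view, merge) are made up
for illustration and are not part of Zope or the CMF; it only shows hit
counting per keyword and folding a cookie-based pseudo user into a real
user at login.

```python
from collections import Counter


class BehaviorProfile:
    """Implicit profile: counts how often each keyword was seen."""

    def __init__(self):
        self.hits = Counter()

    def record_view(self, page_keywords):
        # Tag the visitor with every keyword of the page just viewed.
        self.hits.update(page_keywords)

    def merge(self, other):
        # Fold another profile (e.g. a cookie-based pseudo user) into this one.
        self.hits.update(other.hits)


# Anonymous visitor, identified only by a cookie.
pseudo_user = BehaviorProfile()
pseudo_user.record_view(["/sports/", "/sports/football/"])
pseudo_user.record_view(["/sports/", "/sports/golf"])

# The same visitor logs in; the real account already has some history.
real_user = BehaviorProfile()
real_user.record_view(["/sports/", "/sports/football/"])
real_user.merge(pseudo_user)

print(dict(real_user.hits))
```

Adding the counts on merge is one possible policy; a site could just as
well weight recent anonymous browsing differently.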

Another way is to simply ask a logged in user to fill out a form
selecting which information he or she is interested in.  Data gathered
through forms in this way is called explicit data.

In any case, you end up with a list of keywords that the user is
interested in, maybe even weighted (if it is implicitly collected).


PERSONALIZATION RULES

Now we need a set of rules, hopefully editable by a normal content
manager, to control the personalization on miscellaneous pages.  The
fact that "business rules" were editable by normal users, especially
useful for shops that want to run many different types of promotions,
was one of the major selling points of Broadvision.

The personalization rules are basically different queries on the user
profiles.  Let's say we have a sports site with a (stripped down) site
subject vocabulary like this:

	/sports/
	/sports/basketball/
	/sports/basketball/redbulls
	/sports/football/
	/sports/football/manchesterunited
	/sports/golf

Let's say a visitor has an implicitly gathered (through tracking
browsing) user profile like this:

	KEYWORD                              "HITS"
	/sports/                               11
	/sports/basketball/                     4
	/sports/basketball/redbulls             4
	/sports/football/                       6
	/sports/football/manchesterunited       6
	/sports/golf                            1

We can have a rule like this:

	Show [the sport the visitor likes the most]
	= which of /sports/* has the most hits = football
	(I'm from Europe)

Based on this we can have an area on our site template dedicated to
promoting products that fit the user profile, in this case it would
probably show footballs and football collectibles etc.  We could even go
as far as show collectibles for the team we know he or she is interested
in.
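
As a rough Python sketch of such a rule, assuming the profile is a plain
mapping from keyword to hit count and that "/sports/*" means the keywords
exactly one level below /sports/ (the helper name top_children is
hypothetical):

```python
def top_children(hits, parent, n=1):
    """Return the n direct children of `parent` with the most hits."""
    parent_parts = [p for p in parent.split("/") if p]
    scored = []
    for keyword, count in hits.items():
        parts = [p for p in keyword.split("/") if p]
        # Keep only keywords exactly one level below the parent;
        # trailing slashes are ignored.
        if len(parts) == len(parent_parts) + 1 and parts[:-1] == parent_parts:
            scored.append((count, parts[-1]))
    # Most hits first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _count, name in scored[:n]]


# The example profile from above.
hits = {
    "/sports/": 11,
    "/sports/basketball/": 4,
    "/sports/basketball/redbulls": 4,
    "/sports/football/": 6,
    "/sports/football/manchesterunited": 6,
    "/sports/golf": 1,
}

favorite_sport = top_children(hits, "/sports/")
favorite_team = top_children(hits, "/sports/" + favorite_sport[0] + "/")
print(favorite_sport)  # ['football']
print(favorite_team)   # ['manchesterunited']
```

The second call is the "go as far as the team" step: the same query, one
level further down the vocabulary.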


IMPLEMENTATION IDEAS

I'd like to see a new CMF tool, portal_pzn, that would store the
personalization keyword hierarchy and the business rules.  I think that
in the beginning, we can just let personalization rules be small python
expressions returning some results.  We could also group rules into rule
sets, a la SiteServer, to make them more manageable.

Let's say the tool was like this:

portal_pzn/
	vocabulary/
		/sports/
			basketball/
				redbulls
			football/
				manchesterunited
			golf/
	rules/
		set1/
			favorite_3_sports = "return user.behavior.top('/sports/*', 3)"
			discount = "if user.type == 'VIP': return 20"
			etc.

And in our code we would write:

	<dtml-in "portal_pzn.set1.favorite_3_sports(user)"
		><dtml-var "promotions[_['sequence-index']]"
	></dtml-in>
Each user object is extended with two profile objects:

	behavior = implicit data; each keyword has an associated "hit" count
	interests = explicit data; just a list of keywords the user has
	            explicitly said he or she is interested in
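
A plain-Python sketch of how these pieces might fit together, outside
Zope.  Everything here is an assumption for illustration: portal_pzn does
not exist, the rules are ordinary functions rather than stored
expressions, top() is read as "direct children of the pattern's parent",
and the 0 returned for non-VIP users is my own default.

```python
class Behavior:
    """Implicit data: maps each keyword to a hit count."""

    def __init__(self, hits):
        self.hits = dict(hits)

    def top(self, pattern, n):
        # Assumption: 'parent/*' means keywords exactly one level below parent.
        parent = pattern.rstrip("*").rstrip("/")
        children = {
            keyword: count
            for keyword, count in self.hits.items()
            if keyword.rstrip("/").rsplit("/", 1)[0] == parent
        }
        # Most hits first; ties are broken alphabetically.
        return sorted(children, key=lambda k: (-children[k], k))[:n]


class User:
    def __init__(self, type, hits, interests):
        self.type = type
        self.behavior = Behavior(hits)    # implicit profile
        self.interests = list(interests)  # explicit profile


def favorite_3_sports(user):
    return user.behavior.top("/sports/*", 3)


def discount(user):
    # The rule in the post only covers VIPs; 0 otherwise is an assumption.
    return 20 if user.type == "VIP" else 0


# A rule set is just a named group of rules.
rule_sets = {"set1": {"favorite_3_sports": favorite_3_sports, "discount": discount}}

user = User(
    type="VIP",
    hits={
        "/sports/": 11,
        "/sports/basketball/": 4,
        "/sports/basketball/redbulls": 4,
        "/sports/football/": 6,
        "/sports/football/manchesterunited": 6,
        "/sports/golf": 1,
    },
    interests=["/sports/golf"],
)

favorites = rule_sets["set1"]["favorite_3_sports"](user)
print(favorites)
print(rule_sets["set1"]["discount"](user))
```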

Just like with sessions, we could store personalization in the ZODB but
it'll probably be too slow for many uses.  We should use a storage that
is fast both for writes and reads, which means it'll probably not be
very persistent.  We could have it synchronize regularly with a more
persistent storage, e.g., ZODB.
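
One way to picture this is a write-behind cache: hits go into memory and
are copied to the persistent store only now and then.  The sketch below
is hypothetical; the class name is invented and a plain dict stands in
for the persistent storage, so no real ZODB API is used.

```python
from collections import Counter, defaultdict


class WriteBehindProfiles:
    """Keeps hit counts in memory and syncs them to a backend on demand."""

    def __init__(self, backend):
        self.backend = backend              # stand-in for persistent storage
        self.cache = defaultdict(Counter)   # fast, volatile working copy
        self.dirty = set()                  # users changed since last flush

    def record_hit(self, user_id, keyword):
        # Cheap in-memory write; nothing touches the backend here.
        self.cache[user_id][keyword] += 1
        self.dirty.add(user_id)

    def hits(self, user_id):
        # Reads are served from memory as well.
        return self.cache[user_id]

    def flush(self):
        # Copy every changed profile to the backend in one pass.
        for user_id in self.dirty:
            self.backend[user_id] = dict(self.cache[user_id])
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


persistent = {}
store = WriteBehindProfiles(persistent)
store.record_hit("cookie-123", "/sports/football/")
store.record_hit("cookie-123", "/sports/football/")

before_flush = dict(persistent)   # still empty: nothing synced yet
flushed = store.flush()           # would be called from a periodic job
print(before_flush, flushed, persistent)
```

Anything recorded after the last flush is lost if the process dies, which
is the "not very persistent" trade-off described above.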

Interests are more like subscriptions, and like subscriptions, have
their uses.

This is still simple.  I'm sure, e.g., Broadvision offers much more when
it comes to personalization.

These are just thoughts; I don't have any plans to implement
personalization yet.  I'd be happy to help others, though :)

Bye,
-- 
Bjorn Stabell <bjorn@exoweb.net>
Exoweb

-----Original Message-----
From: Jon Edwards [mailto:jon@pcgs.freeserve.co.uk]
Posted At: Tuesday, June 05, 2001 23:17
Posted To: Zope CMF
Conversation: [Zope-CMF] List of subject/metadata sets?
Subject: RE: [Zope-CMF] List of subject/metadata sets?


(Note to self : Remember to engage brain before typing!)

Yes, you're right - and as Marc pointed out, you can have everything you
need in one catalog.

I guess where I'm coming from, is that the catalog is serving two
purposes, and sometimes it gets confusing to a relative newbie like me -

1. For the end user it allows them to search the whole system, and is
quite often used to "construct" the pages - topics and sub-topics - they
are viewing. I'm assuming it will also be used in some way to construct
composite-documents. With a large site, containing many sub-sections (or
a large organisation with quasi-autonomous sub-organisations/departments)
it needs a lot of planning to allow all the possible searches users are
likely to want.

2. Behind the scenes it is interacting with workflow/workgroups to allow
people to process documents from creation to publication. ...and I'm
still trying to wrap my brains round how Composite Documents and
Versioning will fit into it all. This is assuming you have a site with
multiple editors/contributors working on many sections/subsections/pages
in different workgroups.

I guess at some scale the complexity reaches the point where you need to
separate the two functions... and my needs are somewhere in that
vicinity! :-)

i.e. you have two portals/catalogs, one for the internal organisation to
do its workflow, collaboration, discussions, document-sharing (perhaps on
their intranet)... and one where the finished results are published, and
the public can view, search, discuss. But you still want some two-way
connection between the two.

Is anyone else reaching similar levels of complexity? Maybe I should
re-factor and "divide and conquer"!!

Sorry if this makes no sense! I think I'm having a bad-brain day!

Cheers, Jon

--------------------------------------------------
> Going off at a tangent, does anyone ever get the feeling that you maybe
> need two portal_catalogs? One for end-user searching, topics, etc. and
> one for the internals, composite-docs and that sort of stuff?

How do you mean?  Wouldn't that mean you end up with lots of objects
being indexed in two places?

seb


_______________________________________________
Zope-CMF maillist  -  Zope-CMF@zope.org
http://lists.zope.org/mailman/listinfo/zope-cmf

See http://www.zope.org/Products/PTK/Tracker for bug reports and feature
requests