[Zope-CMF] Dublin Core Subject Qualifier Implementation

Wed, 20 Feb 2002 19:06:28 +0800

Hi

I would like to add my 2c worth on the general topic of metadata
while it is being discussed.

We have been extending the defaultDublinCore implementation with hotfixes
to add additional elements such as those found in the AGLS, and I have to
say that this seems to be about the only way to add them transparently to
all content objects.

I find this starts to show deficiencies in the metadata model as
implemented in the current CMF: the init methods for all the content types
in CMFDefault actually specify every one of the core Dublin Core elements
as arguments (whoops).

This means that if I really wanted to make use of the additional elements
in init methods, I would have to override a whole bunch of init methods,
which would not be fun. Thankfully we can go through metadata edit etc...

But what it does show is that we need a different model for defining and
registering the metadata schemas to be included with content objects.

At a simplistic level using property sheets might work, but as this
discussion on IPTC subject qualifiers and my own experience show, I need
(at least) to be able to have custom getter and setter methods for each of
the elements, to enforce validity of contents etc.

My wish would be to have a metadata registry (a bit like the types
registry) where one could add metadata elements, plus bind getter and
setter methods to each element.
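
To make the registry idea a bit more concrete, here is a rough sketch of
what I have in mind. Nothing like this exists in the CMF as far as I know:
the class name, the "audience" element and its allowed values are all made
up for illustration, and a real tool would need persistence, security
declarations and so on.

```python
class MetadataRegistry:
    """Hypothetical registry: element name -> (getter, setter)."""

    def __init__(self):
        self._elements = {}

    def register(self, name, getter, setter):
        # Bind a getter and a setter to a metadata element name.
        self._elements[name] = (getter, setter)

    def get(self, obj, name):
        getter, _setter = self._elements[name]
        return getter(obj)

    def set(self, obj, name, value):
        _getter, setter = self._elements[name]
        setter(obj, value)


# Made-up vocabulary for a made-up "audience" element.
ALLOWED_AUDIENCES = ("general", "business", "government")


def get_audience(obj):
    return getattr(obj, "_audience", None)


def set_audience(obj, value):
    # The setter is where validity of the contents gets enforced.
    if value not in ALLOWED_AUDIENCES:
        raise ValueError("invalid audience: %r" % (value,))
    obj._audience = value


class Document:
    """Stand-in for a content object."""


registry = MetadataRegistry()
registry.register("audience", get_audience, set_audience)

doc = Document()
registry.set(doc, "audience", "business")
print(registry.get(doc, "audience"))
```

The point of the sketch is that content types never need to know about the
extra elements in their init methods; everything goes through the registry,
and an invalid value is rejected by the bound setter.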

Oh, I should have checked the Zope3 discussions to see if this sort of
thing, or a similar dynamic binding model, has been discussed. It's late,
and I should be home.

Any thoughts on this?

Because I may have to look at doing this pretty soon.

Tim

seb bacon wrote:

>Sean, 
>
>That's a really interesting idea.  It would be a great thing to
>integrate with the CMF.  
>
>Here's some of my thoughts, since you asked ;-)
>
>The namespace qualifier seems like a good idea.  
>
>The language aspect should be dealt with by i18n structures rather than
>on the application level, e.g. the system locale (I've never looked at
>ZBabel etc so I'm not up on the accepted way of doing this).
>
>The UI problem of selecting a subject from 1000s has been discussed on
>the list before - have a search around for ideas.  My feeling is that
>the best way of doing this is to arrange the subjects hierarchically. 
>For example, there are 17 categories in the IPTC subjects.  The UI
>should allow you to select an entire category as well as its
>subdivisions.
>
>The internal representation should be an XML-like tree, which you could
>manipulate in a similar way to XML (like a SAX parser, for example). 
>The tool could have an 'import' function, so people can load in
>specialist vocabularies - possibly from an XML format?
>
>The job of mapping between id and name shouldn't be tricky - you should
>only ever specify an id to the tool, and it could always return (id,
>name) tuples.  I noticed that a lot of subjects in the specs you mention
>have descriptions too - you could make it an (id, name, description) tuple,
>or something similar, to expose this.
>
>Regarding vocabulary, you could optionally supply a vocabulary id to
>each method, or you could rely on a default vocabulary which can be set
>by the user.
>
>I'd be tempted to miss out the icon thing, although it's a nice idea. 
>It's only any use if the application requires it, and someone has the
>time to generate 1000s of icons - wouldn't this be a minority of cases? 
>Anyway, here's my take on the interface:
>
> getSubject(subject_id, vocabulary=None):
>   "return (id, name) tuple"
>
> searchSubjects(search_term, vocabulary=None):
>   """do a text search of subject names,
>      return list of (id, name) tuples"""
>
> getChildSubjects(subject_id, vocabulary=None):
>   "return list of children of subject_id"
>
> getParentSubject(subject_id, vocabulary=None):
>   "return parent of subject_id"
>
> getSiblingSubjects(subject_id, vocabulary=None):
>   "return siblings of subject_id"
>
> getRootSubjects(vocabulary=None):
>   "return list of root subjects"
>
> setDefaultVocabulary(vocabulary):
>   "set a default vocabulary, return None if it doesn't exist"
> 
> setSubject(subject_id, subject_name, vocabulary):
>   "add a new subject to vocabulary"
>
> getVocab(subject_id):
>   """return a (id, name, description) tuple for the vocabulary of the
>      specified subject"""
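
A minimal in-memory sketch of the hierarchical part of this interface might
look like the following. It is only an illustration under some assumptions:
the SubjectTree class, the parent_id argument to setSubject and the subject
ids are invented here, and the vocabulary argument is left out to keep it
short.

```python
class SubjectTree:
    """Hypothetical single-vocabulary store: id -> (name, parent_id)."""

    def __init__(self):
        self._subjects = {}

    def setSubject(self, subject_id, subject_name, parent_id=None):
        # parent_id=None marks a root subject (an assumption of this sketch).
        self._subjects[subject_id] = (subject_name, parent_id)

    def getSubject(self, subject_id):
        name, _parent = self._subjects[subject_id]
        return (subject_id, name)

    def getChildSubjects(self, subject_id):
        return [self.getSubject(sid)
                for sid, (_name, parent) in self._subjects.items()
                if parent == subject_id]

    def getParentSubject(self, subject_id):
        _name, parent = self._subjects[subject_id]
        return None if parent is None else self.getSubject(parent)

    def getSiblingSubjects(self, subject_id):
        _name, parent = self._subjects[subject_id]
        return [s for s in self.getChildSubjects(parent)
                if s[0] != subject_id]

    def getRootSubjects(self):
        # Roots are simply the children of "no parent".
        return self.getChildSubjects(None)


tree = SubjectTree()
tree.setSubject("arts", "Arts")
tree.setSubject("sport", "Sport")
tree.setSubject("arts.tv", "Television", parent_id="arts")
tree.setSubject("arts.music", "Music", parent_id="arts")
print(tree.getChildSubjects("arts"))
```

Storing only a parent id per subject keeps import from a flat file simple;
a real tool with thousands of subjects would probably want an index of
children rather than a scan on every call.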
>
>
>
>On Tue, 2002-02-19 at 20:46, sean.upton@uniontrib.com wrote:
>
>>Hey everybody,
>>I am looking at implementing Dublin Core Qualifiers for Subject metadata as
>>a means of expressing subjects within multiple controlled and standardized
>>vocabularies (namely, IPTC subjects for news and sports stuff, and NAICS or
>>SIC codes for Business information), in addition to supporting plain-text
>>subject vocabularies as well.  Is there any established pattern or syntax
>>for dealing with subject codes this way in the CMF?  I haven't found
>>anything, so I have been thinking about a solution... my thoughts are below.
>>
>>The Dublin Core Qualifiers spec has several recommended element encoding
>>schemes for LC and medical subjects, but nothing excludes other
>>industry-standard subject vocabularies, such as IPTC (news/media, worldwide)
>>or NAICS (used by North American governments, business/economic news, and
>>yellow pages), or market names (stock tickers).
>>	http://www.dublincore.org/documents/dcmes-qualifiers/#subject
>>	http://www.iptc.org/
>>	http://www.census.gov/epcd/www/naics.html
>>
>>My first hunch is that the best way to convey a namespace/qualifier for a
>>subject code system is with a colon in the text, separating the vocabulary
>>"NAME" (in dcmes-qualifiers terms) from the code.  My second hunch is that
>>I need to create a subject lookup tool that performs lookups for
>>"human-readable" counterparts for codes, so that codes with qualifiers can
>>get a description that makes sense to a content user.  I also think such a
>>framework might be useful for content creators if the user interface for
>>metadata entry enabled efficient lookup with these codes (the biggest UI
>>issue is that the number of these codes may be in the order of thousands;
>>something like a popup search/browse dialog might be appropriate).
>>
>>Example lookup/translation input/output:
>>
>>	NAICS:511110  --> "Newspaper Publishers"
>>	IPTC:01016000 --> "Television"
>>	NASDAQ:MSFT --> "Microsoft Corporation"
>>	Media Companies --> "Media Companies" (verbatim translation of unqualified text)
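
The colon convention and the lookup above could be sketched roughly as
follows. This is just one possible reading: the VOCABULARIES table and the
describe_subject function are hypothetical names, the table only holds the
three example entries, and treating an unknown prefix as plain text is an
assumption made here.

```python
# Hypothetical lookup tables, filled with the example entries only.
VOCABULARIES = {
    "NAICS": {"511110": "Newspaper Publishers"},
    "IPTC": {"01016000": "Television"},
    "NASDAQ": {"MSFT": "Microsoft Corporation"},
}


def describe_subject(subject):
    """Translate 'VOCAB:code' into a description; pass plain text through."""
    vocabulary, colon, code = subject.partition(":")
    if colon and vocabulary in VOCABULARIES:
        # Known vocabulary: None signals that the code was not found.
        return VOCABULARIES[vocabulary].get(code)
    # Unqualified text (or an unknown prefix) is returned verbatim.
    return subject


for subject in ("NAICS:511110", "IPTC:01016000", "NASDAQ:MSFT",
                "Media Companies"):
    print(subject, "-->", describe_subject(subject))
```

Checking the prefix against the registered vocabularies, rather than
splitting on any colon, keeps a free-text subject that happens to contain a
colon from being mistaken for a qualified code.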
>>
>>This tool should support internationalization (or is it localization?) of
>>description lookup, because these vocabularies are often defined by
>>multi-national organizations (thus multi-lingual lookup tables might exist,
>>for example IPTC supports most European languages, Turkish, and Arabic);
>>this isn't to say that one need implement every language a vocabulary
>>supports to satisfy this, but that the interface for this tool should
>>support a language encoding parameter for this purpose, so that a
>>multi-lingual site can support multiple languages with one vocabulary
>>(SignOnSanDiego publishes content in English and some Spanish).
>>
>>In use of this tool, there would still be interfacing issues to make this
>>work with the metadata tool and content types, both in terms of supporting a
>>user interface for massive amounts of subject codes, as well as determining
>>when to display the code and when to display the lookup description...
>>
>>I'd be interested to see what people think about this.  I wrote some
>>interface documentation, pasted below, which might help in explaining
>>my idea.  Thoughts?
>>
>>Thanks,
>>Sean
>>
>>#####################################
>>##################################### 
>>
>>import Interface
>>
>>class portal_subjectlookup(Interface.Base):
>>      """
>>        Interface for registry of subject code qualifier
>>        vocabularies.  Among other things, a tool
>>        implementing this interface should provide the
>>        ability to query with a code, language, and
>>        vocabulary, and get descriptions.
>>      """
>>
>>      def getDescriptionFromCode(code, vocabulary=None, language='en-US'):
>>          """
>>            Lookup code in registry specified by vocabulary for language 
>>            specified in language.
>>
>>            Pre-condition:  code is a string object and is not None
>>            Post-condition: a string is returned with a human-readable
>>                            text description (string) for a code in
>>                            the language specified, if available.
>>
>>                            If a registry implementation in the tool is not
>>                            available in the language specified, a default 
>>                            language should be used.
>>
>>                            If no viable option can be found in lookup, 
>>                            method should return None.
>>
>>            Notes: sorry about the ethnocentrism in the language default.
>>          """
>>
>>      def findCodeByKeyword(query, vocabulary=None, language='en-US'):
>>          """
>>            Used primarily by content producers, or agents on their behalf.
>>
>>            This method is used to find a correct code for a piece of
>>            content when the code is unknown, but the subject matter is.
>>            This allows a query, which can be either a single string
>>            keyword, or a sequence of keyword strings.  The query is an
>>            "or" query, so that if query == ['foo','bar'], topic codes
>>            with descriptions matching either term should be returned.
>>
>>            Pre-condition:  query is a string or a sequence of strings and
>>                            is not None.  If query is a sequence, a query
>>                            will be performed for all terms as specified
>>                            above.  If vocabulary is specified, only search
>>                            that vocabulary, otherwise a 'search all' is
>>                            assumed.
>>
>>            Assumptions:    it is assumed that the query that is passed to
>>                            this method should match with a wildcard on the
>>                            end of each keyword, so that a query of
>>                            ['bio','tech','medi'] would find biotechnology,
>>                            technology, medicine, technical, medical, etc...
>>
>>            Post-condition: a sequence of matches is returned, where a match
>>                            is a tuple of vocabulary, code, and description
>>                            in the language of choice.
>>
>>                            If a registry implementation in the tool is not
>>                            available in the language specified, a default 
>>                            language should be used.
>>
>>                            If no viable option can be found in lookup, 
>>                            method should return None.           
>>          """
>>
>>      def listAllCodes(vocabulary=None, language='en-US'):
>>          """
>>            This method lists all entries in lookup tables for subject
>>            vocabulary codes, either globally, or within a particular
>>            vocabulary.  Output is similar to findCodeByKeyword()...
>>            
>>            Assumptions:    If vocabulary is None, then search 
>>                            globally across all vocabularies present
>>                            in this tool.
>>                           
>>            Post-Condition: a sequence of entries is returned, where an
>>                            entry is a tuple of vocabulary, code, and
>>                            description in the language of choice.
>>
>>                            If a registry implementation in the tool is not
>>                            available in the language specified, a default 
>>                            language should be used.
>>
>>                            If no viable option can be found in lookup, 
>>                            method should return None.
>>          """
>>
>>      def getIconPathForSubject(code, vocabulary=None):
>>          """
>>            Attempts to find an icon path registered for a code/vocabulary
>>            combo.  Since vocab is optional, this could potentially need
>>            to look through the registry for entries in all vocabularies.
>>
>>            Returns a list of "wrapped-icons" where a wrapped-icon is a 
>>            tuple containing the icon width, icon height, and icon path
>>            as a list; example: 
>>            [ (32,32,['path','to','images','subj32.png']),
>>              (16,16,['path','to','images','tiny','subj16.png']) ]
>>          """
>>