[Checkins] SVN: topia.termextract/trunk/s * Add example.txt to the documentation.
Stephan Richter
srichter at gmail.com
Sat May 30 12:10:45 EDT 2009
Log message for revision 100560:
* Add example.txt to the documentation.
* Improve text in example.txt.
Changed:
U topia.termextract/trunk/setup.py
U topia.termextract/trunk/src/topia/termextract/example.txt
-=-
Modified: topia.termextract/trunk/setup.py
===================================================================
--- topia.termextract/trunk/setup.py 2009-05-30 15:55:46 UTC (rev 100559)
+++ topia.termextract/trunk/setup.py 2009-05-30 16:10:45 UTC (rev 100560)
@@ -35,6 +35,8 @@
         + '\n' +
         read('src', 'topia', 'termextract', 'README.txt')
         + '\n\n' +
+        read('src', 'topia', 'termextract', 'example.txt')
+        + '\n\n' +
         read('CHANGES.txt')
         ),
     license = "ZPL 2.1",
Modified: topia.termextract/trunk/src/topia/termextract/example.txt
===================================================================
--- topia.termextract/trunk/src/topia/termextract/example.txt 2009-05-30 15:55:46 UTC (rev 100559)
+++ topia.termextract/trunk/src/topia/termextract/example.txt 2009-05-30 16:10:45 UTC (rev 100560)
@@ -1,6 +1,6 @@
-==============
-A News Article
-==============
+===========================
+An Example - A News Article
+===========================
This document provides a simple example of extracting the terms of a BBC
article from May 29, 2009. We will use several term extraction tools to
@@ -348,15 +348,21 @@
area NN area
. SENT .
+As you can see, TreeTagger identifies parts of speech quite well, but its
+output would need further analysis to produce a useful set of terms.
+Furthermore, TreeTagger is not free for commercial use.
-Topia POS Tag
--------------
+Topia's Term Extractor
+----------------------
-Topia POS Tag tries to produce results somewhere between a simple tagger like
-TreeTagger and Yahoo Keyword Extraction. We try to achieve that by first using
-a POS Tagger followed by applying a simple term constructor and relevance
-calculation,
+Topia's Term Extractor tries to produce results somewhere between a POS
+tagger like TreeTagger and Yahoo Keyword Extraction.
+Since we are only interested in nouns, a very simple POS tagging algorithm can
+be deployed, which will provide good results most of the time. We then use
+some simple statistics and linguistics to produce a narrow but strong list of
+terms for the content.
+
>>> from topia.termextract import extract
>>> extractor = extract.TermExtractor()
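The approach described in the new text (crude noun detection followed by simple statistics) can be illustrated with a small standalone sketch. This is not the topia.termextract implementation: the `NON_NOUNS` word list, the helper names `candidate_terms` and `extract_terms`, and the occurrence threshold are all assumptions made for illustration, and treating every word outside the list as a noun candidate is far cruder than a real POS tagger.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words. A real tagger
# would rely on a lexicon and tagging rules instead of this shortcut.
NON_NOUNS = frozenset(
    "a an the of in on at to for and or but is are was were be been "
    "it this that with as by from has have had will would can".split()
)


def candidate_terms(text):
    """Yield runs of consecutive words that are not known function words."""
    run = []
    # Tokens are either alphabetic words or single punctuation/digit characters.
    for token in re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text):
        if token.isalpha() and token.lower() not in NON_NOUNS:
            run.append(token.lower())
        else:
            # A function word or punctuation mark ends the current run.
            if run:
                yield " ".join(run)
            run = []
    if run:
        yield " ".join(run)


def extract_terms(text, min_single_occurrences=2):
    """Return (term, occurrences, word_count) tuples, longest terms first."""
    counts = Counter(candidate_terms(text))
    terms = [
        (term, occur, len(term.split()))
        for term, occur in counts.items()
        # Assumed filter: keep multi-word terms, and single words only
        # when they are repeated often enough.
        if len(term.split()) > 1 or occur >= min_single_occurrences
    ]
    return sorted(terms, key=lambda t: (-t[2], -t[1], t[0]))


if __name__ == "__main__":
    sample = (
        "The term extractor is a tool. The term extractor has a filter, "
        "and the filter is simple."
    )
    for term in extract_terms(sample):
        print(term)
```

Filtering on occurrence count and term length is what keeps the list narrow: one-off single words are dropped, while repeated words and multi-word phrases survive.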