[Zope] zope.org inaccessible -- KebasData, anyone?

Keith J. Farmer kfarmer@thuban.org
Mon, 7 Jan 2002 16:16:55 -0800


Well, it's nifty, but I'm having difficulty constructing a correct regular
expression (re) to handle this sort of markup:

<TR VALIGN="TOP" ALIGN="left"><TD><FONT SIZE=-2
FACE="Arial">1072&nbsp;&nbsp;</FONT></TD><TD><FONT SIZE=-2 FACE="Arial">
1:16PM&nbsp;&nbsp;</FONT></TD><TD NOWRAP><FONT SIZE=-2 FACE="Arial"><A
HREF=./ii.asp?Center=GGCC&LogNumber=1072D0107 TARGET="ii">Traffic
Collision - Property Damage</A>&nbsp;&nbsp;</FONT</TD><TD><FONT SIZE=-2
FACE="Arial">SR1 JNO TAM JUNCTION&nbsp;&nbsp;</FONT></TD><TD
NOWRAP><FONT SIZE=-2 FACE="Arial">Marin&nbsp;&nbsp;</FONT></TD></TR>

I want the text (minus the tags and entities), and I want the URL from
the <a> tag.  Technically, I want the data from all the cells, but since
I've had difficulty, I'm attempting to do it by parts.
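For comparison, here is a minimal sketch that skips the regular expression and walks the row with Python's standard-library `html.parser` instead. This is a swapped-in technique, written for Python 3, so it would need adjusting for an older Python. `RowParser` and `SAMPLE` are hypothetical names, and `SAMPLE` is the row above with the mail encoding removed. The sketch starts a new cell at each `<TD>` start tag rather than waiting for `</TD>`, because the row contains a malformed `</FONT</TD>` that hides one closing `</TD>` from the parser.

```python
from html.parser import HTMLParser

# The row quoted above, with the mail encoding (=3D, soft line breaks) removed.
SAMPLE = """<TR VALIGN="TOP" ALIGN="left"><TD><FONT SIZE=-2
FACE="Arial">1072&nbsp;&nbsp;</FONT></TD><TD><FONT SIZE=-2 FACE="Arial">
1:16PM&nbsp;&nbsp;</FONT></TD><TD NOWRAP><FONT SIZE=-2 FACE="Arial"><A
HREF=./ii.asp?Center=GGCC&LogNumber=1072D0107 TARGET="ii">Traffic
Collision - Property Damage</A>&nbsp;&nbsp;</FONT</TD><TD><FONT SIZE=-2
FACE="Arial">SR1 JNO TAM JUNCTION&nbsp;&nbsp;</FONT></TD><TD
NOWRAP><FONT SIZE=-2 FACE="Arial">Marin&nbsp;&nbsp;</FONT></TD></TR>"""


class RowParser(HTMLParser):
    """Hypothetical helper: collect cell text and <A HREF> values per <TR>."""

    def __init__(self):
        # convert_charrefs=True decodes &nbsp; to '\xa0' before handle_data.
        super().__init__(convert_charrefs=True)
        self.rows = []     # one dict per row: {"cells": [...], "hrefs": [...]}
        self._row = None   # row being filled, or None outside a <TR>
        self._cell = None  # text chunks of the current cell, or None

    def _close_cell(self):
        # Finish the current cell; split()/join() collapses newlines and
        # non-breaking spaces and trims both ends.
        if self._row is not None and self._cell is not None:
            self._row["cells"].append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased.
        if tag == "tr":
            self._close_cell()
            self._row = {"cells": [], "hrefs": []}
            self.rows.append(self._row)
        elif tag == "td":
            # Start a new cell here instead of relying on </TD>.
            self._close_cell()
            self._cell = []
        elif tag == "a" and self._row is not None:
            href = dict(attrs).get("href")
            if href:
                self._row["hrefs"].append(href)

    def handle_endtag(self, tag):
        if tag == "tr":
            self._close_cell()
            self._row = None

    def handle_data(self, data):
        # Text outside any cell is ignored.
        if self._cell is not None:
            self._cell.append(data)


parser = RowParser()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(row["cells"], row["hrefs"])
```

Because the parser tracks rows and cells itself, the same class can be fed the whole page and should yield every row's cells and link at once, rather than building the result up by parts.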

The following doesn't quite do it at the <td> level:

<TD[^>]*>(?:<[^>]*>)*<A HREF=(\S+).*?>([^<>]*?)(?:<[^>]*>)*</TD>

The source URL is:
http://cad.chp.ca.gov/sa_stcc.asp?centerin=GGCC&style=l
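The HREF values on that page are relative (`./ii.asp?...`), so they have to be resolved against the page they came from before they can be fetched. Here is a small sketch using Python 3's `urllib.parse.urljoin`, assuming the source URL above is the base. `absolute_links` and `BASE_URL` are hypothetical names.

```python
from urllib.parse import urljoin

# Page the rows were scraped from (the source URL quoted above).
BASE_URL = "http://cad.chp.ca.gov/sa_stcc.asp?centerin=GGCC&style=l"


def absolute_links(hrefs, base=BASE_URL):
    """Hypothetical helper: resolve relative HREFs against the page URL."""
    # urljoin drops the base's last path segment ("sa_stcc.asp"), removes
    # the "./" dot segment, and keeps the href's own query string.
    return [urljoin(base, href) for href in hrefs]


print(absolute_links(["./ii.asp?Center=GGCC&LogNumber=1477D0107"]))
```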

I get the following data:

Data: [('./ii.asp?Center=GGCC&LogNumber=1477D0107', '&nbsp;&nbsp;'),
('./ii.asp?Center=GGCC&LogNumber=1463D0107', '&nbsp;&nbsp;'),
('./ii.asp?Center=GGCC&LogNumber=1460D0107', '&nbsp;&nbsp;'),
('./ii.asp?Center=GGCC&LogNumber=1457D0107', '&nbsp;&nbsp;'),
('./ii.asp?Center=GGCC&LogNumber=1447D0107', '&nbsp;&nbsp;'),
('./ii.asp?
.. etc
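One plausible reason the second group comes back as `&nbsp;&nbsp;`: the link text is followed by `</A>&nbsp;&nbsp;`, so `([^<>]*?)(?:<[^>]*>)*</TD>` cannot reach `</TD>` right after the link text. Since `.` also matches `>`, the engine backtracks and stretches `.*?>` past the link text to the end of `</A>`, leaving only the entities for the second group. (The literal space in `<A HREF` would also miss a tag broken across lines, as in the copy above.) Below is a minimal sketch of another approach, assuming each cell can be matched on its own and cleaned up afterwards. It takes every `<TD ...>...</TD>` with `re.DOTALL`, pulls any HREF out of the cell, then strips the tags and decodes the entities with `html.unescape` (Python 3). `parse_cells`, `ROW` and the pattern names are hypothetical, and `ROW` is two cells from the sample row.

```python
import html
import re

# One table cell; DOTALL lets a cell span line breaks, IGNORECASE
# accepts <td> as well as <TD>.
CELL_RE = re.compile(r"<TD[^>]*>(.*?)</TD>", re.IGNORECASE | re.DOTALL)
# The HREF of an <A> tag, quoted or unquoted, allowing a newline after "<A".
HREF_RE = re.compile(r"""<A\s[^>]*?HREF\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)
# Anything tag-like; the optional ">" also removes a truncated tag such as
# the "</FONT" left in front of "</TD>" in the sample row.
# Caveat: a bare "<" in the cell text would be eaten as well.
TAG_RE = re.compile(r"<[^<>]*>?")

# Two cells from the sample row, mail encoding removed.
ROW = """<TD NOWRAP><FONT SIZE=-2 FACE="Arial"><A
HREF=./ii.asp?Center=GGCC&LogNumber=1072D0107 TARGET="ii">Traffic
Collision - Property Damage</A>&nbsp;&nbsp;</FONT</TD><TD><FONT SIZE=-2
FACE="Arial">SR1 JNO TAM JUNCTION&nbsp;&nbsp;</FONT></TD>"""


def parse_cells(row_html):
    """Hypothetical helper: return (cell texts, hrefs) for one row."""
    cells, hrefs = [], []
    for cell in CELL_RE.findall(row_html):
        hrefs.extend(HREF_RE.findall(cell))
        # Drop the tags first, then decode &nbsp; and friends.
        text = html.unescape(TAG_RE.sub("", cell))
        # split() treats '\xa0' (decoded &nbsp;) as whitespace, so this
        # trims the trailing &nbsp;&nbsp; and joins wrapped lines.
        cells.append(" ".join(text.split()))
    return cells, hrefs


print(parse_cells(ROW))
```

Matching one cell at a time keeps the lazy `.*?` from wandering into the next cell; for markup less regular than this table, a real HTML parser is probably the safer choice.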

----------
Keith J. Farmer
kfarmer@thuban.org
http://www.thuban.org