RE: [Zope] zope.org inaccessible -- KebasData, anyone?
Well, it's nifty, but I'm having difficulty constructing a correct re to handle this sort of expression:

<TR VALIGN="TOP" ALIGN="left"><TD><FONT SIZE=-2 FACE="Arial">1072 </FONT></TD><TD><FONT SIZE=-2 FACE="Arial"> 1:16PM </FONT></TD><TD NOWRAP><FONT SIZE=-2 FACE="Arial"><A HREF=./ii.asp?Center=GGCC&LogNumber=1072D0107 TARGET="ii">Traffic Collision - Property Damage</A> </FONT</TD><TD><FONT SIZE=-2 FACE="Arial">SR1 JNO TAM JUNCTION </FONT></TD><TD NOWRAP><FONT SIZE=-2 FACE="Arial">Marin </FONT></TD></TR>

I want the text (minus the tags and entities), and I want the url from the <a> tag. Technically, I want the data from all the cells, but since I've had difficulty, I'm attempting to do it by parts.

The following doesn't quite do it at the <td> level:

<TD[^>]*>(?:<[^>]*>)*<A HREF=(\S+).*?>([^<>]*?)(?:<[^>]*>)*</TD>

The source URL is:

http://cad.chp.ca.gov/sa_stcc.asp?centerin=GGCC&style=l

I get the following data:

Data: [('./ii.asp?Center=GGCC&LogNumber=1477D0107', ' '), ('./ii.asp?Center=GGCC&LogNumber=1463D0107', ' '), ('./ii.asp?Center=GGCC&LogNumber=1460D0107', ' '), ('./ii.asp?Center=GGCC&LogNumber=1457D0107', ' '), ('./ii.asp?Center=GGCC&LogNumber=1447D0107', ' '), ('./ii.asp? .. etc

----------
Keith J. Farmer
kfarmer@thuban.org
http://www.thuban.org
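A possible explanation for the ' ' results, with a sketch of one way around it. The pattern allows only tags between the link text and </TD>, but the sample has a space between </A> and </FONT>. The lazy `.*?>` therefore appears to run on through the link text to the `>` of </A>, leaving the second group with just that trailing space. The Python below reproduces this on the sample row and then splits the job into two simpler patterns. Everything in it is an assumption for illustration: the names ROW, ORIGINAL, CELL, LINK, TAG and parse_row are hypothetical, the row is assumed to sit on one line, and the "</FONT</TD>" in the sample is assumed to be a transcription slip for "</FONT></TD>".

```python
import re
from html import unescape

# Sample row from the post as one string. Assumption: the "</FONT</TD>" in
# the third cell of the posted sample is a slip, so "</FONT></TD>" is used.
ROW = (
    '<TR VALIGN="TOP" ALIGN="left">'
    '<TD><FONT SIZE=-2 FACE="Arial">1072 </FONT></TD>'
    '<TD><FONT SIZE=-2 FACE="Arial"> 1:16PM </FONT></TD>'
    '<TD NOWRAP><FONT SIZE=-2 FACE="Arial">'
    '<A HREF=./ii.asp?Center=GGCC&LogNumber=1072D0107 TARGET="ii">'
    'Traffic Collision - Property Damage</A> </FONT></TD>'
    '<TD><FONT SIZE=-2 FACE="Arial">SR1 JNO TAM JUNCTION </FONT></TD>'
    '<TD NOWRAP><FONT SIZE=-2 FACE="Arial">Marin </FONT></TD>'
    '</TR>'
)

# The pattern from the post, unchanged, to reproduce the ' ' capture.
ORIGINAL = re.compile(
    r'<TD[^>]*>(?:<[^>]*>)*<A HREF=(\S+).*?>([^<>]*?)(?:<[^>]*>)*</TD>'
)

# Hypothetical replacements: handle cells and links separately.
CELL = re.compile(r'<TD[^>]*>(.*?)</TD>', re.I | re.S)  # raw contents of each cell
LINK = re.compile(r'<A\s+HREF=([^\s>]+)[^>]*>([^<]*)</A>', re.I)  # unquoted href, link text
TAG = re.compile(r'<[^>]*>')  # any single tag


def parse_row(row):
    """Return (list of cell texts, link URL or None) for one table row."""
    # Drop the tags inside each cell, decode entities, trim whitespace.
    cells = [unescape(TAG.sub('', raw)).strip() for raw in CELL.findall(row)]
    match = LINK.search(row)
    url = match.group(1) if match else None
    return cells, url


if __name__ == '__main__':
    print(ORIGINAL.findall(ROW))
    print(parse_row(ROW))
```

Closing the link pattern on </A> explicitly, rather than on </TD>, keeps the text group from drifting past the link. Regular expressions remain fragile on markup like this; the standard library's html.parser module may be a sturdier choice if the page layout varies.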