[Zope] FW: BASE and IMG and [X]HTML

Charlie Reiman creiman@kefta.com
Mon, 23 Sep 2002 14:40:15 -0700


I asked the www-validator mailing list what's up with the img vs. base tag
thing in XHTML. Here is the explanation. His comments make sense and I
really have nothing to add other than it looks like I need to switch
hardcore to XHTML if I want anything to validate.

If you want to pester Kynn, then it would be good form to join the
www-validator mailing list.

Charlie.

-----Original Message-----
From: www-validator-request@w3.org
[mailto:www-validator-request@w3.org]On Behalf Of kynn@idyllmtn.com
Sent: Monday, September 23, 2002 2:01 PM
To: Charlie Reiman
Cc: www-validator@w3.org
Subject: Re: BASE and IMG and [X]HTML



Charlie Reiman asked:
> We've been having a discussion on the zope mailing list regarding the
> validator's behavior with <base ... />. In particular, an HTML 4.01
> transitional document is not allowed to use <base ... />. Instead, it is
> expected to use <base ...>.
>
> Well, okay. I don't like it but I accept the reasoning. But why does it
> not complain about <img ... />? Isn't this the same situation?

Hi, Charlie, it goes like this.  You and I both know that HTML 4.01 is
HTML written according to SGML rules, and XHTML 1.0 is HTML written to
XML rules.

In the SGML-HTML rules, the closing > on a tag is actually optional in many
cases, and when it's not there, the assumption is that it's meant to be
there, and anything else is just character data.

Why is that important?  Well, it makes sense when you combine it with
something else -- a slash / can't appear inside the tag in SGML-HTML.

So when an SGML-HTML application (such as the validator) sees the
following:

     <img src="blah.jpg" alt="Blah!" />

It reads it as:

<
     Okay, the start of a tag.
img
     Aha, this is the image element
src="blah.jpg"
     Okay, this is an attribute
alt="Blah!"
     This is another attribute
/
     Wait, what the heck is this?  This can't appear inside this tag.
     Oh, I get it.  The tag actually closed after the last valid
     attribute, they just didn't include the >.  Okay, so the / is
     some character text data after the tag.
>
     Hmm, I guess this is still character text data.

So really it reads it as:

     <img src="blah.jpg" alt="Blah!">/&gt;
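
The reading above can be sketched in a few lines of Python.  This is a
toy model of the walk-through only, not a real SGML parser and not the
validator's code; the function name read_like_validator is made up for
illustration.

```python
def read_like_validator(markup):
    """Toy model: split one start tag into (tag, trailing character data)."""
    in_quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(markup):
        if in_quote:
            # Inside an attribute value: only the matching quote matters.
            if ch == in_quote:
                in_quote = None
        elif ch in "\"'":
            in_quote = ch
        elif ch == "/":
            # A slash outside an attribute value: assume the tag closed
            # after the last valid attribute; the rest is character data.
            return markup[:i].rstrip() + ">", markup[i:]
        elif ch == ">":
            # Ordinary close: nothing is left over.
            return markup[: i + 1], markup[i + 1:]
    return markup, ""


tag, leftover = read_like_validator('<img src="blah.jpg" alt="Blah!" />')
print(tag)       # the tag as the validator is described as seeing it
print(leftover)  # the stray text left behind after the tag
```

A slash inside a quoted attribute value, as in src="a/b.jpg", is left
alone by this sketch; only a slash in the tag itself triggers the split.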

Now, the browsers out there aren't really SGML applications.  So they
don't follow the SGML rules properly, and won't read it that way.
Instead they'll read it as:

     <img src="blah.jpg" alt="Blah!" [SOMETHING I DON'T KNOW SO I WILL
       IGNORE]>

...which means it will display as you'd expect.
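
For comparison, here is a small sketch using Python's standard
html.parser module.  It is not a browser, but it is a similarly lenient,
non-SGML parser, so it gives a rough feel for the forgiving reading.
The TagLogger class name is made up for illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the start tags and character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []  # (tag, attrs) pairs
        self.text = []  # character data chunks

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

    def handle_data(self, data):
        self.text.append(data)


parser = TagLogger()
parser.feed('<img src="blah.jpg" alt="Blah!" />')
parser.close()
print(parser.seen)  # one img tag with its two attributes
print(parser.text)  # no stray character data is reported
```

The trailing slash does not turn into text here, which matches the
behavior you see on screen.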

Okay, so what's the problem with <base /> not being allowed but
<img /> is?

Simple:  The HTML specs don't allow "raw" character data text to
         appear inside the <head>, but they do allow it to appear
         inside the <body>.

When you write <img />, that extra /&gt; -- as the validator reads it --
follows the <img> and is within the <body> text, where character text
is perfectly valid.  When you write <base />, the /&gt; appears in
the <head> element, and that's NOT allowed, so the validator throws an
error atcha.
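
As a rough sketch of that rule in Python: the table below encodes only
the simplification used in this message (text allowed directly in
<body>, not in <head>), not the real HTML 4.01 content models.  The
names TEXT_ALLOWED and validate_empty_tag and the example.com URL are
made up for illustration.

```python
# Toy content rule, following the simplification in this message:
# character data is fine directly in <body>, but not in <head>.
TEXT_ALLOWED = {"head": False, "body": True}


def validate_empty_tag(parent, source):
    """Return a verdict for one tag written directly inside `parent`."""
    # Under the SGML-style reading, a trailing "/>" leaves stray text behind.
    stray = "/>" if source.rstrip().endswith("/>") else ""
    if stray and not TEXT_ALLOWED[parent]:
        return "error: character data %r not allowed in <%s>" % (stray, parent)
    return "valid"


print(validate_empty_tag("body", '<img src="blah.jpg" alt="Blah!" />'))
print(validate_empty_tag("head", '<base href="http://example.com/" />'))
print(validate_empty_tag("head", '<base href="http://example.com/">'))
```

Only the second call reports an error: same leftover text, different
parent element.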

--Kynn