Closed Bug 241085 Opened 20 years ago Closed 18 years ago

blogspot.com - Garbage rendered on first load; fixed by reloading

Categories: Tech Evangelism Graveyard :: English US, defect
Priority: Not set
Severity: major
Tracking: Not tracked
Status: RESOLVED WORKSFORME
People: Reporter: bugzilla, Unassigned
Keywords: regression, top500
Attachments: 10 files

The mentioned URL renders garbage the first time you load it; pressing Reload
fixes the layout.

Pressing shift + reload shows the garbage again.

Steps to reproduce:
1. go to http://amleft.blogspot.com/
2. see garbage, if not, press shift + reload
Attached image Screenshot
I also noticed that (while displaying garbage) CPU usage goes to 100% if you resize
the window. Resizing it is very sluggish.
As you can see from the website source (which I didn't realise until AFTER I
uploaded it), the page source is exactly the same both times, so it's definitely
something to do with HTML rendering. Another interesting thing is that when run
from my hard drive, no garbage appears. I'll leave it for a CSS guru to ponder
over....

Reproducible on Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7b)
Gecko/20040415 Firefox/0.8.0+
Moreover in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b)
Gecko/20040414, Mozilla Suite that is, I can't get the normal right click menu
on this page to work, I only get the "View Selection Source" right click menu
even though I have nothing selected on the web page.
(In reply to comment #5)
> Moreover in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7b)
> Gecko/20040414, Mozilla Suite that is, I can't get the normal right click menu
> on this page to work, I only get the "View Selection Source" right click menu
> even though I have nothing selected on the web page.
That could be Bug 240404.
Well, just before it all turns to trash, I see for half a second or so the
picture I have attached here.
These are the headers I get from the httpheaders extension when I visit the
page and get trash.
These are the httpheaders from the good version

The page has an encoding of UTF-8, by the way:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
With a debug build of Mozilla I get an assertion when it is rendered as garbage: 
unexpected progress values 'progress<=progressMax', file
c:/mozilla/debug/netwerk/protocol/http/src/nsHttpChannel.cpp, line 3488

When it is rendered normally, I don't get this assertion
Confirming on latest-1.7 build 2004042109 (just before 1.7 RC1), Windows 98. No
duplicate found.

Especially considering the previous comment, I believe this is HTTP related.  

--> HTTP.

I see this a lot on blogs. Maybe this is blogspot.com related.

1.7 nomination as blog surfers see this a lot.

A bug that is suspiciously similar to this one (reload fixes page display) is
bug 222069. See also bug 187612 for another possibility.
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking: HTTP
Ever confirmed: true
Flags: blocking1.7?
QA Contact: general → core.networking.http
Summary: Garbage rendered on first load, reloading fixes it → Garbage rendered on first load; fixed by reloading
BTW, the problem tends to be intermittent.
This is quite reproducible for me at
http://corrente.blogspot.com/archives/2004_04_18_corrente_archive.html All I
have to do is a shift+reload or a first visit to the page after clearing my
cache and I can reproduce this problem. I agree that this seems to be widespread
among blogspot-hosted sites. 

This is likely to be high-profile given that blogspot is one of the largest
hosts in the blogging world. We should try to get this fixed for 1.7.

I also see the HTTP/1.x 206 Partial Content response that shows up in the log in
comment #8 when visiting the corrente.blogspot.com URL.
Flags: blocking1.7? → blocking1.7+
One additional data point: if I disable disk cache, I can no longer reproduce the
problem. With disk cache enabled, I can reproduce 100% of the time doing a
shift+reload. With disk cache disabled, I didn't reproduce after 30 shift+reloads.
Severity: normal → major
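(For anyone trying to reproduce: in Mozilla builds of this era the disk cache can be
toggled with the preference shown below; the exact UI path varies, so treat this setup
as an assumption rather than what the commenter actually did.)

  // in prefs.js (or via about:config): disable the disk cache
  user_pref("browser.cache.disk.enable", false);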
hmm... an http log would be nice...

The reason that the page is loaded twice seems to be the following: the page
sends two content-type headers (as shown by /usr/bin/GET -e -d
http://amleft.blogspot.com/):
Content-Type: text/html
Content-Type: text/html; charset=UTF-8

the livehttpheaders log indicates that only the former is used.
Now, the page also has a meta tag for the charset:
                <meta http-equiv="Content-Type" content="text/html;
charset=UTF-8" />

Which means we have to reload the page once this tag is encountered (if the
charset mozilla assumed is different from UTF-8, of course).

Disk cache has the partial source, of course. So Mozilla sends a Range request.
If disk cache is disabled, the whole page is reloaded...

(I'm not 100% sure that that's what happens, but it seems reasonable given the
information in this bug)

what all this does not answer is why it doesn't work... again, maybe an http log
would help: http://www.mozilla.org/projects/netlib/http/http-debugging.html
alternatively/additionally, maybe a network sniffer log...

*** Bug 241672 has been marked as a duplicate of this bug. ***
Bug 225292 has a screenshot that looks sort of like the symptom of this bug, but
it is not from blogspot.com.
Wow, this is really reproducible, even on 1.7RC1 (Build ID: 2004042109).
Shift-reloading can make the site http://amleft.blogspot.com/ go nuts. Same
behaviour with http://afamilyinbaghdad.blogspot.com. The HTML source seems OK,
as stated.

I've made an HTTPLOG and a tcpdump of traffic from my Linux gateway (logging
the SAME connection, i.e. they should correlate).
wow this is interesting.

So when Mozilla first requests the page, it gets a 200 OK response, with
gzip-encoded content.
when it then re-requests it with a range, it gets 206 partial content (which is
fine so far), but this is no longer gzip-encoded. 

I find it likely that this is what is causing problems for mozilla.
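In header terms, the mismatch looks roughly like this (values are illustrative, not
copied from the attached logs):

  First load:
    HTTP/1.1 200 OK
    Content-Type: text/html
    Content-Encoding: gzip
    ETag: "xxxxxx"

  Resume of the partially cached copy:
    GET / HTTP/1.1
    Range: bytes=15000-
    If-Range: "xxxxxx"

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 15000-47999/48000
    Content-Type: text/html
    (no Content-Encoding header)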
Two more guesses. Maybe this is related to 

bug 123662 ("File / cannot be found. Please check the location and try again") 

or 

bug 180831 (The page loads completely blank).
Status: NEW → ASSIGNED
Keywords: regression
Target Milestone: --- → mozilla1.7final
I think I've seen this bug before.  In fact, it is very similar to bug 176222.

Here we are sending a byte range request for a portion of the compressed
document.  However, the server sends us a range from the uncompressed document.
 When Mozilla combines the files, it runs into trouble because part of the file
is compressed and part of the file is uncompressed.

I have a message somewhere from one of the HTTP/1.1 spec authors about the
vagueness of the spec here.  This is in my opinion a site problem because the
response from the server is pretty much useless.

However, we might be able to workaround the problem by re-issuing the request
without the If-range header to force a full response.  In bug 176222, I was
loath to do so because 1) this seemed to be limited to older versions of IIS,
and 2) the likelihood of introducing a regression was high.  Now that we are up
against 1.7, we are basically confronted with the same situation.

I'll take a look at the code to see what I can come up with.
Whiteboard: [DUPEME]
It might be worth trying to get in touch with blogspot, too. If we can
evangelize them, maybe we can save ourselves some pain. If I can find a contact
there, what do I tell them? 
Hi there,

Asa contacted us (the Blogger team), and I took the ball to get back with you.
I'm the engineering manager for the Blogger team, but I also happen to be an 
Apache HTTPD developer. I think that I've got the right cross section of 
knowledge to help out here :-)

FWIW, we also saw this problem. I didn't follow up because I had thought 
somebody said that it had been fixed in Mozilla. Apparently not :-(  So... I'm 
quite happy to help out here.

We did an analysis of the problem, and came to the same conclusion as Darin: 
Mozilla gets confused because the second response is uncompressed (while the 
first response was compressed). I will note that this is not an Apache 
problem. If you take a look at the HTTP log that Garrit posted, you'll see 
that the first response is advertised as compressed (see the Content-Encoding: 
gzip header in the response). The second response is not compressed (there is 
no Content-Encoding header). This is just as it should be.

I'm not sure that I understand the statement about the HTTP/1.1 spec being 
vague here. The Content-Encoding, Content-Length, and Content-Range headers in 
the responses are consistent in this case, and also clear in the spec. It can 
get a bit hairy to try and figure out whether the C-R applies pre- or post- 
compression, but the spec is clear that it applies to the "entity-body" which 
is pre-compression. The C-L field is the "how much was transferred", which is 
post-compression.

In any case... spec aside, our analysis was that the client was 
misinterpreting the range request, and treating it as compressed data [when it 
was plaintext]. Looking at Garrit's log, I actually see where the client says:

0[243f70]: converter installed from 'gzip' to 'uncompressed'

Which isn't correct for that second response.

Please let me know if there is anything that I can do to help. Needless to 
say, we're also interested in having this fixed in Firefox :-)
Hi Greg,

Thanks for commenting on this bug.  It would be good if you could also review
the comments in bug 176222, specifically:

  bug 176222 comment 33

in which I demonstrate Apache treating the Range header as applied to the
compressed message body.  In that example, if Mozilla issued a range request
using offsets relative to the uncompressed entity, then the wrong data would be
sent by Apache.

Therefore, I believe that perhaps the best solution is for Mozilla to not issue
range requests for partially cached documents that were originally sent with a
Content-Encoding header.  This is the solution that Bradley Baetz suggested in
bug 176222 comment 28 (3rd to last paragraph).

It is apparent that servers do not implement range requests for compressed
entities in a consistent manner.  Therefore, Mozilla cannot support range
requests made on a compressed document.
Attached patch v1 patch
This patch disables issuing a range request to complete a partially cached
document that was originally sent with a Content-Encoding header.

There are still problems that can come up with Mozilla's implementation (after
this patch lands).  Namely, if "Content-Encoding: gzip" is applied to a 206
response, then Mozilla will incorrectly try to ungzip the cached portion of the
document as well as the new content range.  That deserves a separate bug and
will be harder to fix.  Hopefully, it is a less common scenario.
Whiteboard: [DUPEME]
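For illustration, here is a standalone sketch of the rule the patch describes: a
partially cached entry is only completed with a byte-range request if it was not
originally sent with a Content-Encoding header. This is not the actual Necko change;
the struct and function names below are made up.

  #include <map>
  #include <string>
  #include <iostream>

  struct CachedResponse {
      std::map<std::string, std::string> headers;  // lower-case header names
      bool complete = false;                       // whole entity cached?
  };

  // Returns true if it is safe to complete the cached entry with "Range:".
  bool CanResumeWithRange(const CachedResponse& cached) {
      if (cached.complete)
          return false;  // nothing to resume
      // Servers disagree on whether Range applies to the compressed or the
      // uncompressed form, so never issue a range request for an entity that
      // carried a Content-Encoding header.
      if (cached.headers.count("content-encoding"))
          return false;
      // A validator is needed for If-Range, otherwise merging is unsafe.
      return cached.headers.count("etag") || cached.headers.count("last-modified");
  }

  int main() {
      CachedResponse r;
      r.headers["etag"] = "\"2bb800b-d1bd7d-70e94c00\"";
      r.headers["content-encoding"] = "gzip";
      std::cout << (CanResumeWithRange(r) ? "resume with Range" : "refetch in full")
                << std::endl;  // prints "refetch in full"
  }

In Mozilla itself the corresponding check would live in nsHttpChannel's cache-validation
path (the file already mentioned in this bug); the sketch above only captures the
decision rule.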
In this message from Jeffrey Mogul, you'll find his explanation for why the
HTTP/1.1 spec is 'wrong when it associates an "entity tag" with an "entity"'
and he goes on to make the case that Mozilla's implementation is correct
because 'the use of If-Range in the second request should be enough to protect
against the "disappearing gzip encoding" problem.'

He proposes some workarounds that Mozilla could possibly implement, but I feel
that the workaround in the v1 patch is probably the safest solution for
Mozilla.

It is in large part because of his email message that bug 176222 was assigned
to Tech Evang.

Also, there is a section in "Clarifying the fundamentals of HTTP"
[http://www3.interscience.wiley.com/cgi-bin/abstract/107063479/ABSTRACT]
in which Jeffrey Mogul goes into further detail on this subject.  It looks like
you can download the full PDF after registering with wiley.com (although I have
not yet tried to do so.)
I believe that Jeff is mistaken in his commentary on entity tags. The key fact 
is that an entity tag is based on the *entity* (as the name suggests). What is 
returned in the *message* is an entirely different matter. In his message, he 
posts two different responses which have the same entity tag. This is just 
fine: the underlying entity has not changed. What *has* changed is what is 
being returned by the server. In one case, it is the full entity in a gzip'd 
form. In the other case, it is a range of the entity.

Nothing about the entity has apparently changed, based on the etag value (and 
because I'm assuming his test would not allow them to change, not to mention 
the 2-second gap between responses).

In short, the entity tag is about the original entity which resides on the origin
server. It has NOTHING to do with what is returned. Intermediate proxies are 
free to return subranges, alternate codings, or whatever. The etag has nothing 
to do with what is in the message, but everything to do with the original.

Like you: I'm not going to register with wiley.com to fetch some paper, 
especially given that I'm predisposed to believe the paper has an incorrect 
basis.

Now, back to the current question: should Mozilla issue range requests when it 
believes that the server might support the gzip content coding? We know that
Apache 1.3 is operating fine (based on my observations of blogspot.com). Next 
question is whether IIS and Apache 2.0.x operate correctly. Your post about 
the 2.0.40 response is unclear on what might be happening in there (i.e. is 
that httpd's mod_deflate, the 3rd party mod_gzip, maybe precompressed 
responses, what got negotiated, etc; I know the blogspot config, so it was 
possible to form an accurate picture). But assuming the worst, I would not 
necessarily base a Mozilla change on Apache 2.0.40 behavior [from a year ago]. 
If that version of Apache is buggy (and note that it is not necessarily pure 
2.0.40 -- RH prolly patched it up plenty), then you don't want to tweak 
Mozilla to support those bugs; let the server operators upgrade to a newer 
2.0.x.

From the various bugs, it looks like IIS might have some problems (tho I 
haven't done an in-depth read). If you want to compensate for that: feel 
free :-). Again, I'd just push back on the server.

I would support changes to clients and servers that are broken, but would not 
support working around broken clients/servers. If httpd 2.0.x is borken in 
certain cases, then we should fix httpd [instead of mozilla].
If the problem is the webserver, wouldn't an error message rather than "random"
junk (gobbling up system resources) be a better idea? When loading this junk
memory usage skyrockets and Mozilla is unusable for minutes.

And is it a coincidence I have never seen this on older Mozilla versions?
'wouldn't an error message rather than "random" junk'

Mozilla doesn't really know that it is random junk - it's assuming it's part of
an HTML document. I would imagine that something would have to be added in order
to detect that and give an error. If Mozilla (and BTW this code is shared, so
any change here will also affect Firefox) is going to be changed, then avoiding
the problem in the first place would be the way to go, so, FWIW, Darin's patch
seems like a good idea to me, at least for the 1.7 branch.

'And is it a coincidence I have never seen this on older Mozilla versions?'

I would guess so - as mentioned in the other bug, this has been happening since at
least the 1.2/1.3alpha timeframe.
Well, if google used Transfer-Encoding rather than C-E, we'd all be happy. Which
is what mod_gzip should be doing for the dynamic-compression case.

The problem here is that C-E is being used where it's not really a content
encoding. It's not really possible to fix that unless you add user-agent sniffing
into mod_gzip and other webservers.

I need to think about this a bit more, but a quick thought is that there's
nothing stopping a webserver sending a .foo.gz file as Content-Type: text/foo,
Content-Encoding: gzip (if configured to do so because it's appropriate in that
specific case) - there's no requirement that that webserver be able to understand
gzip encoding to send those headers, but if Range applied to the uncompressed
values then that webserver would have to.

Even if the webserver did understand it, the comment in the RFC that
"frequently, the entity is stored in coded form, transmitted directly, and only
decoded by the recipient." would make this massively inefficient.

Also, what if you had a content encoding which you couldn't uncompress until you
had the entire data stream? How would Range work then if it referred to the
uncompressed data? Would range be unsupportable for C-E: pkzip?

A smaller issue would be the possible requirement to overrequest on the range
for subsequent requests, eg Content-Encoding: bzip2 would mean that you couldn't
request a partial (very large) block, because it couldn't be uncompressed at all
without all the block, so you'd have to throw the partial bit away.
Note that my answer sort of ignores the issue of proxies which I previously
raised, and which Jeffrey responds to. I'm mostly happy with agreeing that it's a
spec bug, although I agree that that's a bit of a cop-out. I also think that the
case of multiple proxies with different content-encoding configurations is a bit
of a corner case. I'm not actually aware of any proxies which do
content-encoding transformations like the spec permits.
C-E vs T-E isn't really the issue. Both cases are well-defined. The use of C-E 
is just fine: the server is stating that the resource lives in an uncompressed
state and that it has compressed it before providing it to you. Any range 
requests will, therefore, be satisfied against the uncompressed form.

Your example is valid: the server could have precompressed documents and serve 
them up as if the resource was uncompressed. And yes: the server is going to 
have a bit of difficulty if a Range request comes in. It would be possible 
to "fall back" to an uncompressed form and serve the ranges from that.

In the 2.0.40 example that was provided, I feared that this kind of setup was 
being used (note the negotiation that seemed to be used), and am surmising 
that it wasn't configured/prepped properly. Thus, I wouldn't want to see 
Mozilla change for that example, until getting more access to that test case 
scenario to verify that the server *was* acting per the specification. (or to 
verify that it was borken)

And regarding non-streamy compression schemes: the server definitely could 
serve up a range for a pkzip'd document if it was precompressed (since the
server has the whole thing). It certainly wouldn't return it in a compressed 
form, though.

Trying to compress a range request is where T-E is best (in fact, it would 
probably be impossible to mix a range request and C-E), but I think the 
historic problem has always been whether clients understand T-E well enough. 
In any case, using C-E *is* a valid choice for servers. It is just more
limiting -- not all responses can be compressed.

But all of this is just so much spec discussion. If 1.7 is close, then Darin's 
patch will get Mozilla over the 1.7 hump, and into a time where a proper fix 
can be built.
(In reply to comment #33)
> In the 2.0.40 example that was provided, I feared that this kind of setup was 
> being used (note the negotiation that seemed to be used), and am surmising 
> that it wasn't configured/prepped properly.

I think you're overlooking the fact that this configuration is extremely common.
   It is far from an edge case.  Take Mozilla's own HTTP server used to serve up
the Mozilla releases:

http request [
  GET /pub/mozilla.org/mozilla/releases/mozilla1.7rc1/mozilla-i686-pc-linux-gnu-1.7rc1.tar.gz HTTP/1.1
  Host: ftp.mozilla.org
  Accept-Encoding: gzip,deflate
  ...
]

http response [
  HTTP/1.1 200 OK
  Server: Apache/2.0.48 (Debian GNU/Linux)
  Last-Modified: Wed, 21 Apr 2004 22:37:36 GMT
  Etag: "2bb800b-d1bd7d-70e94c00"
  Accept-Ranges: bytes
  Content-Length: 13745533
  Content-Type: application/x-gzip
  Content-Encoding: gzip
  ...
]

<user cancels download, and resumes>

http request [
  GET /pub/mozilla.org/mozilla/releases/mozilla1.7rc1/mozilla-i686-pc-linux-gnu-1.7rc1.tar.gz HTTP/1.1
  Host: ftp.mozilla.org
  Accept-Encoding: gzip,deflate
  Range: bytes=426240-
  If-Range: "2bb800b-d1bd7d-70e94c00"
  ...
]

http response [
  HTTP/1.1 206 Partial Content
  Server: Apache/2.0.48 (Debian GNU/Linux)
  Last-Modified: Wed, 21 Apr 2004 22:37:36 GMT
  Etag: "2bb800b-d1bd7d-70e94c00"
  Accept-Ranges: bytes
  Content-Length: 13319293
  Content-Range: bytes 426240-13745532/13745533
  Content-Type: application/x-gzip
  Content-Encoding: gzip
  ...
]

In fact, this is the behavior of Apache by default whenever you simply place a
.tar.gz file on the server.  You can argue that the server is misconfigured, but
I don't think you can deny the ubiquity of Apache's default configuration. 
That's just the hard reality of the web :-(
That isn't the default; you've modified your 2.0 config. The default *does* 
put in a Content-Encoding: x-gzip, however. (also dumb, but I see it there)  
Thus, while the short-term hack will make things work, I really don't think it 
is the right long-term answer. In a normal config, Apache will serve it up 
properly; the Q is then how well Mozilla can take advantage of this stuff. 
(and IIS is a whole 'nother matter, but I'm not familiar with its 
characteristics and flexibility).

If your server is creating those responses, then you probably ought to fix 
your config :-)
[ look for an AddEncoding line(s) ]
The real problem is that we don't have a solution that works with both the
mod_gzip style content encoding and with the .ps.gz responses.

I don't think there's any option apart from disabling Range on Content-Encoded
results.
> If your server is creating those responses, then you probably ought to fix 
> your config :-)

Greg, it's not just the Mozilla.org server.  It is many many servers, which I do
not have any control over.  The fact is, this configuration is very common. 
Just look at the number of bugs filed about users downloading .tar.gz files
only to discover that they have been automatically decompressed by the browser
(on account of a "Content-Encoding: [x-]gzip" header being inserted in the
response).

By the way, Mozilla treats "C-E: x-gzip" equivalently to "C-E: gzip", so I don't
know what you mean by a "normal config" being ok.  I think a "normal config" is
usually equivalent to the default config, and as you pointed out the default
config will lead to the situation I described in comment #34.

We don't have the resources to evangelize the tens of thousands (maybe more) of
"misconfigured" web servers.  We just don't have those kinds of resources.  I
also can't expect that every website using Apache will be updated to use the
latest version of Apache.  If we code Mozilla to treat range requests on
compressed entities in the manner you have recommended, I believe that we will
break more websites than we currently do.  NOTE: Your site is only the second
that I have heard of where this is a problem (albeit a topsite).

Also, keep in mind that most of the browsers out there are Internet Explorer,
and IE does not automatically complete partial cache entries using range
requests.  So, whatever we do here must not break compatibility with IE-tested
websites.  I cannot easily convince a website to fix itself when it works just
fine with IE.

Moreover, I still question your interpretation of RFC 2616.  I have sent an
email to Jeff Mogul asking for his response to your comments here.  I find it
hard to believe that his interpretation of the standard -- which he helped
author -- wouldn't be the correct interpretation! ;-)
Comment on attachment 148170 [details] [diff] [review]
v1 patch

bbaetz: can you please mark r= on this patch?  thx!
Attachment #148170 - Flags: review?(bbaetz)
Oh, I hear you. Believe me, when the wall is there, I'm the same: return to
pragmatism :-)  I just find it kind of a sad state. I'll bring up removing that
x-gzip thing -- I thought it only affected old Mozilla browsers. Argh.

(and sorry to imply that you'd need to evangelize to all those servers out
there... yah, that would be icky :-)

Thanks for sending the email to Jeff. I look forward to understanding his
thinking that an entity tag is message-based rather than based on the original
resource (entity).

And all that said: please let me know if you think there is something we could
do with the Blog*Spot servers. While I believe it is serving "properly", I'm
always open to hearing how you think we could serve "better" :-)
greg: well, in comment 15 I noticed you are sending *two* content-type headers.
I suggest only sending one :)
Ah. Missed that part. Thankfully, we updated our server configuration a couple
weeks ago, and it doesn't do that any more :-)

Thanks for calling it out, Christian, so that I could check it!
OK, I decided to go through RFC 2616 more carefully to see the exact language
used, and these are some excerpts that I found particularly relevant:


From section 14.11 on Content-Encoding:

  "The content-coding is a characteristic of the entity identified by the 
   Request-URI. Typically, the entity-body is stored with this encoding and is 
   only decoded before rendering or analogous usage."


From section 14.13 on Content-Length:

  "The Content-Length entity-header field indicates the size of the entity-body"


From section 14.16 on Content-Range:

  "The Content-Range entity-header is sent with a partial entity-body to specify 
   where in the full entity-body the partial body should be applied."


From section 14.35.1 on Byte Ranges:

  "Byte range specifications in HTTP apply to the sequence of bytes in the  
   entity-body (not necessarily the same as the message-body)."


From section 4.3 on Message Body:

  "The message-body differs from the entity-body only when a transfer-coding has 
   been applied"


Hence, because we are concerned with responses that do not have any
transfer-coding applied, the entity-body for our purposes here is the
message-body.  A logical extension of this is that the entity-body is defined
after content-codings have been applied, which is consistent with Jeff Mogul's
statements.


Greg Stein wrote:
> And all that said: please let me know if you think there is something we could
> do with the Blog*Spot servers. While I believe it is serving "properly", I'm
> always open to hearing how you think we could serve "better" :-)

Based on these excerpts, I believe that Blog*Spot is NOT serving properly.

Given that we've only encountered one other site with this problem, and given
that it is an advantage to Mozilla users to be able to resume partial downloads
of compressed entities, I would prefer to see Blog*Spot fix their servers.
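To make that reading concrete, here is a hypothetical example (not taken from this
bug): suppose a 48,000-byte HTML page gzips to 12,000 bytes and is served with
Content-Encoding: gzip and no transfer-coding. Under the interpretation above, the
entity-body is the 12,000 gzipped bytes, so the length and any later range request
refer to those bytes:

  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Encoding: gzip
  Content-Length: 12000            (size of the gzipped entity-body)

  ...later, resuming...
  Range: bytes=4000-               (offsets into the gzipped bytes, not the HTML)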
While I agree with Darin on following the RFCs, I'm also sure that a user who
has Mozilla "choke" on a page and then subsequently loads the page in Internet
Explorer without errors will assume Mozilla is a **** browser. My first
assumption was that this was a Mozilla bug, too (that's why I'm here).

Despite extra code, I think Mozilla should still provide an error message,
blaming the web server for example ;-)? It is then at the user's discretion to
reload the page. 
Of course, if there really only are two webservers which are
broken/misconfigured in this way I can see the point of solving this that way.
--> All/All
OS: Windows 2000 → All
Hardware: PC → All
Comment on attachment 148170 [details] [diff] [review]
v1 patch

OK. If a server uses C-E on the second response, but not the first, then they
deserve to fail :)
Attachment #148170 - Flags: review?(bbaetz) → review+
Greg - you could use Transfer-Encoding not C-E....
Comment on attachment 148170 [details] [diff] [review]
v1 patch

sr=dveditz
a=dveditz for 1.7 -- please add the fixed1.7 keyword when this has been checked
into the branch
Attachment #148170 - Flags: superreview+
Attachment #148170 - Flags: approval1.7+
dveditz: Are you sure this is a patch we want to take?  Please see comment #42.

I know that RC2 is right around the corner, so I can understand that working
around the Blog*Spot servers may be our only solution in the near-term.

gstein: Can you please respond to comment #42?  I'd like to know whether or not
you find fault in my reading of RFC 2616.  If I have indeed convinced you, then
I'd really like to know when we will be able to expect a solution on your end. 
It will help us make the right decision here.  Thanks!
Well, the second sentence of section 14.11 (Content-Encoding) reads:

    "When present, its value indicates what additional content codings
     have been applied to the entity-body, ..."

This seems to imply the entity-body is the original, uncompressed form. However,
section 4.3 clears it all up quite quickly in terms of message body vs entity body.

However, I'm reading it as each of the two responses are consistent: one
response defines an entity that is compressed, and the other is uncompressed.
Yah, it is a bit wonky to have the resource change from one request to another
(doubly bad because the entity-tag remains consistent despite the entity
appearing to change), but the responses are self-consistent. The fact that
Mozilla attempts to decompress the second response, *despite* that response
saying it is uncompressed, seems to be a clear bug on Mozilla's part.

When this first came up on the Blog*Spot servers, we did discuss disabling the
compression, but that would have impacted ALL users. That didn't seem like a
worthwhile trade, especially because we were under the (mistaken) belief that
the bug had already been fixed by the Mozilla team.

I agree that Transfer-Encoding would be the most proper header to use, but none
of the compression modules out there use that header (which makes it hard to
"flip a switch"). Given that the current C-E response is consistent, then I
think we're okay until we can get modules to use T-E. I think this issue makes
it very clear about the best mechanism for on-the-fly compression.

So... my takeaway is that the Blog*Spot servers, through their use of mod_gzip, are
mildly broken (the etag should have changed). But much more important: Apache's
compression modules need to switch to T-E whenever possible (some browsers won't
like it, so we may have to do some user-agent detection).

Does Mozilla handle T-E well?

Thanks!
> The fact that Mozilla attempts to decompress the second response, *despite* 
> that response saying it is uncompressed, seems to be a clear bug on Mozilla's 
> part.

I agree 100% with you on this.  It is a bug we need to fix.


> When this first came up on the Blog*Spot servers, we did discuss disabling the
> compression, but that would have impacted ALL users.

I agree that disabling compression is a bad idea.  The best solution would be to
fix the buggy implementation of the compression module.


> So... my takeaway is that the Blog*Spot servers, through its use of mod_gzip, 
> is mildly broken (the etag should have changed).

This is an understatement.  Mozilla sends an If-Range header, which means that
if the server cannot return the specified range on the specified entity, then
the server should return a 200 response.  It is not valid for the server to
return the specified range on a different entity.  That just makes no sense.

From section 14.27 on If-Range:

  "If the entity tag given in the If-Range header matches the current entity tag 
   for the entity, then the server SHOULD provide the specified sub-range of the 
   entity using a 206 (Partial content) response. If the entity tag does not 
   match, then the server SHOULD return the entire entity using a 200 (OK) 
   response."


> Does Mozilla handle T-E well?

No, it is not yet supported by Mozilla.  Moreover, it is not supported by
Internet Explorer, which is the main reason why it is not used more on the
internet (I suspect).  I think Opera may be the only major browser that supports
T-E.


It sounds like it may be difficult for you to repair the buggy compression
module in a timely manner.  That suggests that we probably should land our
workaround patch.  Unfortunately, this workaround solution does nothing to help
older Mozilla-based browsers (including Netscape 7.1).  For this reason, I
strongly encourage you to consider fixing your server's compression module.
> > Does Mozilla handle T-E well?
> 
> No, it is not yet supported by Mozilla.  Moreover, it is not supported by
> Internet Explorer, which is the main reason why it is not used more on the
> internet (I suspect).  I think Opera may be the only major browser that supports
> T-E.

Actually, Mozilla handles "Transfer-Encoding: chunked" (since it is required for
HTTP/1.1 compliance), but we do not handle any other transfer-codings such as
gzip, which is I presume what you were asking about.
> Actually, Mozilla handles "Transfer-Encoding: chunked" (since it is required for
> HTTP/1.1 compliance), but we do not handle any other transfer-codings such as
> gzip, which is I presume what you were asking about.

Really? I thought we did, which is why I suggested it. Is there a particular reason?
> Really? I thought we did, which is why I suggested it. Is there a particular
reason?

Nope.  See bug 68517.  Patches welcome ;-)
Comment on attachment 148170 [details] [diff] [review]
v1 patch

I don't think we want to take this kind of risk into 1.7 this late in the game.
Removing driver approval.
Attachment #148170 - Flags: approval1.7+ → approval1.7-
(In reply to comment #54)
> (From update of attachment 148170 [details] [diff] [review])
> I don't think we want to take this kind of risk into 1.7 this late in the game.
> Removing driver approval. 
> 

If we don't want to land this on 1.7, we should consider landing the patch on
AVIARY_1_0_20040515_BRANCH , as this branch will still get a considerable amount
of testing before we ship a final product based on it.

I realise that it's not really our problem to begin with, but that doesn't mean
that we shouldn't try and get a fix in for Firefox 1.0. Shipping Firefox 1.0 and
having it mess up on Blog*Spot would be /bad/ PR. If a fix is ready, we should
take it. There is still plenty of time before Firefox 1.0 ships.
This shouldn't land for Firefox and I don't think I'd allow it there either.
Breaking a valuable feature to work around a site problem is not the right way
to handle this bug. 
If this is really /only/ a site problem, and Necko is not at fault here at all,
then perhaps we should consider moving this out of Networking and over to Tech
Evangelism and work with Blog*Spot to have them correct their configuration in a
timely manner before we ship our next generation flagship products.
Ali:

That is exactly what is happening.

If you read the last couple posts from me you will find that I spelled out what
is wrong with the Blog*Spot servers.  The only thing we can do is try to recover
from the error, but that is an involved and risky patch even for the Aviary 1.0
branch.  The only option for us is to consider disabling range requests on
compressed content, but as Asa said range requests on compressed content is a
useful feature.  Moreover we have implemented it per spec, so it would be better
to have the servers fix their bug.  They can simply not advertise support for
range requests on compressed content.  They do not have to disable compressed
content.  Disabling range requests would not impact IE users since IE does not
issue range requests.

Asa is in contact with the folks who run Blog*Spot.  I'm sure he'll update this
bug report when we know more about their plans for fixing their servers.

-> Tech Evang
Assignee: darin → english-us
Status: ASSIGNED → NEW
Component: Networking: HTTP → English US
Product: Browser → Tech Evangelism
QA Contact: core.networking.http → english-us
Target Milestone: mozilla1.7final → ---
Version: Trunk → unspecified
Flags: blocking1.7+
Working on it... :-)
page has another bug, but also shows this one:

Bug 208354 blogspot.com - character entities improperly encoded
http://www.analisiscatolico.blogspot.com/

maybe a dupe:
Bug 245151 	{inc} Page contents completely garbled
http://no-pasaran.blogspot.com/
*** Bug 245151 has been marked as a duplicate of this bug. ***
Darin: how can you possibly say that you implemented it per spec? Mozilla is 
trying to decompress content that was NOT advertised as compressed (in the 
range response). I'm sorry, but that is just plain wrong.

Mozilla is broken. Period. A fix should be made to Mozilla at some point. This 
isn't just an evangelism issue.

The only problem with the Apache/mod_gzip combination [used by blogspot among 
many others on the internet] is that the etag doesn't vary in the responses. 
There should be one etag for the uncompressed form, and one for the compressed 
form. That would put the server into spec.
(In reply to comment #62)
> Darin: how can you possibly say that you implemented it per spec? Mozilla is 
> trying to decompress content that was NOT advertised as compressed (in the 
> range response). I'm sorry, but that is just plain wrong.
> 
> Mozilla is broken. Period. A fix should be made to Mozilla at some point. This 
> isn't just an evangelism issue.

I never said that Mozilla wasn't broken.  See comment #50.  But I see that my
statements in comment #58 are overly vague.  I meant, that Mozilla is issuing a
proper range request per spec.  I did not mean that Mozilla is interpreting the
server response correctly.  The fact of the matter is that the result from the
server is bogus, and what Mozilla is not doing -- as you correctly point out --
is recovering from that situation.  I agree that that is a bug that needs to be
fixed on our end.  However, this is clearly an evangelism issue because this
whole problem would go away for all versions of Mozilla (past, present, and
future) if the server correctly implemented range requests.


> The only problem with the Apache/mod_gzip combination [used by blogspot among 
> many others on the internet] is that the etag doesn't vary in the responses. 
> There should be one etag for the uncompressed form, and one for the compressed 
> form. That would put the server into spec.

Right, and I think it is important that this get fixed as soon as possible.  At
least web sites that encounter this problem with old versions of Mozilla and
Netscape (assuming some future version of Mozilla avoids this problem somehow)
will be able to upgrade their server software.
I filed bug 247334 on improving Mozilla's error handling behavior in this situation.
Okay. We're beginning a test cycle and roll out of a new version of Apache. 
Unfortunately, there wasn't a way to simply disable ranged responses on Apache 
1.3, so we made a code level patch. The blogspot servers should be remedied 
within the next few days or so.

Fixing mod_gzip and Apache 2.0's mod_deflate is another story (w.r.t. etag 
issue). Not much I can do about mod_gzip except to notify the authors; I can 
get Apache 2.0 fixed though (but have not had time to start that yet).
Relic of a bygone era:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaSource&branch=HEAD&branchtype=match&dir=mozilla%2Fapache&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=all&mindate=&maxdate=&cvsroot=%2Fcvsroot

Eric Bina, author of NCSA Mosaic and Netscape 1.0-4.x layout engines, hacked
this up before we reset mozilla.org around the new layout engine (née Raptor,
now Gecko).

/be
Conforming summary to TFM item 10 at 
http://www.mozilla.org/projects/tech-evangelism/site/procedures.html#file-new
Summary: Garbage rendered on first load; fixed by reloading → blogspot.com - Garbage rendered on first load; fixed by reloading
Keywords: top500
(In reply to comment #13)
> This is quite reproducible for me at
> http://corrente.blogspot.com/archives/2004_04_18_corrente_archive.html All I

WFM with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2)
Gecko/20050602 Firefox/1.0+
WFM Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1b1) Gecko/20060802 SeaMonkey/1.1a

As the last comment a year ago was WFM, and new Apache versions and a lot of changes on the Mozilla side have happened since, I'm resolving this WFM.

Feel free to reopen if you see problems on first load or Shift-Reload of the URLs mentioned in this bug; I didn't.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
Product: Tech Evangelism → Tech Evangelism Graveyard