Google Web Accelerator: Hey, not so fast - an alert for web app designers 06 May 2005


Google’s web accelerator seems like a good thing for the public web, but it can wreak havoc on web apps and other things with admin links built into the UI. How’s that?

The accelerator scours a page and prefetches the content behind each link. This gives the illusion of pages loading faster (since they’ve already been pre-loaded behind the scenes). Here’s the problem: Google is essentially clicking every link on the page — including links like “delete this” or “cancel that.” And to make matters worse, Google ignores the Javascript confirmations. So, if you have an “Are you sure you want to delete this?” Javascript confirmation behind that “delete” link, Google ignores it and performs the action anyway.

We discovered this yesterday when a few people were reporting that their Backpack pages were “disappearing.” We were stumped until we dug a little deeper and discovered this Web Accelerator behavior. Once we figured this out we added some code to prevent Google from prefetching the pages and clicking the links, but it was quite disconcerting.

This wouldn’t be much of a problem on the public web since it’s pretty tough to be destructive on public web pages, but web apps, with their admin links here and there, can be considerably damaged. If you have a web app, it might be worth returning a 403 when the HTTP_X_MOZ header is set to “prefetch.” This will keep Web Accelerator from clicking destructive links. Here’s an FAQ on prefetching for more information.

Update: If you use Ruby on Rails for your web-apps, here’s some code to just say no to Google Accelerator.
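If you’re not using Rails, the same check only takes a few lines in most environments. Here’s a rough, untested sketch in PHP (several commenters below post Apache and ColdFusion variants of the same idea):

<?php
// Refuse prefetch requests: Google Web Accelerator sends the same
// "X-moz: prefetch" request header that Mozilla link prefetching uses,
// which PHP exposes as $_SERVER['HTTP_X_MOZ'].
if (isset($_SERVER['HTTP_X_MOZ']) &&
    strcasecmp($_SERVER['HTTP_X_MOZ'], 'prefetch') == 0) {
    header('HTTP/1.0 403 Forbidden');
    exit('Prefetching is not allowed here.');
}
?>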

225 comments (comments are closed)

Mike G 06 May 05

I’ve also been digging for information on whether or not this will affect web analytics software. Will a prefetch be counted as a page view even if the user never visits that page? My second question was whether or not this would affect referrer reporting, as requests are being made on behalf of Google. But apparently your IP is still forwarded. I’m still not sure if this will actually retain the referrer data.

Great idea, but hope it is not at the expense of accurate reporting. It’s hard enough to get accurate data as it is.

Anonymous Coward 06 May 05

And what about ads? Will this pad “impression” numbers? Loading ads that people will never see?

Stephen 06 May 05

I can see that this could be a serious problem for unprotected web apps—I can’t comment on Backpack as I haven’t tried it—but aren’t most content or project management apps password protected anyway? For instance, in WordPress, links to edit/delete a post are not available unless you are logged in to the admin section. The web accelerator may be the best reason yet to secure the sections of a site that are still left open to random visitors.

Andrew 06 May 05

This is not a new problem. Search engines will try to follow links to spider pages. Although they aren’t pre-fetching the pages, the behavior is the same as the one you describe.

It’s always a bad idea to put critical, data-damaging functions behind a regular old a-tag and http GET. It would be safer to do these operations with a POST.
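For instance (a rough sketch with hypothetical page and function names), a delete action can be wrapped in a tiny form so that only an explicit POST triggers it:

<?php
// Only perform the delete on an actual POST; a prefetcher or spider
// following plain <a href> links will never send one.
if ($_SERVER['REQUEST_METHOD'] == 'POST' && isset($_POST['delete_id'])) {
    delete_item((int) $_POST['delete_id']);   // hypothetical application function
    header('Location: /items');               // redirect after the state change
    exit;
}
?>
<!-- instead of <a href="delete.php?id=42">delete</a>: -->
<form method="post" action="delete.php">
<input type="hidden" name="delete_id" value="42">
<input type="submit" value="Delete this item">
</form>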

Anonymous Coward 06 May 05

but aren’t most content or project management apps password protected anyway?

Doesn’t matter — Google Web Accelerator is software you install in your browser so it’s seeing whatever you are seeing (even if it’s password protected).

Ryan Tomayko 06 May 05

Yea, but if you follow web standards you won’t have to worry about this and it’s all upside. GET requests should be safe and idempotent. If you can delete items with a GET request, you might as well use tables for layout and ActiveX controls too.

Unless Google is following links with a POST, they are following web standards and the issue is on your side. Caching proxies are an important part of the web, and the HTTP specification lays out exactly what is respectful and disrespectful.

Sorry if this came off as an attack, my intention is to be informative. Let me know if you want to hash this out further.

Mark Rowe 06 May 05

Quoting from section 9.1.1 Safe Methods of the HTTP 1.1 RFC (2616):

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered “safe”. This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

Put simply, GET requests should be idempotent and safe.

Anonymous Coward 06 May 05

There’s the “if everyone followed the spec and did the right thing all the time” world, and there’s the real world. Even the browser developers can’t follow the same spec. Do you expect tens of thousands of web app developers to follow all the rules all the time?

GET shouldn’t be used for destructive actions, but it is. And it is in a lot of places and in a lot of applications. I appreciate this alert.

Jamie 06 May 05

I wonder what Google accelerator does with gmail. There are lots of GETs in that interface.

Ryan Tomayko 06 May 05

Mark: I’ve always interpreted that as meaning that GET requests should not be capable of causing side-effects but I have to admit after reading more carefully that you may have a point here. The spec seems to be putting the responsibility on the client where I’ve always thought it was on the server. This sucks. I really want you to be wrong about this. Hrmm.. Let me read up a bit and get back with you.

Ryan Tomayko 06 May 05

No no no.. scratch my last comment. That blurb is indeed saying that it shouldn’t be possible for a client to cause side-effects using GET.

David Heinemeier Hansson 06 May 05

The problem with the Accelerator is that it’s going places that used to be a safe haven from spiders. Namely, behind authentication schemes that require humans at the steering wheel.

This safe haven has led a large number of applications to interpret the SHOULD NOT exactly as a should and not as a must. Meaning that if you have “good reasons,” you can diverge from the recommendation. And being behind authentication has by many been interpreted as a “good enough” reason.

So while you may rail on application developers for diverging with reasons you don’t deem to be good enough, it’s hardly news that this is the fact. Just like web developers can’t disregard the browser bugs in IE6, we have to work with and deal with what’s real.

Google is disregarding reality and is thus blatantly disrespecting precedent. That’s where the problem is. Moreover, there’s no warning here. Shit will just start breaking left and right, leaving users and application developers lost.

Having a standard is not an excuse for not looking out the windows.

Jacob Harris 06 May 05

Okay, the point that GET should be idempotent is well-taken (although hardly realistic in many websites these days). However it should also be noted that prefetching can really hammer servers that play by the HTTP rules. It’s not really fair to assume that all GETs are created equal, and even some idempotent links (to saved searches for instance) may exert considerable load on the server. In essence, Google becomes a force multiplier for a user’s load. So, even if you stick to the HTTP guidelines on GETs, it’s probably a good idea to disable Google’s prefetching if you create dynamic content and don’t want to see a magnification of your user’s effects.

matthew 06 May 05

so it appears that google is going past the standards of “prefetching”, at least as described by mozilla. that faq makes a point that “URLs with a query string are not prefetched” and “https:// URLs are never prefetched for security reasons”

so if google is prefetching query strings, is it also prefetching https?

matthew 06 May 05

or, i could look myself

http://webaccelerator.google.com/webmasterhelp.html

Dan Boland 06 May 05

Would this be grounds for legal action against Google?

geeky 06 May 05

ok, i’m a bit confused and i’ll be the first to admit i don’t know much about this prefetching business. but i do know i make a lot of web apps, and your post has me a little concerned about the safety of those web apps now.

after looking at the “webmaster help page”, i have a question - were the delete links in backpack <link rel="prefetch"> ? from what i understand, the web accelerator will only crawl those links. or am i just confused?

Dirty Lawyer 06 May 05

How would it ever be grounds for legal action???

rick 06 May 05

From what it sounds, the GWA crawls all links except for HTTPS links. You can also use the prefetch to give GWA a few hints on what’s actually important.

It seems like there should be some opt-out Meta tag or it should respect robots.txt.

Mike 06 May 05

After trying the Google Web Accelerator I noticed a few problems of this nature. We were getting quite a lot of DNS/Page Not Found errors where the error page stated it had been generated by the Google Web Accelerator.

In addition I have found it impossible to use my web mail with this running because as soon as I sign in Google Web Accelerator is “clicking” on the sign-out link, and killing my session.

You’d have thought Google would think these things through a bit more, considering all the great minds they’re supposed to have working for them.

Chris Rimmer 06 May 05

Of course, if you were being RESTful, then HTTP GETs would not change state and you wouldn’t hit any problems. Unless the web accelerator also POSTs forms for you too… ;-)

Jacob Harris 06 May 05

Echoing geeky, I’m confused too, since they’re claiming they’re sticking to <link> tags only. Is Google fetching <a> tags as well? Can you provide an example of the delete links Google is prefetching?

Ryan Tomayko 06 May 05

David: I see your point and agree that it would be in google’s—and definitely our—best interest to scale this back a bit because it isn’t practical right now.

But I want to point out that your argument isn’t very different from the ones people presented for using IE-specific DOM/CSS/ActiveX five years ago. When you use non-standard approaches like non-safe/non-idempotent GET it’s the same as using <blink>, IE’s proprietary CSS filters, or bad date formats in RSS. The result is always unnecessary limitations on the types of user agents that can interact with your resources and barriers to new and different types of intermediation.

Google should add, as a minimum, some kind of heuristic to look for common cases of GET misuse (sort of a quirks mode for HTTP) but I don’t think we can just wipe our hands clean either. This type of service by intermediation is only going to grow. Ensuring that your resources can handle basic types of interaction like this is important.

David Heinemeier Hansson 06 May 05

Geeky, Jacob: Google follows all a hrefs (that aren’t on https), not just the link tag. The link tag is only there to make it smarter about that stuff.

Ryan: Using non-idempotent GETs is discouraged, but certainly not forbidden. Thus, it’s to be expected that there will be cases of GETs that trigger stuff. Google is ignoring that reality.

So for your parallel to CSS/DOM, I’d rather use the example of browser bugs. Yes, a web developer may choose to follow the standard to the dot, but if IE6 renders it like snot, there’s little but academic joy to be had.

It’s grossly irresponsible of Google to ignore this. Just like Safari would be a dead browser in the water if it chose only to follow the standard (it does a lot of mumbo-jumbo to make less-than-perfect sites work).

Jacob Bøtter 06 May 05

I experienced this using an administration web app for our company database this morning too. It “accidentally” deleted some of our very important leads. GWA is gone and will never be re-installed. I never thought Google would release anything as terrible as this product.

Jacob Harris 06 May 05

Has anybody definitively identified how Google Web Accelerator treats the HTTP Cache-Control: and/or Pragma: headers? SomethingAwful.com has already seen an issue where cookie-based authentication was being bypassed and users were being sent cached copies of other users’ screens when looking at forums. I am not aware of what caching headers they were using, but I am concerned about whether Google is respecting caching directives, since that could really throw a wrench in cookie or IP-based authentication schemes (thankfully, their proxy still presents the originating IP in the header or it would be even worse for IP authentication). Or is this question too off-topic from this particular web accelerator problem?

JohnO 06 May 05

Not to rain on the standards parade, but a GET request (even a RESTful, idempotent one) still does real work. For instance, in systems I’ve built, requesting info (with the intent to change it) will lock that info so other users can’t also request it. Obviously this would be bad with Google’s prefetching, since the first person on the system would lock up everything.

So, moral of the story, stop Google from prefetching by testing for the header, and responding with 403 if you find it.

Richard@Home 06 May 05

I did a bit of testing today with the GWA running and pointing tail at my access logs.

I could only get it to prefetch my (PHP, but no query strings) pages by explicitly adding a [link rel="prefetch" href="foobar"] to my document [head]

I am confused: the GWA webmaster FAQ says:

— quote (a) —
What does Google Web Accelerator mean for my site?

It means that you don’t need to modify your website in order for your users to enjoy a faster experience.
— end quote —-

and

— quote (b) —
Can I specify which links Google Web Accelerator will prefetch on my pages?

Yes, you can. For each link you’d like us to prefetch, simply add the following snippet of code somewhere in your page’s HTML source code:
— end quote —

Which seems to read: (a) we will spider you without you doing anything, and (b) we will only spider pages which you link explicitly.

My own experiments point to (b) being correct…

Pete Prodoehl 06 May 05

The solution is simple: Sue Google! ;)

Will Hayworth 06 May 05

Semi-stupid question: what about nofollow? I’ve looked in the Google docs and Mozilla specs and haven’t found anything about whether nofollowed links will be prefetched.

matthew 06 May 05

there’s definitely more going on here. i tested this with a couple of admin pages that had edit/delete GET links on them, with no tragic effects. the data was not deleted.

i do have a meta name="robots" content="noindex" tag set, so perhaps it’s obeying that tag?

geeky 06 May 05

i am beginning to suspect that the FAQ is just poorly worded. it seems that web accelerator should only pre-fetch links on your site that you mark with <link rel="prefetch">, and that if someone else wants to pre-fetch your page (say, from a bookmark) and you want to let them do that, there is no effort needed on your part.

now, perhaps something else is actually happening, but i imagine that is what the FAQ is trying to convey.

Ben 06 May 05

to stepan:

You can specify a LINK tag in the HEAD of your html document, for example —

More on the LINK element : http://www.euronet.nl/~tekelenb/WWW/LINK/

Ben 06 May 05

haha let’s try that again

You can specify a LINK tag in the HEAD of your html document, for example —

Simon Proctor 06 May 05

Could someone point out to me how I put something on a page with HTML that does a PUT or DELETE request?
Oh and how to do a POST request with a simple link without using Javascript?
Or am I missing something?

Thomas Baekdal 06 May 05

Just a suggestion for future web applications - use POST. This used to be hard and not very practical, but now we have XMLHttpRequest.

In Google’s case I think there is a huge problem with 2 things:

1: Why is anything sent to Google? Shouldn’t the web accelerator be able to do everything it needs locally? This is a huge privacy issue.

2: I am worried by the reports about Google showing someone else’s cached information. This is not just a privacy issue, but a catastrophic privacy issue.

In any case web acceleration is probably going to stay. It is a great concept which will solve one of the oldest problems in the internet - inefficient use of our connections.

stepan 06 May 05

Yes, I know about LINK tags in the HEAD section, but that’s not the issue here - I mean some of them are expected to be fetched (stylesheets, for example). This article claims that GWA is fetching links in anchor (A) tags.

I think it’s a bit pedantic to keep quoting Section 9.1.1 of RFC 2616 (anyone else notice the ironic section number? :-). The fact is, there are plenty of web pages that use anchor tags for actions. I challenge you to name one forum package that doesn’t use a “Logout” anchor.

walkingmantis 06 May 05

god forbid if anyone uses this while in phpmyadmin

all the drop tables, and just about everything is a get request and since it ignores javascript confirmations, you can wipe your entire server in the name of saving a few seconds and letting NasdaqNM:GOOG have your webviewing statistics and purchasing trends

they should really have a robots.txt or meta tag that will tell GWA to not prefetch anything

johan 06 May 05


(although hardly realistic in many websites these days).

then the website is broken. end of story

stepan 06 May 05

FYI, there seems to be a problem with how GWA caches pages on its server.

This would be a more sensible explanation for the behavior that Backpack users see (i.e. something gets deleted by someone else clicking on a “Delete” link on an incorrectly cached page). This assumes that Backpack doesn’t send a correct “Cache-Control: private” header.
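For what it’s worth, sending that header is a one-liner in most environments; a PHP sketch:

<?php
// Mark per-user pages as private so shared caches and proxies
// shouldn't store them or hand them to another user.
header('Cache-Control: private, no-store, max-age=0');
header('Pragma: no-cache'); // for older HTTP/1.0 caches
?>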

Shane Allen 06 May 05

Apache Server Admins:

you can use the following to forbid google’s web accelerator, based on the HTTP_X_MOZ header:

RewriteEngine On
# match the X-moz request header that prefetch requests carry
RewriteCond %{HTTP:X-moz} ^prefetch$ [NC]
RewriteRule ^.*$ - [F]

You’ll need to determine where the relevant code block is that needs to have this config added.

Damien 06 May 05

Just to reiterate the issue: what the *Mozilla* foundation says about how *they* implement prefetching has *nothing* to do with how Google Accelerator works; they are completely different things, though related. Google really fudged this one up.

Google 06 May 05

There is now an update to the Google Accelerator to fix these problems.

Josh 06 May 05

“then the website is broken. end of story”

Apparently, you are the only one that thinks it is the “end of story”. Personally, I think people will be posting about this story on websites for days, if not weeks. The real world is not black & white. The story will not end no matter how hard you stomp your feet and scream that everyone else is wrong.

Jacob Harris 06 May 05

Sigh, I make some aside comment on “(although hardly realistic in many websites these days)” in a greater post about how even mass-fetching of idempotent GETs can be a hazard for sites (is your site ready to handle 25x server load overnight?), but what is the point that gets commented on?

I agree that in an ideal world it would be great if GET were only for idempotent requests, but I think it is counter-productive to blame the victim in this case. The problem is that POST is an unwieldy mechanism in many cases, so sites have opted for GETs for actions that alter state. While I agree that sites should not just allow the user to click on a link to do something really big and horrendous (“click here to destroy the universe”), links are tempting for users because they provide, in a sense, a way to embed dozens of tiny self-contained FORMs in a page (which is how you would emulate them with POSTs), and I think that’s useful for smaller, less-destructive actions (you could have just one form of course, but it would require other work). Honestly, if there were a way to do POST without using forms (like in a URL), I think a lot of people would embrace it, but it’s the unwieldiness of the framework that pushes people away.

But this is beside the point I was making. Even if we all lived in a world where websites don’t use GETs for non-idempotent calls, GETs still use system resources, and the problem with a dumb prefetch is that this can be a sudden and extreme multiplier on system load and requirements. If I told you that your site had to handle 20x traffic overnight, would you be able to do it?

Jacob Harris 06 May 05

Correction (haste leads to post errors [plus spelling mistakes and repetition]):
“links are tempting for *developers* (not users)”

Sorry about that

Damien 06 May 05

Another idea…

The notion of blocking all requests ignores legitimate requests, e.g. to images and other cachable media. I propose simply blocking code pages, as follows:

PHP:
if(array_key_exists('HTTP_X_MOZ', $_SERVER))
{
if(strtoupper($_SERVER['HTTP_X_MOZ']) == 'prefetch')
{
header("HTTP/1.1 403 Forbidden");
header("Content-Type: text/html; charset=iso-8859-1");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", FALSE);
header("Pragma: no-cache");
header('Accept-Ranges:');
exit();
}
}

CFML: (the code got eaten by the comment form; see the corrected version posted below)
Damien

J. Macey 06 May 05

I was doing some testing last night with it (Win XP, Firefox) and turned on the “style cached links” option. If I hovered over a link for about half a second, it got precached. That was the only way I could get it to precache an A tag.

Makes me curious what the “fixed problems” are in the update, but not curious enough to reinstall it.

GREACEN 06 May 05

robots.txt anyone? If your site is organized properly, you should be able to block spiders. Does the google accelerator obey robots.txt? (ok I’ll go rtfm)

c:\>curl -i www.google.com/robots.txt
HTTP/1.1 200 OK
Content-Type: text/plain
Last-Modified: Fri, 06 May 2005 02:03:58 GMT
Set-Cookie: PREF=ID=7dcb4198c97120e3:TM=1115403432:LM=1115403432:S=1LCd2kvo-SqQU
QK-; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Cache-Control: private, x-gzip-ok=”“
Transfer-Encoding: chunked
Date: Fri, 06 May 2005 18:17:12 GMT

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalog_list
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /wml
Disallow: /xhtml
Disallow: /imode
Disallow: /jsky
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /froogle_
Disallow: /print?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /translate?
Disallow: /ie?

c:\>


jpack 06 May 05

Based on my playing with it (YMMV), it fetches links (any href, not just what you declare with link tags) based on your clicking behavior + what you mouse over.

It hits the local proxy on your machine like:
http://localhost:9100/mousehover?url=http://www.ursite.com/linkyoumousedover.html
for everything you mouse over, and will in many cases then pre-fetch that url. It of course also fetches other things, not just the mouse-over stuff, but that’s an interesting trick.

Besides doing nasty things with GETs (it does fetch most any GET AFAIK, including things with query strings) it could also screw up your logs something awful with duplicate requests or pages that were never actually seen by the user.

One easy(?) way to fix both things at an apache level would be to do something like this:

RewriteEngine On
SetEnvIfNoCase X-Forwarded-For .+ proxy=yes
SetEnvIfNoCase X-moz prefetch no_access=yes

# block pre-fetch requests with X-moz headers
RewriteCond %{ENV:no_access} yes
RewriteRule .* - [F,L]

# write out all proxy requests to another log
CustomLog logs/ursite.com-access_log combined env=!proxy
CustomLog logs/ursite.com-proxy_log combined env=proxy

Then any prefetching would be hit with a 403 error and all requests from the web accelerator would end up in another log. Of course all your other proxied requests (squid, etc.) would end up in that other log too, so you might be better off adding a rule to look for google IPs if you just want google in the new proxy logfile. I can’t say I’ve really tested this much.

It would be nice if they added an X-header to identify their proxy specifically if this becomes popular

Anonymous Coward 06 May 05

“Google ignores the Javascript confirmations”

Does this mean that google’s web accelerator automatically sends a confirm “Yes” to the javascript? Does this work even if I have a “javascript:void(0)” or “#” in the href?

Jacob Harris 06 May 05

David, good point about the load. You’re right, it really should not be a problem for most sites. However, my company has aggregated searches where a user can type in keywords and then get a page with links to specific data sources and a count of how many matches there are. Clicking on a link on that page displays the matches for a particular source, and since that’s richer information than just a count of matches, it adds some additional load. In addition, each user has a unique session key on login that expires after a day. As a result, unless a user redoes a search he did before that day, there will be no use of the cached content, so in this case we still get the multiplied load. No wonder I got freaked out. ;-)

The problem here is that Google is essentially piggybacking web spidering on top of user interactions. This isn’t so bad for sites serving static content on public pages, since they probably have been optimized for higher volume anyway. But as has been noted, the real problem is that it bypasses sites’ protection via login screens and javascript execution (in essence, breaching a firewall of sorts that keeps automated access out). We just don’t expect all of these links to be clicked simultaneously, since we’re assuming only users will be doing the clicking.

Anonymous Coward 06 May 05

It’s always a bad idea to put critical, data-damaging functions behind a regular old a-tag and http GET. It would be safer to do these operations with a POST.

Why is it a bad idea to use GET, and how exactly is POST safer? Sounds like a load of BS to me.

Damien 06 May 05

Corrected CFML code:

<!--- block Mozilla Web Accelerator --->
<cfif structKeyExists(cgi, 'HTTP_X_MOZ')>
<cfif cgi.HTTP_X_MOZ EQ 'prefetch'>
<cfheader statuscode="403" statustext="Google Web Accelerator requests are forbidden." />
<cfabort />
</cfif>
</cfif>

Ben 06 May 05

Maybe I’m dense, but can someone tell me why a GET that deletes something is not idempotent? Deleting something N times is the same as deleting it once.

Mark 06 May 05

A POST is safer than a GET because, typically, the GET is a URL with parameters on the QueryString, which is all right there to be duplicated or spoofed. A POST’s URL is a Form Action, which, while it may be duplicated, doesn’t contain parameters; those are contained in the Form’s Fields.

Spyware and browser plugins can catch complete GET operations and duplicate them. On a Form POST all they can get is the Action URL; the rest of the data (which you should be checking for server-side) is incomplete and the request can be deemed invalid.

Alan 06 May 05

Why is it a bad idea to use GET, and how exactly is POST safer? Sounds like a load of BS to me.

Because a GET request can be easily created by any user. All they need to do is enter a URL into a web browser. A user has to go out of their way to create a POST request. The chances of someone accidentally POSTING to a form that will change/nuke a database are slim.

Sean 06 May 05

6. Will Google Web Accelerator affect traffic for ads on my site?

No. We do not prefetch ads.

My question related to that would be…

how does it know what is an ad…

just as an example… sponsored links on google.com

it’s a plain href… what’s to stop it from prefetching those and driving up the cost for someone who has sponsored links?

you’d think that if you put something out even as a beta (and with google that beta could make its way to a lot of users)… you would at the very least put up more than a tiny, pithy guide for webmasters.

Anonymous Coward 06 May 05

Why is it a bad idea to use GET, and how exactly is POST safer?

One of the reasons is security. Suppose your web app allows GET requests with side effects (like deleting an account). Then I can create a malicious page that contains one of your GET URLs in an img or iframe tag. When your users visit my page, they will have their accounts deleted without their knowledge or consent.

Browser security prevents pages on one site from initiating POSTs to another site without explicit action from the user (like clicking a form submit button). This protection doesn’t apply to GET requests.

Google itself is vulnerable to this type of attack. For example, I can change your default Google language to Pig Latin just by adding the following code to my page:

rick 06 May 05

The only way to POST a form with a link is using javascript. The GWA has no built in javascript engine, so all it sees are standard HREFs. If you have href=”#” for your javascript URLs, you’re probably safe as well.

I’d post a fix for ASP.net, but all standard LinkButtons use javascript to post a form, so it’s immune to the GWA. Huzzah!

Adam Michela 06 May 05

Might I just say that it’s pretty funny to see people offering XMLHttpRequest (a non-standardized javascript object) as a solution for using GET as POST or DELETE :chuckle:

… not hating on my old friend Ajax. Simply saying you can’t complain about broken standards and then offer a non-standard approach.

Matt Brubeck 06 May 05

Arg. One more time, the code is:

<iframe src="http://www.google.com/setprefs?hl=xx-piglatin">

Robert Sayre 06 May 05

Disobeying SHOULD-level requirements makes things break.

pb 06 May 05

I don’t think that’s quite right. Spyware and browser plug-ins obviously have the same access to POSTs as they do GETs.

The only reason that POST is safer is because you aren’t *supposed* to do dangerous things with POSTs, not that you *cannot* do dangerous things. Spiders and accelerators *could* access POSTs if they wanted to.

Doing a “form GET” is probably a bit safer than an “a href” since presumably the accelerator doesn’t access forms.

matthew 06 May 05

i’m just going to re-iterate that i think there’s more going on here than we realize. i’ve tested several of my apps now, attempting to recreate the mass deletion i’ve been reading about, to no avail. and, yes, the apps have the “delete” link via GET

at first i thought it was because my pages have the meta tag — name="robots" content="noindex"; but, then i tested on a page without that tag and it’s still not working.

Bernhard Seefeld 06 May 05

The ‘delete campaign’ link for logged in AdWords customers is also a simple link. Is Google seeing a sudden, inexplicable drop in the number of campaigns running? :-)

Anonymous Coward 06 May 05

Maybe there should be a new keyword to put in the “rel” property of “link” and “a” tags, like ‘no-prefetch’, so that links that shouldn’t be prefetched won’t.

But actually, administrative actions should not take place over GET. That’s what POST is for. GET is supposed to be a stateless operation to fetch a particular resource, while POST is supposed to be an operation to make a permanent change on a server.

Jough Dempsey 06 May 05

Whatever happened to “Don’t be evil.”?

Even ignoring how destructive the GWA can be if it clicks on multiple GET links at once, isn’t filtering otherwise private information through Google’s servers a huge invasion of privacy?

When people log in to a web site they have at least a moderate expectation of privacy. Surely there’s a way to stop malicious programmes like this from caching and possibly sharing logged-in activities?

I mean, it is only through the goodness of Google’s heart that they don’t prefetch https: requests and offer a UA header. What about a malicious application that ignores robots.txt, nofollow, https, and just tries to grab everything?

All of our security measures are based on the prerequisite that browsers will behave properly. Once someone installs a browser/accelerator (whether knowingly or as bundled with some other application) the game is OVER.

David Grant 06 May 05

I’d prefer that Google man up on this, rather than a million developers sprinkling little “Google_dont_prefetch_this” vapor-attributes everywhere.

Anonymous Coward 06 May 05

Solution: don’t use google accelerator

ToddG 06 May 05

Conspiracy time! Google’s in bed with Thawte and others to sell lots more SSL certificates! Everyone’s gotta use HTTPS for web-apps now! Think of all the new hardware needed to handle the extra load! PROFIT!!!

Sigh. I wonder if this is Google beating Darth Vader to the dark side by two weeks. Or preferably a misstep that they’ll correct shortly in some way (i.e. opt-in or meta tags).

Lorenzo 06 May 05

Can it be excluded by a robots.txt?

confused 06 May 05

There is some serious confusion here somewhere.

Google says that prefetching *only* occurs on html ‘link’ elements, not on anchor elements (<a href=’…’>asdf</a>).

People here are saying that all the anchors on a page are being prefetched by Google.

These can’t both be the case.

Weiran 06 May 05

Doesn’t this mean that traffic is also increased on the target site?

Christian Romney 06 May 05

There are two real issues here: how to design web applications in such a way that you minimize problems and how to create new tools responsibly.

DHH is right, Google’s web accelerator doesn’t play nice on the web as it exists today (where countless sites and apps ignore the best practices endorsed by TAG).

However, I don’t really think this absolves developers of web applications from the responsibility of implementing safe GETs. It’s ironic that we (as a community) preach the value of web standards when it comes to HTML and CSS and then disregard it on an even more basic level.

Today, it’s Google. Tomorrow, it’ll be someone else’s tool. Disregarding best practices usually comes back to bite someone in the ass.

Chris 06 May 05

Doesn’t it just pre-fetch links not all element references ?!?

Chris 06 May 05

* Doesn’t it just pre-fetch <link rel="prefetch" href="http://url/to/get/" /> (link tags with rel=prefetch) references, not all <a> (anchor element) links?!?

Jon 06 May 05

Or to summarise:
Something the people with clue kept warning about is now popular and the idiots have finally noticed but are blaming google instead of themselves.

Matt 06 May 05

“Does this mean that google’s web accelerator automatically sends a confirm ‘Yes’ to the javascript?”

No. It just ignores the javascript. The relevant javascript is generally an onClick handler, which doesn’t work unless the browser has JavaScript turned on. Google will just ignore it.

The way to get around this is to make the link not do anything unless JS is turned on (i.e. use the JS to set the link). However, then you lose the fallback behavior when JS is off. Further, if you are going to that trouble, you might as well take the time to make the link a post.

Matt 06 May 05

“Maybe I’m dense, but can someone tell me why a GET that deletes something is not idempotent? Deleting something N times is the same as deleting it once.”

What if the second delete throws an error (no such item)? That is correct within the meaning of delete, but that would make a delete not idempotent.

You may be correct in the thrust of the criticism though. The issue is not that actions aren’t repeatable. The issue is that some actions are undesirable to occur once. If I’m browsing a list of products on an ecommerce site, I don’t want to delete every one that I browse.

pb 06 May 05

That should have been:

GWA does *NOT* pre-fetch GETs or “a hrefs”.

jimmy 06 May 05

GWA does NOT pre-fetch GETs or a hrefs.



This is incorrect. It does prefetch a href. Check the logs of your site while using it.

David Grant 06 May 05

Google is claiming it pre-fetched my mom. I’m steamed.

Ping 06 May 05

Anyone seen the User-Agent of GWA yet?

Bill de hOra 06 May 05

“Apparently, you are the only one that thinks it is the ‘end of story’. Personally, I think people will be posting about this story on websites for days, if not weeks. The real world is not black & white. The story will not end no matter how hard you stomp your feet and scream that everyone else is wrong.”

People are talking about reality v specs here, but the reality is that the apps have been broken by spec compliant behaviour. Technical specs like HTTP are not there to support apps with design flaws, they are there to help you avoid these kinds of situations, you cut around them at your own peril. Badly designed browsers give you some plausible deniability, as do stupid decisions like subsetting HTTP methods in HTML forms, but the short and long of it is that people will have to fix things up, because an app that got busted by GWA arguably has security flaws.

The entire point of using GET/HEAD is that technically speaking, the client cannot be held accountable for side effects incurred. Yes, we’re all using GET links to log people out and some of us at least bounce that through javascript to get a POST, but in exposing an action with a side effect as a GET, the server side is accountable for that decision’s consequences.

Nik Cubrilovic 06 May 05

Guys guys guys, the BIG issue here is: why does doing a GET to a URL on your application actually *delete data*? That is poor practice and a weakness. Since your applications allow this, in theory I could set up an HTML page with links to Backpack, Blogger, Wordpress sites etc. that delete entries and then just wait for the Google bot or proxy to come past and click on them for me! Wow.

I could even include some exploit URLs in the same page and have Google proxy click on them for me.

Bill 06 May 05

Since your applications allow this, in theory I could setup a HTML page with links to Backpack, blogger, Wordpress sites etc. that delete entries and then just wait for the Google bot or proxy to come past and click on them for me! Wow.

Uh, no you couldn’t. Backpack, Blogger, Wordpress admins are password protected. This is the point. What used to be off-limits to a spider is now fair game because GWA allows the spider behind the password curtain.

Nik Cubrilovic 07 May 05

Bill - pre-fetching will send your session information, whether it’s prefetched via GWA or Firefox’s prefetch support. I didn’t mean to mention the bot, but that can happen in some cases as well.

Don Wilson 07 May 05

I’m glad to see that someone’s pointing out that google doesn’t always make godly applications.

Peter Cooper 07 May 05

I’m no PHP coder, but clearly this code that was posted above will not work:

if(strtoupper($_SERVER['HTTP_X_MOZ']) == 'prefetch')

strtoupper would capitalize ‘prefetch’ to ‘PREFETCH’ right? Therefore, it’d never match.

Dallas Pool 07 May 05

*From Google’s FAQ

4. Are there any files for which I can’t specify prefetching?

Yes, there are. We don’t prefetch HTTPS: files (i.e., secure pages), or large data files such as MP3 or MPEG. Note, however, that nothing bad will happen if you specify a file type which Google Web Accelerator doesn’t prefetch; that file type simply won’t load any faster.

Solution: Place your web apps behind a secure cert (as they ought to be) and you’re safe.

Brent Dax 07 May 05

Er…if these are GET requests, they have every right to run through and “click” links. GETs aren’t supposed to change anything.

Anonymous Coward 07 May 05

I was half-amused when I saw a forum thread yesterday in which nearly half the comments were “deleted by the user” in the past day or two.

pb 07 May 05

The other, much easier solution is to DO NOTHING.

GWA *only* pre-fetches links specified as such:
{link rel="prefetch" href="http://url/to/get/"}

Jough 07 May 05

“GWA *only* pre-fetches links specified as such:
{link rel="prefetch" href="http://url/to/get/"}”

This is completely false. GWA prefetches pretty much every link on the page, including all “a hrefs” and links without the rel=”prefetch” attribute.

See some other comments above for more on this.

Will Hayworth 07 May 05

Peter’s right. if(strtoupper($_SERVER['HTTP_X_MOZ']) == 'prefetch') wouldn’t work at all. You just need the following:

if($_SERVER['HTTP_X_MOZ'] == 'prefetch'){
//insert 403 Forbidden code here
}

Oh, and as an alternative to giving a 403 Forbidden, you could just have your PHP script die with the appropriate message:

if($_SERVER['HTTP_X_MOZ'] == 'prefetch'){
die('To prevent undesired actions from taking place, prefetching this page is not allowed.');
}

Sean 07 May 05

“The other, much easier solution is to DO NOTHING.

GWA *only* pre-fetches links specified as such:
{link rel="prefetch" href="http://url/to/get/"}”

I’ve seen this isn’t the case either. On the public side of one of our sites, I have seen GWA prefetch plain old hrefs and stuff in img tags…

which goes back to my big question: google says it doesn’t prefetch ads… how on earth would it know what is and what isn’t an ad?

and to repeat myself again… i think one big problem here is that the “webmaster help” that google provides is inconsistent with itself and horribly short, lacking in almost any usefulness.

whether sites follow best practices or not… google releasing a beta means it is going to make it into a lot of people’s hands, and they should have a better help page to explain to those who are going to be affected by it what those effects will be.

even if your site follows best practices… this has an effect on it, and google does a really bad job of explaining what those effects are

points 1 and 2 in the webmaster help certainly seem to contradict each other

and point 6 really needs an explanation…

our solution… until google explains what gwa *really* does… we aren’t allowing access from it

and no, we don’t have destructive gets…

Carl Klapper 07 May 05

Of course, there would not be a problem if the web were used as it was intended: a hypertext document retrieval system. Then again, Google uses the web for applications, not only for Gmail but for AdWords, with far more serious consequences.

BTW, I can not get to Google now. Is anyone else experiencing this? If so, is it related to this WebAccelerator issue?

Jason 07 May 05

It’s interesting — I’ll admit that despite developing web apps for going on ten years now, I haven’t put a ton of time into thinking about GET vs. POST and idempotency. That being said, I also can’t think of a single web app I use regularly — and that includes Ta-da Lists, Basecamp, Gmail (Google Mail), Movable Type, MetaFilter, and on and on — that doesn’t use an HTTP GET for its logout link. That’s certainly a non-idempotent function, yet it’s almost universally tucked behind a normal {a href}.

(Incidentally, this very thread confirms my suspicion that a huge chunk of people have all but ceased reading any comments posted before their own; the very same tracts of text are cut-and-pasted into the comments here as if they’re brand new, and well after people have reported that GWA in fact does prefetch content other than that explicitly {link}ed to with a prefetch tag, people are posting “hey, it only prefetches things that you tell it to!” What a grand way to continue a discussion…)

brian 07 May 05

Google Web Accelerator did NOT break your application, GWA was just the first to point out that it has always been broken!

All this information about detecting ‘HTTP_X_MOZ’ or user-agents and sending 403 is purely reactive! This may solve the problem today, but doesn’t protect you from other applications in the future that don’t pass HTTP headers and prefetch or download sites for offline viewing, etc. WGET, for instance, can use cookies and will spider and fetch all the links on your site.

GREAT, Rails now protects against GWA, so it can happily and blissfully continue ignoring this security hole and continue to use GET requests, javascript, and AJAX to delete data. This is just a patch on a problem. It doesn’t fix it - Backpack IS STILL BROKEN!

A preventive system NEEDS to be put into place. That is the only way to keep this sort of thing from happening again.

AJAX and all this fancy “HTML sleight-of-hand” needs to be redesigned or dropped!

The irony of the whole thing is that the only browsers affected by this are the newest of the new that can install GWA and run javascript, etc. - the browsers and users NOT affected by this are the ones that Backpack no longer supports.

pb 07 May 05

GWA in fact does prefetch content other than that explicitly {link}ed to with a prefetch tag, people are posting “hey, it only prefetches things that you tell it to!” What a grand way to continue a discussion…)

People have said that it has, but I’m not totally convinced that it has. First, Google says it only prefetches links marked as such. Second, do people realize that the Google agent is grabbing pages for reasons other than pre-fetching?

I actually have read nearly all the comments. It’s clear to me that Google is NOT prefetching entire page-fulls of links.

Anonymous Coward 08 May 05

Without regard to what google does or does not do: since you can’t nest forms, how do those suggesting you should not use GETs (or specifically, an a href with a query string) suggest interspersing a postable form that is a list of items with associated fields and links that take actions on individual items (such as purchasing them in the case of ecommerce, or sending out event alerts in the case of a service/alarm monitoring app)?

For instance, http://www.amazon.com/gp/registry/wishlist ; heaven help you if GWA does in fact fetch all a hrefs (something I’m not sure about) and you have one-click turned on…

Similar things are implied for sites that, for performance and cost reasons, place rate limits on how often you can view something (search operations in some board applications that offer a list of historical searches but limit a user to one search per minute). In that case, even though there is no permanent state change, the first click in the minute works and all others fail. Hope your accelerator guesses correctly as to what you wanted to view…

pb 08 May 05

Without regards to what google does or does not do

That’s kind of an important point. If GWA only prefetches links that specify pre-fetching (which is what it claims to do), then all of this is a non-issue.

00blogger 08 May 05

time for tuning and security improvements - google removed the download link to the web accelerator and thus de-accelerated distribution of their new tool.

Mark Nottingham 08 May 05

As I’m sure many have told you by now, that’s not disrespect; it’s the way the Web is designed. GET shouldn’t have side effects (like deleting things); that’s what POST (and other methods) are for. This allows Web caches to safely store copies, spiders to safely look for content, and so forth.

A properly-designed Web application won’t have any problem with Google Accelerator. Making assumptions about Javascript support and the nature of the software looking at your Web site is a recipe for disaster.

Cheers,

SU 08 May 05

Making assumptions about Javascript support and the nature of the software looking at your Web site is a recipe for disaster.

We’re getting to the point where web applications are advancing so far beyond the traditional ask/redraw/receive paradigm that I’m not so sure we need to make every DHTML application equally accessible. Is every single Windows application available on the Mac? Sometimes, you’ve just got to draw a line in the sand.

Thomas 08 May 05

Brian:

Ajax isn’t the problem here, because:

1. You can send Ajax POST requests.
2. Ajax calls are usually triggered by JS events, and thus safe from GWA prefetching.

Anonymous Coward 08 May 05

how do those suggesting you should not use gets (or specifically, a href with a query string) suggest interspersing a postable form that is a list of items with associated fields and links that take actions with regards to individual items (such as purchasing them in the case of ecomerce or sending out event alerts in the case of a service/alarm monitoring app)

I’m not sure I understand your question, nor can I see what you mean from the page you linked to. However, if you happen to mean how can you have a form with multiple items which can have different actions such as delete or edit done to each, and they’re to be done in bulk instead of individually… what’s wrong with using different submit buttons for the different actions, and checkboxes to select the items to do the actions to?

Lach 08 May 05

how do those suggesting you should not use gets (or specifically, a href with a query string) suggest interspersing a postable form that is a list of items with associated fields and links that take actions with regards to individual items (such as purchasing them in the case of ecomerce or sending out event alerts in the case of a service/alarm monitoring app)

I’m not sure I understand your question, nor can I see what you mean from the page you linked to. However, if you happen to mean how can you have a form with multiple items which can have different actions such as delete or edit done to each, and they’re to be done in bulk instead of individually… what’s wrong with using different submit buttons for the different actions, and checkboxes to select the items to do the actions to?
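Something like this sketch (with hypothetical handler and field names), keeping the whole list in one form:

<?php
// One form: checkboxes select the items, separate submit buttons choose
// the bulk action, and nothing destructive hangs off a GET link.
if ($_SERVER['REQUEST_METHOD'] == 'POST' && !empty($_POST['items'])) {
    $ids = array_map('intval', $_POST['items']);
    if (isset($_POST['delete'])) {
        delete_items($ids);     // hypothetical application functions
    } elseif (isset($_POST['purchase'])) {
        purchase_items($ids);
    }
}
?>
<form method="post" action="bulk.php">
<label><input type="checkbox" name="items[]" value="1"> Item one</label>
<label><input type="checkbox" name="items[]" value="2"> Item two</label>
<input type="submit" name="delete" value="Delete selected">
<input type="submit" name="purchase" value="Purchase selected">
</form>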

Lach 08 May 05

Sorry about the double post. Thought I’d cancelled the first submission in time.

isildursheir 08 May 05

i haven’t seen much of a speed upgrade anyways, so i’m just uninstalling it

bkk_mike 09 May 05

The problem is if it’s prefetching every link with a GET on a page: I have intranet web apps that return tables from databases, with links to allow you to drill down further, implemented as GETs that go off and do another database query.

If it’s prefetching every GET on a page, even where those GETs in themselves are doing nothing nasty, the fact that a large number of GETs is being kicked off will hit whatever is behind those requests, i.e. the database server.

We need something like a “rel=no-prefetch” to avoid hammering intranet database servers. (It’s one thing on the public web where the web sites are built to cope with the possibility of large numbers of users, but where it’s built for 5 or 10, it can result in performance issues).

Martin 09 May 05

Hmmm, I too live in the dubious shadow world of trying to get people to listen to the good of our glorious web standards (yes, I’m trying to make you puke with the overzealous argumentation so common these days) and completely agree that you “should not” use GET for anything other than ‘getting’ static information.
But then I stumble from Utopia back into ‘the real world’. Yes, it’s out there. This is the world where our homepages contain ‘the latest news’, which is hardly static where I live, plus blogs and many other new informative pieces of content. Isn’t it funny: I don’t see how I’d be getting this obviously non-static information with a POST?!
And then of course, if we ignore this rather ridiculous piece of ‘argumentation’, how about ‘design’, or even worse, ‘what the client wants’?
Notice that a form button has very limited support for styling, and javascript is being marked as an evil security risk. Both of these are issues I need to cater to. I have to make it look like the designer wants; I can tweak it and suggest alternatives, but in the end it isn’t my call, and I must try to be as user-friendly as I can. So, since the dangerous link is in the admin area, I decide to use the GET to do something it ‘should not’ be used for. Enter the world of GWA. I now have to fix it, taking into account:
‘What the client wants’. I wish our clients would cater to theirs, but they don’t; they decide what they want, and this often includes things that “just don’t work that way”. But no is hard to sell, and a smart developer gets a lot of goodwill by working his way around these obstacles.

In the end I’m quite happy to see all these opinions, just like when I’m happy reading about the many other browser flaws. In the end, that’s all it is to me: a browser flaw, something to work around when I can, with whatever standard or non-standard tool I can use to get the job done. That’s what I’m being paid to do, after all.

Timwi 09 May 05

This is why some of the HTTP RFCs recommend that GET requests should always be idempotent, i.e. never perform a lasting action, and that all actions should be coded as forms that generate a POST request. Google Accelerator is perfectly reasonable in assuming that webmasters would follow these recommendations in the same way they follow the w3c’s recommendations regarding HTML and CSS, and webmasters who don’t, simply deserve to face this problem.

pb 09 May 05

The problem is if it’s prefetching every link with a get on a page

GWA DOES NOT PREFETCH EVERY LINK OR GET ON A PAGE!!!!

It only prefetches links that explicitly request to be prefetched in this format:
{link rel="prefetch" href="http://url/to/get/"}

Randy 09 May 05

pb, you’re wrong.

pb 09 May 05

Care to offer up any evidence?

I created a web page with links and form gets, visited that page, checked my logs and didn’t see any accesses of the other pages.

I suspect 37’s experiences were caused by other GWA behaviors. Everyone else has been speculating. Out of the 134 posts, only a few have actually claimed to have seen the behavior. It only gets two (speculative) mentions in the 792-post Slashdot thread.

Frank 09 May 05

My concern is a privacy one, of course, because google is able to see all the sites one visits and can easily build a profile…

Don Wilson 09 May 05

pb,

it DOES fetch links surrounding the link you just clicked. If you turn on the double-underline feature that shows you what links have been cached you’ll see what I mean. It’s highly unlikely the admin area of a dating site has prefetch in the links.

pb 09 May 05

OK, I stand mostly corrected. However, it still appears to me that GWA does not pre-fetch *every* link. Significantly, Google (Marissa Mayer) indicated that it does not pre-fetch links with a “?” in them. This should minimize the issue greatly.

matthew 10 May 05

not to keep kicking this, but i also was unable to mimic the behavior that i’ve been reading about (i tested the GWA on a test version of phpMyAdmin and it did not rampantly delete everything)

if pb is right, and google isn’t prefetching links with a “?” in them, then the problem may be from the current trend to rewrite URLs in an effort to pretty them up (i.e. instead of website.com/index.php?action=delete&id=2, you read website.com/delete/2/). if that is what’s going on, then i’m so happy i never cared enough to do that.

Jeff 10 May 05

I’ll bet they programmed it NOT to pre-load AdWords links.

If it pre-clicked ads for you, I’d be promoting it like crazy!!!

Richard 10 May 05

Speaking as a writer of web-based applications, it wouldn’t have occurred to me that I ought to read the HTTP RFC to discover this advised usage of GET and POST. Furthermore, before reading this comment thread, I’d never seen the advised usage of GET and POST referred to anywhere else. Nor had I ever heard the word “idempotent” used before.

You live and learn I suppose, but I shouldn’t think I am alone in this.

Chris Larsen 10 May 05

That is wildly irresponsible of Google. It will increase global bandwidth consumption significantly, raising hosting costs for everyone and clogging the Internet with requests nobody actually asked for. Wow! That's really lame. I wonder if it was a mistake or a deliberate attempt to create bugs in competitors' web applications. Pre-fetching links is so short-sighted. Google? How could you?
Maybe these third-party browser add-ons are a really bad idea.

matthew 10 May 05

i concur with richard. i also never read the RFC specs to determine what is or is not "proper" in a web application

David 11 May 05

Like said before…Sue Google…LMAO

James Harvard 11 May 05

If you feed the GWA a 'prefetching prohibited' message, then surely the user will then be blocked from deliberately deleting / whatever, because they'll only get your anti-GWA message?

It should be relatively simple to make the GWA more conservative in its behaviour (yes, the affected web apps are broken and I've only limited sympathy for their creators, but even the one-ton Google Gorilla (TM) has to live in the real world). The GWA could keep a blacklist of sites it doesn't prefetch on, adding any site that uses HTTP authentication or has received a POST request.

According to Google’s web accelerator page they aren’t offering it for download any more because they have, ahem, reached their “maximum capacity of users”. You have to wonder if that’s damage-limitation-speak for “everybody’s shouting at us”.
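
On the first point: a site doesn't have to block the user, only the prefetch. GWA's prefetch requests are reported to carry an "X-moz: prefetch" header, while a real click does not, so something like this minimal PHP sketch at the top of a page refuses the prefetcher and serves everyone else normally:

<?php
// A minimal sketch: refuse only requests marked as prefetches.
if (isset($_SERVER['HTTP_X_MOZ']) &&
    strtolower($_SERVER['HTTP_X_MOZ']) === 'prefetch') {
    header('HTTP/1.0 403 Forbidden');
    exit('Prefetching is not allowed on this page.');
}
// ... normal page handling continues here for real visitors ...
?>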

brandos 11 May 05

yea
lost your brain on pills in the 90’s ?

Greg W 11 May 05

Bypassing logins:

A few folks have commented that GWA allows getting around logins to view pages that are supposed to be protected. I don't see how this is possible if those pages are "properly" coded (and maybe that's the point). In my apps, a login generates a server-side user profile. Every page that is supposed to be protected includes server-side code to test the presence and validity of that profile. I don't see how any automated cacher, clicker, or fetcher can defeat that, but I don't want to write it off as impossible either. My pages won't even display a navigation menu if you're not authorized for that page, so there are no links (except back to the login page) to even follow.

Can anyone share how their logins were circumvented, and how you’ve coded your “protected” pages?
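
For comparison, here is the kind of per-page guard Greg describes, as a minimal PHP sketch (the session key name is illustrative). Note that it keeps an unauthenticated fetcher out, but it does nothing against a prefetcher running inside a browser that is already logged in, and that in-browser case is exactly how GWA ends up "clicking" protected links:

<?php
// Top of every protected page -- a minimal sketch, not Greg's actual code.
session_start();
if (empty($_SESSION['user_id'])) {
    // No valid server-side profile: send the visitor to the login page.
    header('Location: /login.php');
    exit;
}
// ... page content and navigation for authorized users only ...
?>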

pb 11 May 05

Does anyone know whether GWA sends your cookies when it makes its pre-fetch requests?

hink 11 May 05

Seems like any web app worth its salt would have a defined set of headers that it immediately rejects or redirects, eh?

R Martin Ladner 12 May 05

- Others who have written about this problem have made the point that if your application uses a permanent cookie to identify you, and you view a page on another site that "happens" to have links to your site (links you might not even see), Web Accelerator will execute those links.
- They have also written that operations that change data "should" use POST rather than GET.
- Unfortunately, for the many sites that use Fusebox's FormURL2Attributes tag, it doesn't matter whether you intended the data to be posted from a form. If a link contains the necessary information, the Fusebox application will blithely accept the input as if it came from a form.
- So, if you're logged in (either through current activity in another window or by the magic of a cookie that stays on your browser to keep you from having to log in), then browsing other sites with Web Accelerator can be hazardous to your application, especially if you've ticked off anyone who has a general idea of how it works. =Marty=

Matt Shadbolt 12 May 05

What about my Google AdWords? I'm paying per click. If links are loaded in the background of the page the user is viewing, do I pay for page visits that users haven't really clicked?

Freehawk 12 May 05

I am trying it; no problems yet. It tells you how much time it has saved, if you take its word for it.
This reminds me of a program I downloaded (I admit it) back in the bulletin-board days. It said it would upgrade your computer to some (then) unheard-of speed. You downloaded it (over 28.8 dial-up) and launched it, a little dialog box ran, impressive-looking things scrolled by telling you it was working, and then it said "congratulations, your computer processing speed is now X." Of course, it did nothing.

Mark Baker 12 May 05

Regarding the "SHOULD isn't MUST" argument, that's really immaterial. As the spec says:

"Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them."

In other words, the user, with GWA acting as their agent, can't be held accountable for a change made via a web app that uses GET to change state. Consider that I haven't heard anybody complain about link counters (which change state on GET) going up too much.

Aral Balkan 13 May 05

Good thing RIAs do not have to worry about such things. Maintaining state on the client is a beautiful thing :)

Makedoneco 13 May 05

I cannot download Google Web Accelerator because it is no longer available on the site, but I would like to try it. Could someone send me the exe file by e-mail: [email protected] Thanks in advance.

mehdi 14 May 05

Hello

saeed 18 May 05

Google web accelerator

paolo 23 May 05

My PC is very slow; I would really like to download the web accelerator. Thanks.

Iacovos 28 May 05

Please send me the web accelerator at [email protected], because Google isn't letting anybody download it, due to the increased number of users testing it. Please send me the .exe file….
Thanks

shahab 29 May 05

Hello

xur17 29 May 05

Guys, this is still in beta in case you haven’t noticed. It hasn’t officially been released yet. We know that it is going to have problems.

Xedium 31 May 05

I don't think Google would let their own accelerator click on their own ads. I just hope Google does a better job of differentiating what it should and should NOT click on.

Danaro 01 Jun 05

Google!

JA 02 Jun 05

GET vs. POST is not hidden knowledge. From O’Reilly’s Java Servlet Programming of all places:

Just remember that GET requests, because they can be bookmarked so easily, should not be allowed to cause a change on the server for which the client could be held responsible. In other words, GET requests should not be used to place an order, update a database, or take an explicit client action in any way.

anonymous 03 Jun 05

I don't know anything about computers

Anonymou 03 Jun 05

Trent blanchette is a homosexual

Trent Blanchette 03 Jun 05

I only like computer guys

mehdi 07 Jun 05

HELLO

bobby ward 11 Jun 05

Google is no worse than anyone else; after all, Ford made the Edsel.

dc0514 13 Jun 05

i think this is the best one in my mind!

Ally 15 Jun 05

I'm creating a database that has multiple pages…. When I pull an employee number up on the first page, I want all the other pages to automatically pull up the employee number and all the info for that employee per page….. My problem is that I believe my SQL statement is working, but I'm confused about GET and POST. Not sure if it's working or not?

here is my sql statement:

$SQL  = "SELECT employee.emp_id, parka.* ";   // note the trailing space --
$SQL .= "FROM employee, parka ";              // without it the pieces run together
$SQL .= "WHERE (parka.u_id = '{$_REQUEST['emp_id']}')";
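
Two things worth checking in that snippet: the concatenated strings need spaces before FROM and WHERE (added above), and $_REQUEST['emp_id'] is dropped straight into the SQL, which is an injection risk and also hides whether the value arrived by GET or POST. Below is a minimal sketch of the same lookup with PDO and a prepared statement; $pdo is assumed to be an existing connection, and the JOIN condition is only a guess at the schema from the snippet, so adjust it to the real relationship between employee and parka. If your form submits with method="post", read $_POST['emp_id'] instead.

<?php
// A minimal sketch, assuming $pdo is an existing PDO connection.
$emp_id = isset($_GET['emp_id']) ? $_GET['emp_id'] : null;

if ($emp_id !== null) {
    // The JOIN condition is an assumption about how employee and parka relate.
    $sql = 'SELECT employee.emp_id, parka.*
              FROM employee
              JOIN parka ON parka.u_id = employee.emp_id
             WHERE employee.emp_id = ?';
    $stmt = $pdo->prepare($sql);
    $stmt->execute(array($emp_id));
    while (($row = $stmt->fetch(PDO::FETCH_ASSOC)) !== false) {
        // ... display $row on this page ...
    }
}
?>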

Duncan Simpson 17 Jun 05

I am writing a web application where both GET and POST can, and sometimes do, have side effects. This does not cross any MUST NOT lines in any RFC. In particular there is a navigation bar, which consists of hyperlinks to uncachable CGI pages, one of which is the logout page.

In any event, fetching a page invalidates your cookie and replaces it with a new one, which is then advanced to the next value the server will accept by a Java applet and JavaScript. If GWA fetches a page and then ignores the Java applet, that would be a major problem.

Following robots.txt would avoid these dangers.
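
For what it's worth, walling off the application area in robots.txt looks something like this (the paths are illustrative), though it only helps if the prefetcher actually honours the file:

# robots.txt -- illustrative paths, adjust to the real application layout
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/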

Benjamin Cutler 24 Jun 05

Oops, I forgot to fill the href for the second link. Now I can’t find the article I was going to link to. Essentially somebody can attack your site by linking to a GET form, since the form data is stored in a URL.

Benjamin Cutler 24 Jun 05

Here is an article describing a possible attack that is the result of using a GET when you should be using a POST.

Benjamin Cutler 24 Jun 05

Here is a link describing the cross-site request forgery (CSRF) attack. It turns out that if a browser has JavaScript enabled, a POST form can be disguised as a link, allowing CSRF to attack POST forms as well as GET forms. Read the link to learn how to protect yourself from this vulnerability.
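
A common defence is a per-session token: every form carries a hidden random value that an attacker's page cannot know, and the server rejects any POST whose token doesn't match. A minimal PHP sketch, with illustrative field and session-key names (the token generation shown is only good enough for a sketch; use a stronger random source if you have one):

<?php
// A minimal CSRF-token sketch; field and session-key names are illustrative.
session_start();
if (empty($_SESSION['csrf_token'])) {
    // Good enough for a sketch; substitute a stronger random source if available.
    $_SESSION['csrf_token'] = md5(uniqid(mt_rand(), true));
}

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Reject any POST whose token doesn't match the one stored in the session.
    if (!isset($_POST['csrf_token']) || $_POST['csrf_token'] !== $_SESSION['csrf_token']) {
        header('HTTP/1.0 403 Forbidden');
        exit('Invalid request token.');
    }
    // ... perform the state-changing action here ...
}
?>
<form method="post" action="">
  <input type="hidden" name="csrf_token" value="<?php echo $_SESSION['csrf_token']; ?>">
  <input type="submit" value="Do the thing">
</form>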

Nick Timmermann 29 Jun 05

Does anyone know where I could download this…I would very much like to experiment and test it on a graveyard machine. They have taken away the download from their website, and a link to the file, or an email would be very helpful. I work for Saint Louis University, and we would like to see more or less how it could benefit/harm our site.

Thank You

Bob Gurch 30 Aug 05

Say what?

Azu 23 Sep 05

*Looks down at bottom of page* OMFG, stupid spammer people o_O

~Back on topic~

So ya.. to sum it up:

Google web accelerator will click on every single link on your browser and it will record ALL internet activity from the browser (Including passwords, encrypted stuff, all of it), and other people (Hackers mainly) could be able to access that information.

It might make your internet run about 10% faster.. but is it worth it? Up to you..

rwe 27 Sep 05

ewe

Anonymous Coward 30 Sep 05

Hi

Anonymous Coward 05 Oct 05

I would say it's a great application.

Regards,

Admans.

