GWA shines light on Google privacy concerns David 24 Oct 2005

So you thought the Google Web Accelerator was bad just because it destroys all your data? Try reading their privacy policy. Especially the two choice bits about how all your traffic and all your cookies are belong to Google:

When you use Google Web Accelerator, Google servers automatically receive and log your web requests. Web requests and data sent in encrypted form using an HTTPS connection will not go through Google. It is possible that a URL or other page information sent to Google may itself contain personal information… Google temporarily caches cookies from third party sites that are used in your web requests.

So let’s recap. Google…

Oh, and btw, they’re an advertising company.

Now I usually care as little about conspiracy theories as the rest of you, but I’m starting to grow a little uncomfortable with the one ring to bind them all in darkness.

39 comments so far

JF 24 Oct 05

Wait… Does this mean *anything* you look at through a web browser that isn’t protected by SSL will be indexed by Google? Even pages not fit for public consumption yet? Even files on your own computer? Even files on your non-SSL password protected intranet?

Google 24 Oct 05

We are just acting in your best interest.

(If you are a shareholder. Mwhaaaaaaaaaa!)

Charlie 24 Oct 05

They also quietly purchased the web metrics company Urchin just before the Web Accelerator came out.

Anonymous Coward 24 Oct 05

Does this mean *anything* you look at through a web browser that isn’t protected by SSL will be indexed by Google?

I didn’t see anything about them indexing it.

JF 24 Oct 05

“Web requests and data sent in encrypted form using an HTTPS connection will not go through Google.”

So what does it mean to “go through Google” if they aren’t indexing the pages? Maybe I’m worried about nothing, but “going through” requires some clarification.

Steve 24 Oct 05

Now… I always thought that Big Brother was “the Government”… not “the Google.”

Maybe I just got my G’s mixed up.

Jan 24 Oct 05

Oh please, let’s not go overboard with the whole “google is evil” stuff. They have personal information on their server. Big deal, so does 37signals. Very sensitive information such as business strategies and such.
I haven’t seen proof yet on how either you or google (or any other company I’m trusting sensitive data to) is abusing this information. No proof, not guilty.

To be honest, I think prefetching and web accelerators are a good idea in theory. It’s just that too many sites ignore standards completely. GETs are sometimes destructive despite the HTTP specification.

JF 24 Oct 05

Jan, I’m not of the “Google is evil” mindset, but I am worried about the *potential* of having Google see/index things that aren’t meant for Google or the public to see. If they don’t do this then that’s perfect, I’m not worried.

Jamie 24 Oct 05

They’re not forcing you to use it right? Don’t use it.

JF 24 Oct 05

I’m not using it, but plenty of people are who I promise you don’t know the implications of using it.

Scott M. 24 Oct 05

Oh please, let’s not go overboard with the whole “google is evil” stuff.

Well, Google’s corporate mantra is “Don’t be evil.” So I think it’s perfectly acceptable to hold them to it.

David Heinemeier Hansson 24 Oct 05

I surely don’t think Google as a whole is evil. That’s ludicrous. But I do think the GWA is pretty evil, and the privacy questions raised by the combination of all these inputs are concerning.

Lee 24 Oct 05

Could you explain how the accelerator destroys all my data?

Thanks.

Jason Johnson 24 Oct 05

Here’s the catch: you don’t have to use it.

Aside from that, regardless of how skeptical you are of their intentions, it is a totally kickin’ service. Just as their mission claims, they are making the world’s information a little more accessible. Maybe they haven’t indexed it yet, maybe it isn’t ranked very high yet, or whatever, but they’re going to help you look at it just a little faster? That’s… well… kinda neat, right?

…oh, and you think your ISP doesn’t already know where you’re going? You think they don’t log traffic flying across their switch? You bet your matchsticks they do (though, probably not to the degree Google does). And they don’t even need you to install software, either.

matrin 24 Oct 05

Does it not worry you that G=6 oo gg=66, and 3d Timecube? Haven’t we been warned? I say no now!

JF 24 Oct 05

The difference with Google knowing vs. an ISP knowing is that you can’t search your ISP but you can search Google. I just don’t want to see stuff ending up in Google that isn’t supposed to be there. That’s my biggest worry.

David Heinemeier Hansson 24 Oct 05

Lee: Using GWA with any web application that has destructive/side-effect GETs will cause mishaps/destroyed data. The worst example is something like phpMyAdmin, a database administration tool, which will zap your entire database if used together with GWA.
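To make the destructive-GET problem concrete, here is a minimal sketch, not taken from any real application. The `/records/<id>/delete` path layout and the `handle` function are hypothetical. The idea is simply that a delete action reachable by GET will fire whenever a prefetcher follows the link, so the handler refuses to change anything unless the request is a POST.

```python
# Methods that HTTP defines as "safe": they should never change server state.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def handle(method: str, path: str, records: dict) -> int:
    """Return an HTTP status code for a request against `records`.

    Hypothetical routing: /records/<id> reads a record,
    /records/<id>/delete removes it. Only POST may remove anything.
    """
    method = method.upper()
    parts = path.strip("/").split("/")

    # Destructive action: refuse it for safe methods, so a prefetcher
    # that blindly follows links with GET cannot delete data.
    if len(parts) == 3 and parts[0] == "records" and parts[2] == "delete":
        if method in SAFE_METHODS or method != "POST":
            return 405  # Method Not Allowed
        removed = records.pop(parts[1], None)
        return 204 if removed is not None else 404

    # Read-only action: fine to serve on GET.
    if len(parts) == 2 and parts[0] == "records" and method == "GET":
        return 200 if parts[1] in records else 404

    return 404
```

With this shape, a prefetched `GET /records/42/delete` gets a 405 and the record survives, while a deliberate form submission using POST still works.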

Gene Crawford 24 Oct 05

I’ll second that, who says you have to use it… It is free after all, right?

Sam Sherwood 24 Oct 05

Yikes! Google bought Urchin? That’s actually big news, as Urchin is one of the best web metrics companies I’ve ever dealt with, in both price and customer service.

pwb 24 Oct 05

“temporarily caches” = “owns” ??

pwb 24 Oct 05

Enough with the “you don’t have to use it” drivel. Fact is, a lot of people will use it without understanding all the nuances of what’s going on. It’s useful for a subset of people to really try to understand what the implications are and to report that analysis. “Just don’t use it” is not even remotely helpful advice.

Tim Uruski 24 Oct 05

Agreed. The sentiment of the “just don’t use it” is lost when you’re on the receiving end of the emails asking where someone’s data mysteriously vanished to.

Eric 24 Oct 05

Well no one has attempted to address some of Jason’s spooky musings, so I will. First, let me quote him.

Even files on your own computer? Even files on your non-SSL password protected intranet?

The answer is basically “no” to both questions, but let me address them in more detail below. [And I sincerely hope that Jason is asking these questions out of true ignorance and not trying to play a FUD card. Perhaps David, who I’m sure could answer the questions, has been ironing his shirts all these hours.]

As for files on your own computer, the answer is no. First they’re not accessed via HTTP, as the URLs typically are “file://…” and not “http://…”. Second, even if you are accessing them via “http://…”, you’re probably using a loop-back address (i.e., 127.0.0.1 OR localhost) or private network IP addresses (e.g., 192.168.x.y, 172.16.x.y, or 10.x.y.z). Such would not be fetchable by Google’s servers.

As for stuff on your local intranet, the same would apply. If someone sitting outside your intranet could access them, then Google could. If no one outside your intranet can, then neither can Google. Again, it will come down to whether you’re using the private IP address space (see above) or whether you use a firewall to block access from the internet into the intranet.
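As a rough illustration of the address ranges mentioned above, the following sketch uses Python’s standard `ipaddress` module to tell whether a host is a loopback or private-range address, which an outside server could not fetch directly. The function name `is_local_only_address` is made up for this example, and it says nothing about firewalls or about hostnames that need a DNS lookup.

```python
import ipaddress


def is_local_only_address(host: str) -> bool:
    """Return True if `host` is a loopback or private-range address.

    Hostnames other than "localhost" are not resolved here, so they
    are reported as False (unknown, not proven local).
    """
    if host.lower() == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address; classifying it would need DNS.
        return False
    return addr.is_loopback or addr.is_private


if __name__ == "__main__":
    for host in ["127.0.0.1", "192.168.1.10", "172.16.5.4", "10.0.0.7", "8.8.8.8"]:
        print(host, is_local_only_address(host))
```

A public address that sits behind a firewall would still come back as False here, so this only covers the first half of the argument.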

JF 24 Oct 05

Eric, what about people who access password protected but not SSL encrypted content on the web? This is content that is normally blocked from public view by way of a password.

Take for example Movable Type. Password protected, but not SSL encrypted. What if we use our installation of Movable Type for private business purposes? Same with many web stats packages. There’s plenty of web-based software out there that’s password protected but not encrypted.

Will Google index that content? It’s available via http and it’s password protected, but it’s not SSL encrypted.

not my real name 24 Oct 05

I think Eric’s point here is that google can, and will index anything that it can access. It can’t access something inaccessible from an outside address (lan/private network/etc), and neither can it access something password protected w/o the password.

As I understand it, GWA is only reporting your “web requests” back to the server. That is, logging into 37signals.com/svn/mt.cgi (or wherever you have your blogging software loaded) will only report back that you have visited that url. It won’t then send the html/etc back to google. If google is so inclined, at that point it may decide to try and index the urls you’ve sent back from entering a blog post, but it’ll just reach the username/password screen and won’t be able to get any further.

What I feel is a concern here is that it’s looking at the cookies from “third party sites”. I don’t see any reason it should be looking at *any* cookies. It certainly should never look at cookies from your first party site, as those can often contain usernames and passwords, and whatever else lazy developers stick in there instead of doing something smart, but that’s another rant for another day.

not my real name 24 Oct 05

this will kill any security that obfuscation once provided. which, if i remember correctly, some 37s products employ in some circumstances.

Chris Nolan.ca 25 Oct 05

If you haven’t read it already you should check out ‘The Search’ by John Battelle. The clickstream it talks about will be greatly improved by the web accelerator. If anything, just check out the last two or three chapters.

Stephen 25 Oct 05

Google knows what you did last summer!!!

And the previous 23 also…

Martin 25 Oct 05

When you use Google Web Accelerator, Google servers automatically receive and log your web requests. … Google temporarily caches cookies from third party sites that are used in your web requests.

This is no surprise, really. And it actually describes the way GWA works.

GWA is a local proxy, installed on your box. This proxy communicates with proxies in Google’s network. So when it does the prefetching, it does it indirectly through Google. The two proxies communicate with compression, and Google presumably also sends only diffs against pages the local part already has. This is how the acceleration works besides the prefetching.

To do this, Google does need to see the request as well as the cookies. No surprise, really.
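The compression-plus-diffs idea can be sketched in a few lines. This is purely illustrative and assumes nothing about how GWA is actually built: `full_payload` and `delta_payload` are hypothetical helpers that use Python’s standard `zlib` and `difflib` to compare sending a whole compressed page against sending only a compressed diff relative to a copy the local proxy already holds.

```python
import difflib
import zlib


def full_payload(new_page: str) -> bytes:
    """Compress the entire page, as if the local proxy had no cached copy."""
    return zlib.compress(new_page.encode("utf-8"))


def delta_payload(cached_page: str, new_page: str) -> bytes:
    """Compress only a line-based diff between the cached and new page."""
    diff_lines = difflib.unified_diff(
        cached_page.splitlines(keepends=True),
        new_page.splitlines(keepends=True),
        n=0,  # no context lines: send just what changed
    )
    return zlib.compress("".join(diff_lines).encode("utf-8"))


if __name__ == "__main__":
    cached = "".join(f"<p>paragraph {i} of the cached page</p>\n" for i in range(500))
    fresh = cached.replace("paragraph 250 of", "paragraph 250 (edited) of")
    print("full: ", len(full_payload(fresh)), "bytes")
    print("delta:", len(delta_payload(cached, fresh)), "bytes")
```

For a large page with a one-line change, the delta is much smaller than the full page even after compression, which is the kind of saving described above. Either way, the remote side has to see the request and the page content to compute it.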

Anonymous Coward 25 Oct 05

Lode you (and others) continue to miss the point. When you use Basecamp or Blinksale or whatever other web app you use, you are knowingly placing your trust with those services. When you use GWA things are happening and you don’t know they’re happening. GWA’s pitch is “we’ll make the web faster for you” but the tradeoff is that Google tracks every site you visit, caches your cookies with personal information, and who knows what else. There’s a significant difference here.

PJ Hyett 25 Oct 05

I also let them know what RSS feeds I’m subscribed to (Google Reader). Google produces fantastic products and I will continue to use them conspiracy theory or not.

Matthew Lock 25 Oct 05

> Oh, and btw, they’re an advertising company

No, Google is a software company that sells advertising space on their site. An advertising company actually creates the advert content and runs the campaigns. Like all the ones on Madison Avenue. Google doesn’t do this, it just sells the space, just like newspapers do.

Adam 25 Oct 05

TCP/IP packets - not packages.

Ok… rant over. Sorry everybody.

Martyn 02 Nov 05

If you are saying that Google’s Web Accelerator is evil, then that must mean that AOL’s Top Speed is evil as well. AOL force you to use it; it is part of the AOL software. AOL’s search engine is powered by Google. Are AOL & Google in on it together to try to take over the world and put Microsoft out of business? I cannot wait until Google release their own operating system, then they WILL own all your data.