I had the wonderful opportunity to participate in Shoemoney’s “Net Income” show on Webmaster Radio on Tuesday, November 21 (listen here if you are so inclined). We discussed the session at PubCon where Matt Cutts did a quick clickety-click, suddenly knew about all 40+ of the reviewee’s registered domains, and then gave the reviewee a little bit of a hard time about it.
What I had taken away from that was that Google is profiling webmasters, and when we hit a certain threshold in terms of quantity and quality of domains registered, it flags us. Flags us as what? Okay, I don’t know that, and I’m fairly positive no one from Google is going to volunteer that information. However, I did speculate on the possible ramifications in my post the other day.
So on to the update…
Brian B., who was sitting with me at the session, commented on Matt Cutts’ recap and pointed out what I had suspected, which is that Google can see past privacy protection.
…you mentioned that you knew that this person owned 40+ sites, and afterwards when you clicked on the WhoIs info that they were all privacy protected. I think you just let it slip that Google has access to all the WhoIs info regardless of what is protected from public domain. That’s a pretty big statement to make and will probably make a lot of people nervous.
Um, yeah! Makes me nervous and I’m not even doing anything wrong. (I swear!) Matt replied to Brian and explained that he did not employ any voodoo or secret Google magic to determine that the domains the reviewee owned were all privacy protected…
You’re correct up to the whois speculation, Brian B. All I did was take one of the domains and run “whois domain.com” from a command-line and noticed that whois data privacy protection was on for that domain. Then I did the same with 1-2 more domains to verify it. So I wasn’t using any special Google data or tool for noting the whois info was private. Sorry if I gave that impression.
Matt knows that everything he says gets completely overanalyzed, so I’m sure he wants to quell any rumors before they start a hysterical panic. That being said (cshel places tin foil cap firmly upon her head and secures the chin strap), the sequence of events in that portion of the session makes me think that checking the “regular” whois data for privacy protection was an afterthought.
Let’s consider the following:
- Matt was giving the reviewee the business about his plethora of domain holdings for a good 5 minutes before the privacy thing got mentioned. I suppose that could be a coincidence…
- How would he have known which domains to check for privacy protection if he didn’t already have the list of domains associated with the reviewee’s business?
If the domains were all associated with the same owner via some other means (say they’re all hosted on the same server or subnet, or all have the same name servers), he still would have needed some tool or script to run out and collect all of that info quickly, in a form that let him make the connection.
I’ve sniffed out covert domain owners in a similar manner before, but it was a pain in the butt and took a long time to manually pick and choose which bits of information might lead me to other domains and contact information. I guess I’m just saying that he had a lot of information VERY quickly for not having a tool that did the legwork for him.
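For what it’s worth, the “make the connection” part is not hard once the data has been collected. Here is a minimal Python sketch of the idea, assuming you have already gathered an IP address and a list of name servers for each domain. The `SAMPLE` data and the `shared_signals` helper are hypothetical and made up purely for illustration; this is not a claim about how Matt or Google actually did it.

```python
from collections import defaultdict
import ipaddress

# Hypothetical, hand-collected data: domain -> hosting details.
# The domains and addresses are reserved documentation values, not real sites.
SAMPLE = {
    "a.example": {"ip": "192.0.2.10", "nameservers": ["ns1.host.example"]},
    "b.example": {"ip": "192.0.2.77", "nameservers": ["NS1.host.example"]},
    "c.example": {"ip": "198.51.100.5", "nameservers": ["ns9.other.example"]},
}


def shared_signals(records):
    """Group domains that share a /24 subnet or a name server.

    Returns only the signals shared by more than one domain.
    """
    groups = defaultdict(set)
    for domain, info in records.items():
        # strict=False lets us pass a host address and get its /24 back.
        subnet = ipaddress.ip_network(f"{info['ip']}/24", strict=False)
        groups[("subnet", str(subnet))].add(domain)
        for ns in info["nameservers"]:
            # Host names are case-insensitive, so normalise before grouping.
            groups[("nameserver", ns.lower())].add(domain)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for signal, domains in shared_signals(SAMPLE).items():
        print(signal, domains)
```

A shared subnet or name server is only a hint, of course. Plenty of unrelated sites sit on the same shared host, so the slow, manual part is still deciding which overlaps actually mean something.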
Also, while Matt protests that it’s not special Google data, I might argue that technically, WHOIS registration data wouldn’t be “special Google” data anyway, as any company that is a licensed registrar would have access to it, and Google became a registrar in 2005 even though they don’t actually do domain registrations for anyone (besides maybe themselves). His statement isn’t false; however, it also doesn’t confirm or deny that Google has access to the private registration data.
Conspiracy theories are fun!
Regarding this: “How would he have known which domains to check for privacy protection if he didn’t already have the list of domains associated with the reviewee’s business?”
I decided to find the actual domain being reviewed at the session, and then I checked how easy it would be to find the other domains. It took approx. 2 minutes and I had a bunch of them.
The process:
– look up the IP for the domain
– check for other domains on that IP (none)
– check for domains on the surrounding IPs (1 domain on each)
– check the span of the IP block (one class C, assigned to the corp. that has the first domain)
This can be done by anybody without special tools. MSN Live has the ‘ip:’ operator (not perfect). Domaintools.com has a reverse IP lookup tool (better than MSN’s).
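The first few steps above can be roughly sketched in Python using only the standard library. Treat this as an approximation under some assumptions: the function names (`class_c_block`, `neighbor_ips`, `reverse_name`, `survey`) are made up for illustration, and a reverse DNS (PTR) lookup is swapped in for a true reverse IP lookup. PTR only returns whatever host name the network operator published for an address, so it will miss most of the domains a service like Domaintools would list.

```python
import ipaddress
import socket
from typing import Dict, List, Optional


def class_c_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 ("class C"-sized) network that contains ip."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def neighbor_ips(ip: str, radius: int = 2) -> List[str]:
    """Return the addresses within `radius` of ip, staying inside its /24."""
    block = class_c_block(ip)
    first = int(block.network_address)
    last = int(block.broadcast_address)
    center = int(ipaddress.ip_address(ip))
    neighbors = []
    for offset in range(-radius, radius + 1):
        candidate = center + offset
        if offset != 0 and first <= candidate <= last:
            neighbors.append(str(ipaddress.ip_address(candidate)))
    return neighbors


def reverse_name(ip: str) -> Optional[str]:
    """Best-effort PTR lookup; returns None when nothing is published."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def survey(domain: str, radius: int = 2) -> Dict[str, Optional[str]]:
    """Resolve domain, then look up host names on the surrounding IPs.

    Needs network access. PTR is an assumption standing in for a real
    reverse IP lookup service.
    """
    ip = socket.gethostbyname(domain)
    return {addr: reverse_name(addr) for addr in [ip] + neighbor_ips(ip, radius)}
```

Calling `survey("example.com")` would return a small map of nearby addresses to whatever host names they publish. Finding out who the block is actually assigned to (the last step) still means a whois query on the IP itself.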
Shhh, you’re ruining the conspiracy theory!