
The DaveN Blog Hack Raises Questions On Google Blacklist API

When Dave Naylor first reported issues with his blog, I fired up SERPGuard.com to check his URL. Our tool queries Google’s API to find compromised sites listed on their malware or phishing blacklists. Dave’s pages were being dropped from Google’s SERPs, which raised my suspicion that he had been blacklisted – but the result was negative.

[Screenshot: Google Safe Browsing diagnostic page for www.davidnaylor.co.uk]

You can check manually here (screenshot) and at StopBadWare’s clearing list.

What is the current listing status for www.davidnaylor.co.uk/?

This site is not currently listed as suspicious.

What happened when Google visited this site?

Of the 29 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 09/18/2008, and suspicious content was never found on this site within the past 90 days.

Has this site acted as an intermediary resulting in further distribution of malware?

Over the past 90 days, www.davidnaylor.co.uk/ did not appear to function as an intermediary for the infection of any sites.

Has this site hosted malware?

No, this site has not hosted malicious software over the past 90 days.

OK, so we can assume Google did not see Dave’s site as compromised. This would explain why Dave did not get any warning messages from Webmaster Central. Both that system and SERPGuard work from the Google Safe Browsing API, and without a positive listing for DaveNaylor.co.uk, neither would have any reason to send out an alert.
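Out of interest, this is roughly what that kind of blacklist lookup looks like in code. The sketch below is a minimal Python example against Google’s current v4 Lookup endpoint (the API has been revised since this post was written, so treat it as an illustration rather than a description of how SERPGuard itself works); the API key and client name are placeholders.

```python
"""Minimal sketch of a Safe Browsing blacklist check, using the v4 Lookup API.
The API key and client name below are placeholders, not real credentials."""
import requests

API_KEY = "YOUR_API_KEY"  # obtain from the Google API console
LOOKUP_URL = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

def check_url(url: str) -> list:
    """Return any blacklist matches Google reports for the given URL."""
    payload = {
        "client": {"clientId": "blacklist-checker", "clientVersion": "0.1"},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": url}],
        },
    }
    resp = requests.post(LOOKUP_URL, params={"key": API_KEY}, json=payload, timeout=10)
    resp.raise_for_status()
    # An empty response body means the URL is not on any of the requested lists.
    return resp.json().get("matches", [])

if __name__ == "__main__":
    matches = check_url("http://www.davidnaylor.co.uk/")
    print("Listed as suspicious" if matches else "Not currently listed as suspicious")
```

A negative result from a check like this is exactly what the diagnostic page above shows – and exactly why neither system raised an alert for Dave.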

According to Dave Naylor’s latest blog post, his site appears to have suffered a textbook hack.

While I was in the meeting, Becky texted me to say they had found something: Patrick at Blogstorm (I’m not linking out just in case I pass bad karma) and Josh from JaeWeb had spotted an issue. It was spot on – the server had been compromised, the site was cloaking links to Google for antidepressant drugs, and we had fake AdSense code injected into the blog.

Patrick Altoft discovered a useful way of detecting compromises on your blog: setting up Google Alerts on key terms. I imagine one of these searches revealed the cloaked pages that Dave had unknowingly been serving to Google.

The real advantage of Patrick’s technique is that it might catch this kind of compromise before the site makes it onto Google’s blacklists. I had always assumed that SERP penalties would only be applied after Google listed the site as compromised.
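Google Alerts has no public API, so the alerts themselves still have to be set up by hand, but the queries behind Patrick’s technique are easy to generate. Here is a quick sketch; the keyword list is illustrative rather than exhaustive, and the resulting strings are meant to be pasted into Google Alerts (or run as manual searches).

```python
"""Sketch of the query-building side of the Google Alerts technique.
The keyword list is illustrative only -- extend it with whatever spam
terms are doing the rounds."""

SPAM_KEYWORDS = ["viagra", "cialis", "prozac", "casino", "payday loan"]

def alert_queries(domain: str) -> list[str]:
    """Build one site-restricted search query per suspicious keyword."""
    return [f"site:{domain} {keyword}" for keyword in SPAM_KEYWORDS]

for query in alert_queries("davidnaylor.co.uk"):
    print(query)  # e.g. "site:davidnaylor.co.uk viagra"
```

If an alert on one of these queries ever fires for your own domain, the injected content has already reached Google’s index – which is still earlier than most owners would otherwise notice.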

So the real question here is: why would Google apply a penalty without first listing DaveNaylor.co.uk? Is this a case of the left hand not knowing what the right is doing – i.e. are malware/phishing penalties applied irrespective of the Safe Browsing list? Or does Google not consider this incident to be malware or phishing related?

There are still a lot of unanswered questions for me around this incident. I look forward to Dave revealing more about the hack, and to any feedback Google can give on why he was penalised but not listed. If the Safe Browsing project is to be of any use to webmasters, we should know that it accurately reflects Google’s opinion of a site.

Nick Wilsdon is the Head of Content and Media at iProspect UK, part of the Dentsu Aegis Network. He manages online campaigns for the UK's leading telecom, finance and FMCG brands.

Comments

  1. There are a lot of different kinds of hacks. In this case, it’s a textbook SEO hack: someone placed hidden links with keyword-heavy anchor text on pages in the site. You can still see that in the cached versions of some of those pages. Since this kind of hack does not involve malware, it’s not something the malware diagnostics page will warn you about. It doesn’t harm a user to open a page with hidden content on it :).

    We notify webmasters through Webmaster Tools when we spot malware, but general hidden/hacked content like this is hard to spot algorithmically with certainty, so we can’t always notify the webmaster about it.

    One way of recognizing this would be to regularly check Webmaster Tools, under Statistics / “What Googlebot Sees.” On that page there is generally a section for “Keywords in your site’s content.” If everything is OK, you should see keywords there that match what you write about. If the site is hacked with hidden text/links, you’ll likely see things that match the hidden content instead: various kinds of pills, porn phrases, etc. Maybe this is something webmasters should be doing on a regular basis? :-)

    You can achieve the same by setting up Google Alerts for those kinds of keywords on your site. I like to do that anyway, just to be sure I don’t accidentally miss anything (it can also bring up spammy comments that you can then remove, etc.).

    I hate it when sites get hacked like this (and there are a LOT of sites hacked like that) and honestly, I don’t know how the search engines should react when we spot hidden content like that. On the one hand, the site was compromised and ANYTHING could be placed on it by the hackers, including cloaked redirects to pornographic sites and malware; on the other hand, the webmaster is often not aware of it and we don’t want to be too harsh.

    What do you think?

  2. Nick Wilsdon says:

    @John

    Thanks, you helped clarify the difference here between malware and hidden links. I see now why the Google Alert strategy is not just complementary but an essential part of your site defence.

    It’s a tricky one, this. Most users are going to assume the Google Safe Browsing API picks up these kinds of textbook hacks. I’ve found the easiest way to explain these new tools is as “anti-virus protection” for your website.

    On the one hand, yes I think people would like to know if you pick up that kind of material on their sites. In Dave’s case here, a warning email or alert on the API would have really helped.

    On the other hand, like you say, it’s going to be hard to work out intent here. If someone has chosen to link out to these places themselves, it would be harsh to stop their traffic with the interstitial page, especially as the most recent version doesn’t seem to offer any way to continue to the site.

    So I’d be in favour of warnings, but not the interstitial page, for these kinds of situations.

    The Google Alerts service is great, but it’s still left to the owner to set it up. I know a lot of people who have no idea about GWC – let alone how to set up alerts or check for keywords. It would be great to make this easier for people somehow. Maybe it could be built into the API with a public list of “suspicious” words (Cialis, Prozac, etc.)? People are lazy, and I think most just want this kind of “anti-virus” protection, where they simply enter their URL and the application flags up issues – the sketch after these comments shows the kind of check I mean.

    The API seems a great step in this direction, as it allows so many people and applications to build in these kinds of reports. Getting them all to set up Google Alerts for their users is going to be too much work. A centralized “suspicious” list would also be updated more regularly as new keywords appear (drug brand names, etc.). It would be interesting to see how we all agree on the list though! Maybe we need to do it like the RBLs: certain people create their own filter lists, which can be lax or strict, and those interact with the API in some way.

  3. Yep, it’s certainly complicated :-)

    I like the idea of an automatic spammy-keyword alert list, but I’m somewhat worried about how it would look if Google were to provide something like that. I have a feeling this would make a great service for someone outside of Google; it could be a kind of wiki that collects this information and helps users set up appropriate Google Alerts. Since these alerts just send you an email, the occasional false positive wouldn’t be much of a problem.

    John

  4. Nick, could I ask you for a favor – a friend runs a noncommercial site, and he suddenly got a massive penalty in G. My hunch is that he got cracked and there’s hidden (probably cloaked) stuff on the domain, but GWT only lists crawl errors (the static, WP-generated URLs haven’t changed…). Think he could get his domain analyzed for free by SERPGuard?
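To round off the discussion in the comments above: the kind of self-check John and I describe can be approximated with a short script that fetches your own pages the way a crawler would and flags suspicious keywords. This is only a rough sketch – the Googlebot-style User-Agent, the keyword list and the example URLs are placeholder assumptions, and cloaked content will not always be revealed this way (some hacks key on the crawler’s IP address rather than its User-Agent).

```python
"""Rough sketch of a hidden-content self-check: fetch your own pages with a
crawler-style User-Agent and flag suspicious keywords. The User-Agent string,
keyword list and URLs below are illustrative placeholders."""
import requests

# Cloaked spam is often served only to search engine crawlers, so spoof a
# Googlebot-style User-Agent when fetching (this won't fool IP-based cloaking).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SUSPICIOUS = ["viagra", "cialis", "prozac", "tramadol", "casino", "porn"]

def scan_page(url: str) -> list[str]:
    """Return any suspicious keywords found in the page body."""
    resp = requests.get(url, headers={"User-Agent": GOOGLEBOT_UA}, timeout=10)
    resp.raise_for_status()
    body = resp.text.lower()
    return [keyword for keyword in SUSPICIOUS if keyword in body]

if __name__ == "__main__":
    # Placeholder URLs -- in practice you would walk your own sitemap.
    for page in ["http://www.example.com/", "http://www.example.com/blog/"]:
        hits = scan_page(page)
        if hits:
            print(f"{page}: possible injected content -> {hits}")
        else:
            print(f"{page}: nothing suspicious found")
```

A check like this is no substitute for the Safe Browsing data, but as Dave’s case shows, it can catch a textbook SEO hack before (or instead of) any official listing.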
