Blog publishing services typically propagate updates about new posts from blogs (ergo, new blogs too) by pinging or publishing a changes.xml file. But what none of the services provide is an "un-ping" -- blog indexing services such as Technorati don't know when a blog has been deleted from a service. I noticed this today when I found http://blogtrarian.blogspot.com/ participating in a link farm infesting Blogger's service. This can happen because Google's Blogger recycles URLs; when a blog is removed from the system, the URL is freed for reuse.
That particular URL dates back to 2004; it was dormant for several months but recently came to life with spam. The historic posts (until August 2005) look like normal blogging fare, but the recent posts are clearly just splog content. We'll have to work on "un-pinging" so it's easier to distinguish dormant blogs from dead ones.
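Until something like an "un-ping" exists, an indexer could at least flag URLs that go quiet for a long stretch and then start posting again, since that is the pattern a recycled URL leaves behind. Here is a minimal sketch of that check. The function name `find_revivals`, the 90-day threshold, and the sample dates are all assumptions for illustration, not anything Technorati or Blogger actually does.

```python
from datetime import date, timedelta

def find_revivals(post_dates, min_gap=timedelta(days=90)):
    """Return (last_post_before_gap, first_post_after_gap) pairs.

    A pair is reported whenever two consecutive posts are separated by at
    least `min_gap`. The threshold is an arbitrary, hypothetical choice.
    """
    ordered = sorted(post_dates)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier >= min_gap
    ]

# Hypothetical post history: normal activity, a long silence, then new posts.
history = [date(2005, 7, 1), date(2005, 8, 15), date(2006, 5, 1), date(2006, 5, 2)]
revivals = find_revivals(history)
```

A revival on its own doesn't prove the blog changed hands, since people do come back from long breaks. It only marks the posts after the gap as worth a closer look.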
spam splog web spam google blogger ping
( May 06 2006, 03:13:14 PM PDT ) Permalink

So Google's CEO Eric Schmidt says his servers are full, hmm. Tying that to SEO'ers griping about their indexing, Andrew Orlowski speculates that it's web spam besetting Big Daddy. Could be, but the hard data isn't out in the wild. The numbers that we can see are that Google is spending several banana republics' worth of GDP on capital expenses:
Google continued to make substantial capital investments, mainly in computer servers, networking equipment and its data centers. It spent $345 million on such items in the first quarter, more than double the level of last year. Yahoo, its closest rival, spent $142 million on capital expenses in the first quarter.
Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis." (read more)
If the problem is spam, then certainly it's Google's own doing. The elephant in the room is that the acceleration of web spam everyone's talking about is fueled by AdSense, often aided and abetted by Blogger splogs, Google Pages, Google Base, etc. The spam ecosystem is within Google's capacity to rein in, but the don't-be-evil company is making too much money on click fraud with plausible deniability to do anything about it. Is Google having problems handling web spam and "filling up" their machines? Cry me a river, all the way to the bank.
spam google adsense splog web spam
( May 05 2006, 02:09:19 PM PDT ) Permalink

When I read the words

Microsoft yesterday reached a tentative $70 million deal to settle a California class-action antitrust lawsuit, according to a statement by the law firm representing the plaintiffs in the suit.

at http://www.satishlive.info/?p=27 I had the distinct sense of deja vu. So I ran some queries against Technorati's index and sho-nuf, I found the exact same content had already been published by InfoWorld. Ah, there was an attribution at the bottom... but InfoWorld didn't publish under a Creative Commons license. Looks like blatant theft.
Then I checked the next post (http://www.satishlive.info/?p=28) on that blog and read:
I took a new blog search tool called Sphere for a little spin this morning and found it useful....

Hey, didn't I just see that somewhere else? Yep, this time it was PC World and no attribution.
It's safe to surmise that this is kleptotorial laden with AdSense and stuffed into the update stream. I've seen screenscrapes and feedscrapes on splogs before, but they're usually easier to identify visually; I had to look more carefully at this one to note its spamminess. Is there a market in alerting publishers to copyright infringement? Obviously this stuff should be removed from Technorati's index, but is there a more valuable service to publishers that should be provided here? How much would you pay to find out about misappropriations of your content? Is there a market for Technorati to do something like Plagiarism.org to fingerprint blog content?
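To make "fingerprinting" concrete, here is a rough sketch of one common approach: break each post into overlapping runs of words (shingles), hash them, and compare the resulting sets. I don't know how Plagiarism.org does it, and this is not how Technorati's index works. The function names, the five-word shingle size, and the truncated SHA-1 hashes are all assumptions for illustration.

```python
import hashlib
import re

def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one unit.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def fingerprint(text, k=5):
    """Hash each shingle so only compact digests need to be stored."""
    return {
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
        for s in shingles(text, k)
    }

def resemblance(a, b, k=5):
    """Jaccard similarity of two fingerprints: 0.0 (disjoint) to 1.0 (same)."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

original = ("Microsoft yesterday reached a tentative $70 million deal to "
            "settle a California class-action antitrust lawsuit")
scraped = original + " Source: InfoWorld"  # copy with an attribution tacked on
unrelated = ("I took a new blog search tool called Sphere for a little "
             "spin this morning")

copy_score = resemblance(original, scraped)
other_score = resemblance(original, unrelated)
```

A scraped copy with an attribution line appended still scores close to 1.0, while an unrelated post scores 0.0. A real service would also need a cutoff for "too similar" and some way to tell licensed republication from theft, neither of which this sketch attempts.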
splog creativecommons copyright spam creative commons plagiarism adsense
( May 04 2006, 09:34:17 PM PDT ) Permalink
The chatter (even artwork on Flickr) about it is frantic. Thank You Stephen Colbert has 700 links right now (this is a blog that came into being less than 72 hours ago), and it's getting about five or ten links per hour at the moment. The videos are the most linked-to YouTube reels on Technorati. How wonderful it is to have an administration so bad that the opportunities for high humor are so many. Why did we invade Iraq?
stephencolbert colbert flickr youtube technorati bush cspan whitehousecorrespondentsdinner colbertreport comedycentral politicalhumor iraq blogs blogging
( May 02 2006, 09:27:30 PM PDT ) Permalink