<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nick Wilsdon &#187; Sitemaps</title>
	<atom:link href="http://nickwilsdon.com/tag/sitemaps/feed/" rel="self" type="application/rss+xml" />
	<link>http://nickwilsdon.com</link>
	<description></description>
	<lastBuildDate>Thu, 01 Apr 2010 08:54:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Googlebot Strips Default Filenames From Sitemap URLs</title>
		<link>http://nickwilsdon.com/googlebot-strips-default-filenames-from-sitemap-urls/</link>
		<comments>http://nickwilsdon.com/googlebot-strips-default-filenames-from-sitemap-urls/#comments</comments>
		<pubDate>Mon, 22 Sep 2008 11:01:47 +0000</pubDate>
		<dc:creator>Nick Wilsdon</dc:creator>
				<category><![CDATA[Search Marketing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Webmaster Groups]]></category>
		<category><![CDATA[JohnMu]]></category>
		<category><![CDATA[Sitemaps]]></category>

		<guid isPermaLink="false">http://nickwilsdon.com/?p=478</guid>
		<description><![CDATA[There&#8217;s a useful thread over at Google Webmaster Groups that highlights an issue with default filenames such as index.html in sitemap URLs. As user edralph888 explains:
The URL in our sitemap is in the format:
http://www.domain.com/index.html?whatever=value
The problem with Googlebot is that even though that is the URL we put in the sitemap, it doesn&#8217;t use that URL [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a <a href="http://groups.google.com/group/Google_Webmaster_Help-Sitemap/browse_thread/thread/56bc8e9a510bf18b#">useful thread</a> over at Google Webmaster Groups that highlights an issue with default filenames such as index.html in sitemap URLs. As user <a href="http://groups.google.com/groups/profile?enc_user=m3IF-zYAAAC0ZCEBAysSlShC_gPAdXUZwGmwl9hUoO8dIfBynGz-uYXrXvpq834yOyZV2QFyt2cS4iuyIZmpFukhBkCAv2d3">edralph888</a> explains:</p>
<blockquote><p>The URL in our sitemap is in the format:</p>
<p><code>http://www.domain.com/index.html?whatever=value</code></p>
<p>The problem with Googlebot is that even though that is the URL we put in the sitemap, it doesn&#8217;t use that URL to make the request &#8211; it contracts it down to:</p>
<p><code>http://www.domain.com/?whatever=value</code></p>
<p>So our server sees this &#8216;incorrect&#8217; URL, issues a 301 with the &#8216;correct&#8217; URL (that has the index.html bit in it), but then Googlebot doesn&#8217;t follow that URL faithfully and again tries to request the URL without index.html in the path.  So our server again issues a 301 redirect, with the correct URL and here we go off on our infinite loop. So no wonder we get the error message:</p>
<p><code>URLs not followed.... [sitemap] contained too many redirects. </code></p>
</blockquote>
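<p>To make the loop concrete, here is a toy Python simulation of the exchange described above. Both functions are simplified assumptions for illustration (<code>server_response</code> stands in for the poster&#8217;s server, <code>crawler_rewrite</code> for the stripping step), not real code from either side:</p>
<pre><code>def server_response(url):
    """Toy server: a query on the bare directory gets a 301 to index.html."""
    if "/?" in url:
        return 301, url.replace("/?", "/index.html?", 1)
    return 200, url

def crawler_rewrite(url):
    """Toy crawler: strips /index.html before making any request."""
    return url.replace("/index.html", "/", 1)

def fetch(url, max_redirects=5):
    """Follow redirects until a 200 arrives or the redirect budget runs out."""
    for _ in range(max_redirects):
        status, location = server_response(crawler_rewrite(url))
        if status == 200:
            return "ok"
        # The redirect target is rewritten again on the next pass.
        url = location
    return "too many redirects"

print(fetch("http://www.domain.com/index.html?whatever=value"))
</code></pre>
<p>Because the crawler rewrites every URL before requesting it, the 301 can never be satisfied and the run ends with "too many redirects".</p>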
<p><a href="http://johnmu.com">John Mueller</a>, Webmaster Trends Analyst at Google Zürich, replies:</p>
<blockquote><p>In this case it actually is something that we&#8217;re doing &#8212; we strip &#8220;/index.html&#8221; from URLs because that&#8217;s generally irrelevant and only makes the URL longer and look more complicated to the user. We do this when processing the URLs in your Sitemap file so if you *need* to have &#8220;/index.html&#8221; in the URLs, they generally won&#8217;t work like that. At the moment, there is no solution for using these URLs in Sitemap files if you need to have &#8220;/index.html&#8221; in them. I would generally recommend dropping the &#8220;/index.html&#8221; part, but I realize that this is sometimes not easily done.</p>
<p>That said, we will still crawl the website normally, so if those URLs are reachable through a normal web crawl, we&#8217;ll still find and index them normally.</p>
</blockquote>
<p>Useful advice there for anyone putting together a sitemap and wondering why Google is throwing an error on URLs that require a default filename. I assume this also applies to the other &#8220;default&#8221; page names such as index.htm, index.cgi, index.pl, index.php, index.xhtml, index.asp and perhaps default.html.</p>
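<p>If you build your sitemap with your own script, one way to sidestep the problem is to strip default filenames before the URLs are written out. Below is a minimal Python sketch. The function name <code>strip_default_filename</code> and the list of filenames are assumptions for illustration, not the actual rules Googlebot applies:</p>
<pre><code>from urllib.parse import urlsplit, urlunsplit

# Assumed list of "default" page names; adjust it to match your own server.
DEFAULT_FILENAMES = {
    "index.html", "index.htm", "index.php", "index.asp",
    "index.cgi", "index.pl", "index.xhtml", "default.html",
}

def strip_default_filename(url):
    """Return url with a trailing default filename removed from its path."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in DEFAULT_FILENAMES:
        # Keep the trailing slash so the URL still points at the directory.
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)

print(strip_default_filename("http://www.domain.com/index.html?whatever=value"))
</code></pre>
<p>This only helps if your server answers the shortened URL directly instead of redirecting back to the version with the filename.</p>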

]]></content:encoded>
			<wfw:commentRss>http://nickwilsdon.com/googlebot-strips-default-filenames-from-sitemap-urls/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GSiteCrawler Now On Ubuntu</title>
		<link>http://nickwilsdon.com/gsitecrawler-now-on-ubuntu/</link>
		<comments>http://nickwilsdon.com/gsitecrawler-now-on-ubuntu/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 09:02:28 +0000</pubDate>
		<dc:creator>Nick Wilsdon</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[GsiteCrawler]]></category>
		<category><![CDATA[JohnMu]]></category>
		<category><![CDATA[Sitemaps]]></category>
		<category><![CDATA[Wine]]></category>

		<guid isPermaLink="false">http://nickwilsdon.com/?p=298</guid>
		<description><![CDATA[I went cold-turkey on Windows at the beginning of this year, installing Ubuntu on my workstation. It&#8217;s been frustrating at times when you have to relearn tasks, but you soon adapt. One thing I have missed, though, is some of the Windows tools and applications. Most have been replaced, even Photoshop has a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ubuntu.com/"><img class="float-right" title="ubuntulogo" src="http://nickwilsdon.com/wp-content/uploads/2008/08/ubuntulogo.png" alt="ubuntulogo GSiteCrawler Now On Ubuntu" width="202" height="55" /></a>I went cold-turkey on Windows at the beginning of this year, installing <a href="http://www.ubuntu.com/">Ubuntu</a> on my workstation. It&#8217;s been frustrating at times when you have to relearn tasks, but you soon adapt. One thing I have missed, though, is some of the Windows tools and applications. Most have been replaced (even Photoshop has a near-perfect copy in <a href="http://www.kanzelsberger.com/pixel/?page_id=4">Pixel Image Editor</a>), but no one has bothered to port the smaller tools, especially niche ones that help with SEO/SEM.</p>
<p><a href="http://www.winehq.org/"><img class="float-right" title="winehq_top_logo" src="http://nickwilsdon.com/wp-content/uploads/2008/08/winehq_top_logo.png" alt="winehq top logo GSiteCrawler Now On Ubuntu" width="209" height="99" /></a>So as part of my contribution to the Linux community, I&#8217;ve been trying to help in the <a href="http://www.winehq.org/">Wine project</a>. This software attempts to allow you to run Windows programs in a Linux environment, and is completely free. The list of applications they have <a href="http://appdb.winehq.org/">working is impressive</a> and the work of the Wine community has definitely enabled many people to cross to Linux.</p>
<p>One of the first tools I have managed to <a href="http://appdb.winehq.org/objectManager.php?sClass=application&amp;iId=6383">get working</a> there is <a href="http://gsitecrawler.com/">GSiteCrawler</a>. This program will scan your site and automatically create and upload an XML sitemap for Google or Yahoo! Once the sitemap is uploaded, its location can be added to the respective webmaster panels.</p>
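<p>For anyone who has not looked inside one of these files, the Python sketch below builds a bare-bones XML sitemap from a list of URLs. It is not how GSiteCrawler works internally; the function name <code>build_sitemap</code> and the example.com URLs are placeholders chosen for illustration:</p>
<pre><code>import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        # Each page gets one url element with its address in loc.
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(["http://www.example.com/", "http://www.example.com/about/"]))
</code></pre>
<p>A crawler-based tool does the hard part for you, which is discovering the list of URLs in the first place.</p>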
<h3>How to Install GSiteCrawler for Ubuntu</h3>
<p><em>If you already have Wine installed, jump to step 4.</em></p>
<p><strong>1.</strong> Add the Wine repository to your system:<br />
<code># sudo wget http://wine.budgetdedicated.com/apt/sources.list.d/hardy.list -O /etc/apt/sources.list.d/winehq.list</code></p>
<p><strong>2.</strong> Update your system package information:<br />
<code># sudo apt-get update</code></p>
<p><strong>3.</strong> You can now install Wine by <a href="apt://wine">clicking this link</a>. Alternatively, you can install by going to Applications-&gt;Add/Remove and searching for Wine.</p>
<p><strong>4.</strong> Run the following at the terminal to fetch winetricks and install its <code>jet40</code> package:<br />
<code># wget kegel.com/wine/winetricks &amp;&amp; sh winetricks jet40</code></p>
<p><strong>5.</strong> Download the full installation copy of <a href="http://gsitecrawler.com/en/download/">GSiteCrawler from here</a> and save to your desktop.</p>
<p><strong>6.</strong> Right-click the installer file and select &#8220;Open with Wine Windows Program loader&#8221;.</p>
<p>You&#8217;re done. Of course, any of my Windows readers can also <a href="http://gsitecrawler.com/en/download/">download GSiteCrawler</a> and install this handy application. I&#8217;m building a list of Ubuntu-friendly SEO/SEM tools as a reference for online marketers who have crossed over or are thinking about the switch. Feel free to list any must-have Windows tools in the comments and I&#8217;ll add them to my list for testing.</p>

]]></content:encoded>
			<wfw:commentRss>http://nickwilsdon.com/gsitecrawler-now-on-ubuntu/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>
