The current development version of DokuWiki already has built-in support for creating a Google sitemap, but there is no way to create one with the current stable version. So I wrote my own small PHP script (googlesitemap.php) that generates a Google XML sitemap from the page .txt files found by recursively scanning DokuWiki's content directory (data/pages). Here is the full code:
```php
<?php
$baseurl  = 'http://www.flagar.com/';
$mainpath = 'data/pages';

// Return the entries of $path, optionally only those with extension $ext
function getPaths($path, $ext = "") {
    if ($ext == "") {
        return glob($path."/*");
    } else {
        return glob($path."/*.".$ext);
    }
}

// Walk the page directory and print one <url> entry per .txt page
function recScan($mainpath) {
    foreach (getPaths($mainpath) as $path) {
        if (is_dir($path)) {
            recScan($path);
        } elseif (substr($path, -4) == '.txt') {
            // data/pages/wiki/syntax.txt -> wiki/syntax
            $page = str_replace($GLOBALS['mainpath'].'/', '', substr($path, 0, (strlen($path)-4)));
            // +01:00 is a hardcoded UTC offset; adjust it to your server's time zone
            print("<url><loc>".$GLOBALS['baseurl'].strtolower($page)."</loc><lastmod>".date("Y-m-d\TH:i:s+01:00", filemtime($path))."</lastmod></url>");
        }
    }
}

header("Content-Type: text/xml; charset=UTF-8");
print("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
print("<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\">");
recScan($mainpath);
print("</urlset>");
?>
```
It requires at least PHP 4.3.3 (for the glob function).
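As a variation, the same scan can build the sitemap as a string instead of printing it piece by piece, which makes it easier to test or cache. The sketch below is not a drop-in replacement for the script above: it assumes PHP 7 or later (it uses scalar type hints, ENT_XML1 and date('c'), none of which exist in PHP 4.3.3), assumes $mainpath has no trailing slash, and uses the hypothetical helper names collectPages() and buildSitemap(). It also escapes each URL for XML and lets date('c') emit the server's actual UTC offset instead of a hardcoded +01:00.

```php
<?php
// Sketch only: hypothetical helpers collectPages()/buildSitemap(), PHP 7+.

// Recursively collect every .txt page file below $dir.
function collectPages(string $dir): array
{
    $pages = [];
    // glob() may return false on some systems when nothing matches
    foreach (glob($dir . '/*') ?: [] as $path) {
        if (is_dir($path)) {
            $pages = array_merge($pages, collectPages($path));
        } elseif (substr($path, -4) === '.txt') {
            $pages[] = $path;
        }
    }
    return $pages;
}

// Build the whole sitemap XML document and return it as a string.
// Assumes $mainpath has no trailing slash.
function buildSitemap(string $baseurl, string $mainpath): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    foreach (collectPages($mainpath) as $path) {
        // strip "$mainpath/" and ".txt": data/pages/wiki/syntax.txt -> wiki/syntax
        $page = substr($path, strlen($mainpath) + 1, -4);
        // escape &, <, > and quotes so the URL is valid inside XML
        $loc = htmlspecialchars($baseurl . strtolower($page), ENT_QUOTES | ENT_XML1, 'UTF-8');
        // ISO 8601 timestamp with the server's own UTC offset
        $lastmod = date('c', filemtime($path));
        $xml .= "<url><loc>$loc</loc><lastmod>$lastmod</lastmod></url>\n";
    }
    $xml .= "</urlset>\n";
    return $xml;
}
```

In googlesitemap.php you would then send the header yourself and print the result, e.g. `header('Content-Type: text/xml; charset=UTF-8'); echo buildSitemap($baseurl, $mainpath);`.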
Using the script alone doesn't give you access to the sitemap statistics from Google. Google reports an error because a request for a page that doesn't exist gets an HTTP 200 OK response instead of the expected 404 Not Found. It also reports an error reading robots.txt, because DokuWiki serves it as an HTML page, a format search engines obviously cannot understand. To fix both problems, it's enough to add the following code:
```php
// for Google sitemap stats
if ($INFO['exists'] != true):
    header("HTTP/1.0 404 Not Found");
endif;
// for robots.txt search engine requests
if ($ID == 'robots.txt'):
    exit();
endif;
```
before these lines in the inc/actions.php file:
```php
header('Content-Type: text/html; charset=utf-8');
include(template('main.php'));
```
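After patching inc/actions.php, one way to check the result is to request a page that doesn't exist and look at the status code the server returns. The sketch below is a hypothetical helper, lastStatusCode(), that reads the status line from the header array returned by PHP's get_headers(); it assumes PHP 7.1 or later for the nullable return type. When a request is redirected, get_headers() lists the headers of each response in turn, so the helper keeps the last status line it sees.

```php
<?php
// Sketch only: hypothetical helper, PHP 7.1+.
// Returns the status code of the final "HTTP/x.y NNN ..." line in a
// get_headers() result, or null if no status line is present.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // a later status line (after a redirect) overrides earlier ones
            $code = (int) $m[1];
        }
    }
    return $code;
}
```

For example, with the base URL used in googlesitemap.php, `lastStatusCode(get_headers('http://www.flagar.com/no-such-page'))` should return 404 once the patch is in place (no-such-page is a made-up page name). Note that get_headers() returns false if the request fails, so check for that before calling the helper.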
To add files to download