Update: Get SitemapCFC at Google Code or at RIAForge.
If you're not familiar with Sitemaps, visit the sitemaps.org home page, which provides an overview of the Sitemaps protocol (adopted by Google, Yahoo! and Microsoft, among others).
I wanted to generate a sitemap.xml file to submit to Google, Yahoo!, etc. based on data from a simple CMS application's database. I ran some quick searches and was surprised not to find a CFC that did exactly what I was looking for. There are a few sitemap generators out there that crawl site links to produce a sitemap XML file, but I didn't find any that generated an XML file based on data. I know there are a number of applications that have built-in sitemap generator support (like BlogCFC and probably most modern blog and CMS apps), but I didn't find anything generic and flexible enough for my needs. Ray Camden did share a UDF that handles the basics very nicely, but I wanted to be able to pass in different URL collection types with flexible key/column names. I'd already cooked up two different (albeit simple) application-specific sitemap generators for apps that I maintain, so it was time to genericize and reuse!
I'll outline the Sitemap.cfc I created, its features and some examples. I will update this post with a link to RIAForge once I have the project approved for upload there. If you're not looking for a data-driven sitemap generator, but rather a crawler or spider style sitemap generator, then check out this "Google Sitemap XML Generator". For a data-driven sitemap generator (or, if you want to use your own crawler and just need to model and generate a valid sitemap.xml file), read on...
I quickly put together a relatively simple Sitemap.cfc to suit my needs, but then I found myself adding more and more little enhancements. Since the sitemaps.org protocol is relatively simple, it wasn't too difficult to create a CFC to model the protocol. I tried to keep it simple, but flexible enough to take a collection of URLs (and relevant metadata) in just about any form and spit out a valid sitemap.xml file.
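To give a feel for how little the protocol asks for, here is a minimal sketch in Python (not the CFC's actual code). It assumes the URL collection is already a list of dicts keyed by the protocol's tag names; `build_sitemap` and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Child tags defined by the sitemaps.org protocol; only <loc> is required.
URL_TAGS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(urls):
    """Return a sitemap XML string for a list of dicts keyed by protocol tag names."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in urls:
        if not entry.get("loc"):
            raise ValueError("every URL entry needs a 'loc' value")
        url = ET.SubElement(urlset, "url")
        for tag in URL_TAGS:
            value = entry.get(tag)
            if value is not None:
                # ElementTree entity-escapes text (e.g. & becomes &amp;) on output.
                ET.SubElement(url, tag).text = str(value)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sample data.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/page.cfm?id=1&lang=en", "lastmod": "2008-01-15"},
])
print(sitemap_xml)
```

Leaving the escaping to the XML library, rather than doing string concatenation, is one easy way to keep query-string URLs valid in the output.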
SitemapCFC Feature Overview
- Use a list, query or array (of structs) to initialize a sitemap object.
- Query column or struct key names used to initialize a Sitemap.cfc object are not important; an optional init() argument can be used to map to standard sitemaps.org protocol tag names.
- Write your sitemap.xml file to disk or dynamically send a sitemap XML document to the browser as binary page output (cfcontent type text/xml).
- Debugging methods are available to access a Sitemap.cfc object's URL collection in the form of an array, an XML object or the raw XML string.
- The generated XML document is ready for schema validation.
- All initialization data values are cleaned (entity escaping, date/time format, valid string values, etc.) and validated.
- Date(/time) values for the <lastmod> tags can be passed in as any valid date/time string or object; they will be automatically converted to UTC in proper W3C Datetime format (again, per sitemaps.org protocol).
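The key mapping and <lastmod> clean-up described above can be sketched roughly as follows. This is Python for illustration only, not SitemapCFC's real API: the `normalize` and `to_w3c_datetime` helpers, the `page_url`/`updated_at` column names, and the choice to treat values without a time zone as UTC are all assumptions.

```python
from datetime import datetime, timezone


def to_w3c_datetime(value):
    """Convert a datetime (or ISO 8601 string) to UTC in W3C Datetime format."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Assumption: values without a time zone are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")


def normalize(rows, key_map):
    """Rename application-specific keys to protocol tag names and clean lastmod."""
    cleaned = []
    for row in rows:
        # key_map maps protocol tag name -> key/column name used in the source data.
        entry = {tag: row[key] for tag, key in key_map.items()
                 if row.get(key) is not None}
        if "lastmod" in entry:
            entry["lastmod"] = to_w3c_datetime(entry["lastmod"])
        cleaned.append(entry)
    return cleaned


# Hypothetical row, as it might come back from a CMS query.
rows = [{"page_url": "http://www.example.com/about",
         "updated_at": "2008-01-15T09:30:00-05:00"}]
key_map = {"loc": "page_url", "lastmod": "updated_at"}
entries = normalize(rows, key_map)
print(entries)
```

Keeping the mapping as plain data means the same generator can be pointed at queries or structs from different applications without touching the generator itself.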
Read on for an array of examples...