All about Magento E-commerce Store.......MagentoForum

Friday, September 16, 2011

Generating Google Sitemaps in Magento


Magento comes bundled with the ability to generate a Google sitemap. Google sitemaps are XML files that tell Google's webmaster tools where your site's content is. There's debate in the professional webmaster community as to how a Google sitemap will or won't affect your search engine listings, but chances are someone's going to ask you to generate one, and this article will get you sorted.
Magento's Google sitemap implementation also offers a simple example of the Magento Inc. approach to object-oriented programming.
To use the Google Sitemap feature, you'll need to first tell Magento you want a site map. This is done via the
Catalog -> Google Sitemap  
menu in the Admin Console.
Click the Add Sitemap button, enter a filename and a file path, and select a store view.
Magento allows (and requires) you to set up an individual sitemap for each store in your system. The file path and filename are combined to create a path from the root of your installation. The directory at this path must be writable by the user your web server runs as.
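If you want to check the location before saving, a small PHP test along these lines can help. This is only a sketch and not Magento code: the function name sitemapTargetIsWritable is hypothetical, the system temp directory stands in for the Magento root, and the only assumption taken from the article is that the file path and filename are joined under the installation root.

```php
<?php
// Hypothetical helper, not part of Magento: checks whether a sitemap file
// could be written at <root>/<path>/<filename>.
function sitemapTargetIsWritable($rootDir, $sitemapPath, $sitemapFilename)
{
    // Join the pieces as the article describes: root + file path + filename.
    $dir  = rtrim(rtrim($rootDir, '/') . '/' . trim($sitemapPath, '/'), '/');
    $file = $dir . '/' . $sitemapFilename;

    // An existing file must itself be writable; otherwise the directory must be.
    if (file_exists($file)) {
        return is_writable($file);
    }
    return is_dir($dir) && is_writable($dir);
}

// Example run: the temp directory plays the role of the Magento root.
$root    = sys_get_temp_dir();
$rootOk  = sitemapTargetIsWritable($root, '/', 'example-sitemap.xml');
$missing = sitemapTargetIsWritable($root, '/no-such-dir/', 'example-sitemap.xml');
var_dump($rootOk, $missing);
```

Run it as the same user the web server uses, since a path that is writable from your own shell account may not be writable for that user.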
Click on Save & Generate and Magento will save your sitemap configuration, as well as generate a file at the location you specified above.

Automatic Sitemap Generation

If you've set up the Magento maintenance cron job on your system, you can also have Magento generate a sitemap for you on a regular basis. In the Admin Console, browse to
System -> Configuration -> Catalog: Google Sitemap -> Generation Settings 
These system configuration settings let you configure how often Magento generates a sitemap. The rest of this article dives into some code, but if you submit a lot of sitemaps you should check out Ashley Schroder's Sitemap Submit extension.

Into the Code

The sitemap generation code is a good introduction to Magento's object-oriented philosophy. You've probably seen a lot of one-off sitemap generation scripts that came about something like this:
  1. OK, we need a sitemap, so we'd better make a shell script to generate one
  2. First, that shell script needs to read a bunch of information from somewhere to figure out what URLs are active in a Magento site
  3. Next, that shell script needs to generate XML based on the information read in step two
  4. Finally, that shell script needs to write the XML out to the file system somewhere
At first, there's nothing wrong with the above. It will do the job and generate a sitemap. The problems with this non-object-oriented approach come later, when someone else wants to generate a Google sitemap, or when new types of pages are added that the original script missed. All there is to work with is this one-off shell script. Maybe they can require_once it into their project, but all the logic for working out which URLs make up the site (step two) is "trapped" in global-level variables and (maybe) functions. It becomes difficult for anyone but the person who wrote that script to use any of that code elsewhere.
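To make the problem concrete, here is a sketch of the kind of one-off script described above. Everything in it is hypothetical: the hard-coded URL list stands in for "read a bunch of information from somewhere", the variable names are invented, and the XML follows the public sitemaps.org format rather than anything specific to Magento.

```php
<?php
// A deliberately procedural, one-off sitemap script (hypothetical example).

// Step 2: "read" the active URLs. Hard-coded here; a real script would
// look them up in the store's data.
$urls = array(
    'http://example.com/',
    'http://example.com/some-category.html',
    'http://example.com/some-product.html?a=1&b=2',
);

// Step 3: build the XML as a string held in a global variable.
$xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
$xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($urls as $url) {
    // Escape characters such as & so the output stays well-formed XML.
    $loc  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $xml .= '  <url><loc>' . $loc . '</loc></url>' . "\n";
}
$xml .= '</urlset>' . "\n";

// Step 4: write the XML out to the file system.
$target = sys_get_temp_dir() . '/sitemap-example.xml';
file_put_contents($target, $xml);
```

Nothing here is reusable on its own. The URL list, the XML string, and the target path all live at global scope, so another script can only get at them by including the whole file and running it.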
Magento's approach to this (and every other problem) is different. Whenever a Magento programmer approaches a problem, they break the problem's entities out into domain model objects. In the case of Google sitemaps, that means a single model object to represent a sitemap:
Mage_Sitemap_Model_Sitemap 
The reason this object is a model isn't because it's reading/writing information to/from a database. It's a model because it models the problem domain of Google sitemaps.
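As a rough illustration of what "modeling the problem domain" means, the sketch below is a plain PHP class that knows where a sitemap lives and how to write itself out as XML. It is not Magento's class: ExampleSitemap, its constructor arguments, and getFilePath are invented for this example, and only the generateXml method name is borrowed from the article.

```php
<?php
// Hypothetical stand-in for a sitemap domain model; this is not Magento's class.
class ExampleSitemap
{
    private $directory;
    private $filename;
    private $urls;

    public function __construct($directory, $filename, array $urls)
    {
        $this->directory = rtrim($directory, '/');
        $this->filename  = $filename;
        $this->urls      = $urls;
    }

    // Full path of the file this sitemap writes to.
    public function getFilePath()
    {
        return $this->directory . '/' . $this->filename;
    }

    // The one method callers need: build the XML and write it out.
    public function generateXml()
    {
        $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
             . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
        foreach ($this->urls as $url) {
            $loc  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
            $xml .= '  <url><loc>' . $loc . '</loc></url>' . "\n";
        }
        $xml .= '</urlset>' . "\n";

        if (file_put_contents($this->getFilePath(), $xml) === false) {
            throw new RuntimeException('Could not write ' . $this->getFilePath());
        }
        return $this;
    }
}

// Usage: the caller never touches XML or file handling directly.
$sitemap = new ExampleSitemap(
    sys_get_temp_dir(),
    'example-model-sitemap.xml',
    array('http://example.com/')
);
$sitemap->generateXml();
```

Note that nothing in this class talks to a database. It earns the name "model" purely by representing a sitemap and the things a sitemap can do.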
The developer responsible for creating the sitemap cron job doesn't need to know anything about Google sitemaps. All they need to know is which method to call on the sitemap. Consider the Magento cron job code:
#File: app/code/core/Mage/Sitemap/Model/Observer.php
public function scheduledGenerateSitemaps($schedule)
{
    $errors = array();
    if (!Mage::getStoreConfigFlag(self::XML_PATH_GENERATION_ENABLED)) {
        return;
    }
    $collection = Mage::getModel('sitemap/sitemap')->getCollection();
    foreach ($collection as $sitemap) {
        try {
            $sitemap->generateXml();
        }
        catch (Exception $e) {
            $errors[] = $e->getMessage();
        }
    }
    //...
}  
This code gets a collection of all sitemap objects
$collection = Mage::getModel('sitemap/sitemap')->getCollection();
and then iterates over them calling the generateXml method.
foreach ($collection as $sitemap) {
    try {
        $sitemap->generateXml();
    }
    catch (Exception $e) {
        $errors[] = $e->getMessage();
    }
}  
That's it. The cron job writer doesn't need to know anything about a Google sitemap. All the programmer needs to know is what method to call.
On the other side of the contract, the sitemap object implements sitemap generation in the generateXml method.
#File: app/code/core/Mage/Sitemap/Model/Sitemap.php
public function generateXml()
{
    ...
}
This method may change, but as long as it fulfills its contract, the cron job writer never needs to change the cron job code. It can continue to run unmolested.
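The idea of a stable contract can be sketched outside Magento with a small interface. Magento's code here relies on an implied contract rather than a declared one, so GeneratesXml, FirstDraftSitemap, RewrittenSitemap, and runScheduledGeneration are all hypothetical names, used only to show that the calling loop stays the same when the implementation behind generateXml changes.

```php
<?php
// Hypothetical contract: anything that offers a generateXml() method.
interface GeneratesXml
{
    public function generateXml();
}

// An early implementation that simply records that it ran.
class FirstDraftSitemap implements GeneratesXml
{
    public $generated = false;

    public function generateXml()
    {
        $this->generated = true;
    }
}

// A later rewrite that always fails, simulating an unwritable target.
class RewrittenSitemap implements GeneratesXml
{
    public function generateXml()
    {
        throw new Exception('Target directory is not writable');
    }
}

// Same shape as the cron job loop: call the method, collect any errors.
function runScheduledGeneration(array $sitemaps)
{
    $errors = array();
    foreach ($sitemaps as $sitemap) {
        try {
            $sitemap->generateXml();
        } catch (Exception $e) {
            $errors[] = $e->getMessage();
        }
    }
    return $errors;
}

$first  = new FirstDraftSitemap();
$errors = runScheduledGeneration(array($first, new RewrittenSitemap()));
```

The loop in runScheduledGeneration needed no changes to handle the second class. It depends only on the method name and on the convention that failures are reported as exceptions.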

Long Term vs. Short Term

That's a key benefit of Magento's object-oriented system. Structuring code around accepted design patterns makes it easier to split a large project into different components. The person working on the cron job system doesn't need to worry about what the person working on the sitemap system is doing, and neither of them needs to worry about what the person implementing the promotional engine is doing. Then, when it comes time to integrate everyone's code, the path forward is
  1. Look at the domain objects that have been implemented
  2. Learn what the contracts are for any particular method 

When you're stuck on a particularly gnarly bit of Magento code, think like you're a part of their team. Accept that the only unifying thing between Magento's various sub-systems is the use of domain model objects, which perform actions in controllers and are read from in blocks to generate output. Look at the Model objects used in the system you're investigating, work out what their implied contracts are, and go forward from there.
You will never understand what every sub-system in Magento does, but if you learn its core architectural principles, you'll be able to quickly zero in on whatever bit of functionality you need to understand and/or modify. That's what separates a good Magento developer from a bad one: being able to use the system as a black box, but knowing how the black box is put together when it won't do what you want.
