All about Magento E-commerce Store (MagentoForum): Magento’s Many 404 Pages

Friday, September 16, 2011

Magento’s Many 404 Pages

The 404 page has a long and illustrious history in the world of web development. What started as a simple, unfriendly error message has turned into a key part of any site's experience, and any retail outlet's conversion rate. Like many other PHP frameworks, Magento faces the challenge of providing a unified 404 experience. Also like many other PHP frameworks, Magento has punted that responsibility onto the end user-developer of the system. In this article we'll explore the various ways that the Magento cart application generates 404 pages, which will allow you to make educated choices when building your 404 experience.

Before we begin though, a quick history lesson and HTTP primer is in order.

Some HTTP Background

If you have curl installed on your system, try running the following command

 curl -I http://example.com 

Assuming your computer is connected to the internet, you should see output something like this

 HTTP/1.0 302 Found
 Location: http://www.iana.org/domains/example/
 Server: BigIP
 Connection: Keep-Alive
 Content-Length: 0

Here's another one

 curl -I http://www.iana.org/domains/example/ 

with results something like

 HTTP/1.1 200 OK
 Date: Fri, 22 Apr 2011 13:15:13 GMT
 Server: Apache/2.2.3 (CentOS)
 Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
 Content-Length: 2945
 Connection: close
 Content-Type: text/html; charset=UTF-8

The command line curl program allows you to download files over HTTP via the shell. The -I option tells curl that we only want the HTTP headers returned to us, not the actual contents of the file. HTTP headers are the information that a web server sends to the client about the request. While it's good to understand what every line means, it's the first line of each response that we're interested in

 HTTP/1.0 302 Found
 HTTP/1.1 200 OK

These are HTTP status lines. HTTP stands for Hypertext Transfer Protocol, and is the common language of the web. It defines how a computer or software application should act when it receives or requests information. Each status line can be broken into two parts. The first is the HTTP version being used (HTTP/1.0, HTTP/1.1), and the second is the status itself (302 Found, 200 OK).

This status attempts to describe the type of response from the server. For example, a code of 200 means everything went as expected (OK). A code of 302 tells the client/browser that a resource has temporarily moved to a different URL. This may seem like over-engineered nerdy fluff, but it's actually important.
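To make the two-part split concrete, here is a minimal PHP sketch that pulls the version and the status out of a status line. The function name parseStatusLine is made up for this illustration; it isn't part of curl, PHP, or Magento.

```php
<?php
// Hypothetical helper: split an HTTP status line into its protocol
// version, numeric status code, and reason phrase.
function parseStatusLine(string $line): array
{
    // Expected shape: "HTTP/<version> <3-digit code> <optional reason phrase>"
    if (!preg_match('#^HTTP/(\d+(?:\.\d+)?) (\d{3})(?: (.*))?$#', trim($line), $m)) {
        throw new InvalidArgumentException("Not an HTTP status line: $line");
    }

    return [
        'version' => $m[1],
        'code'    => (int) $m[2],
        'reason'  => $m[3] ?? '',
    ];
}

$status = parseStatusLine('HTTP/1.0 302 Found');
echo $status['version'], ' / ', $status['code'], ' / ', $status['reason'], PHP_EOL;
// prints "1.0 / 302 / Found"
```

The numeric code is the part clients act on; the reason phrase is only there for humans.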

When a browser receives a status code of 200, it knows to expect a document after the headers, and that it should attempt to render the document, or in the case of supporting files (images, CSS, JavaScript), apply the contents of those files to the main HTML document in a way that makes sense (display the image, apply the CSS, run the JavaScript). However, when a browser receives a status code of 302, it knows to look for a companion Location header, and then automatically make another request for the URL it finds there.

That's the first reason status codes are important. They tell the browser what to do with a particular request. Status codes also allow other kinds of web clients, particularly web spiders, to infer information about a page/resource based on its status headers. For example, if a URL returns a status of

 301 Moved Permanently 

the spider knows it may safely ignore the previous URL in the future, and start treating the new URL in the Location field as canonical. Google infers a significant amount of information about your site based on its headers, which is why their webmaster tools are geared towards cleaning these up.

Status 404

This brings us, finally, to the topic at hand. Give the following request a try

 curl -I http://www.iana.org/domains/example/notthere.html 

You should get a response something like

 HTTP/1.1 404 NOT FOUND
 Date: Fri, 22 Apr 2011 14:02:26 GMT
 Server: Apache/2.2.3 (CentOS)
 Connection: close
 Content-Type: text/html; charset=utf-8

A "404 page" gets its name from the HTTP status code for file not found. Back in the day, the original web servers were designed to share documents. The 404 status code was originally intended to tell a browser that the file they were looking for was not available. The HTTP specification is silent on how a browser should handle 404 responses. Early web servers included a brief HTML document along with the not found status

 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
 <html><head>
 <title>404 Not Found</title>
 </head><body>
 <h1>Not Found</h1>
 <p>The requested URL /testing was not found on this server.</p>
 </body></html>

and most browsers chose to display whatever HTML was returned via a 404 document. This seemingly innocuous choice had an interesting effect on web development and internet culture.

Webmasters of that bygone era quickly realized that the standard 404 page provided an awful user experience for their visitors, and they started customizing the HTML output on a per-site basis so that a more useful page was returned. From a user experience point of view this allowed the end-user-visitor to continue navigating on the site despite the fact the page they were looking for wasn't there. From an engineering point of view this created a weird situation where you needed to return a document even though the document wasn't found. The 404 code went from being a simple status to becoming an integral part of any website's design.

The interesting bit is, if a modern browser encounters a default 404 page (such as the one above), instead of displaying the page it will display a custom error message

If the original web browsers/web-culture had chosen to implement things this way, the entire idea of a 404 page may have never existed.

404 in the MVC Era

Modern PHP web development and 404's present a problem that needs to be solved. Out of the box most web servers (Apache, etc.) handle 404 pages themselves. Early PHP web applications relied on a server's 404 mechanism handling the file not found responses. If the URL was for a file that existed, PHP would process the request. If the user requested a PHP page that didn't exist, Apache would send back its configured 404 document, and the request would never get to the PHP processing portion.

However, as you're likely aware, most modern PHP MVC systems route all requests through a single PHP file.

 http://example.com/index.php/some/uri/path
 http://example.com/some/uri/path

The code in index.php is then responsible for bootstrapping the system, and handing off control to a PHP controller class. The problem this creates is with PHP handling the request, the web server (Apache) can no longer handle 404's. As far as the web server is concerned, if the request mapped to a PHP file, that's a 200 OK. This means if a user enters an invalid route, it's the responsibility of the PHP framework to

  1. Send back HTML for a 404 page
  2. Send back the proper HTTP 404 header

Framework authors need to be careful and provide a centralized 404 mechanism, or else they may end up with multiple sources for 404 page content. Also, and very commonly missed, is sending the proper 404 header. If your PHP page is returning a status 200 header, Google ends up indexing every file-not-found page as an actual page, meaning you may have an infinite number of identical pages in your Google results, which will negatively impact your search rankings.
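The two responsibilities listed above can be sketched in a few lines of plain PHP. This is a toy front controller, not Magento code: the route table and the resolve and send functions are invented for the example. The point is that both the 404 status and the 404 document come from one place.

```php
<?php
// Hypothetical route table; a real framework would map paths to controllers.
$routes = [
    '/'      => '<h1>Home</h1>',
    '/about' => '<h1>About</h1>',
];

// Resolve a path to a [status code, HTML body] pair. Unknown paths get BOTH
// a 404 status and a friendly document, decided in this one function.
function resolve(array $routes, string $path): array
{
    if (array_key_exists($path, $routes)) {
        return [200, $routes[$path]];
    }

    return [404, '<h1>Page not found</h1><p><a href="/">Back to the home page</a></p>'];
}

// Set the status code before any body output. Without this call PHP
// answers 200 OK, even when the body says "not found".
function send(array $response): void
{
    [$status, $body] = $response;
    http_response_code($status);
    echo $body;
}

// REQUEST_URI is only set under a web server, so fall back to "/" elsewhere.
$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';
send(resolve($routes, $path));
```

Keeping the decision in resolve and the output in send means a second code path can't quietly render a "not found" body with a 200 status.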

Magento gets the status code right. However, it falls prey to the problem most PHP frameworks do, in that there are multiple ways a 404 page is created and rendered. Let's take a look at those now.

Magento 404 Pages

If we take a look at the rewrite rule (in .htaccess) that captures and redirects requests into Magento's bootstrap file

 ############################################
 ## always send 404 on missing files in these folders

     RewriteCond %{REQUEST_URI} !^/(media|skin|js)/

 ############################################
 ## never rewrite for existing files, directories and links

     RewriteCond %{REQUEST_FILENAME} !-f
     RewriteCond %{REQUEST_FILENAME} !-d
     RewriteCond %{REQUEST_FILENAME} !-l

 ############################################
 ## rewrite everything else to index.php

     RewriteRule .* index.php [L]

we can see that the line that does the capturing is

 RewriteRule .* index.php [L] 

However, it's preceded by four RewriteCond statements. These statements provide rules that will allow certain requests to skip the bootstrapping process. For example, these three

 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteCond %{REQUEST_FILENAME} !-l

say only apply this rule if a file (-f), directory (-d) or link (-l) does not exist for the request. This allows Apache to serve out existing static files without incurring the performance cost of Magento's bootstrapping. The primary reason this rule is here is to allow the serving of CSS, JavaScript and images from any folder in the system without additional special cases. You can also use the presence of these rules to implement a simple static cache. If you have a URL like this

 http://magento.example.com/some/controller/route 

and created a static HTML file at the following location

 /path/to/wwwroot/some/controller/route/index.html 

Apache would serve out the index.html file instead of handing control over to Magento.

Of particular interest to us is the first rule

 RewriteCond %{REQUEST_URI} !^/(media|skin|js)/ 

This one says if the request URL starts with media, skin, or js, then Apache should handle the request. This means requests for files that don't exist with URLs that look like the following

 http://magento.example.com/media/file.jpg
 http://magento.example.com/skin/base/badstyle.css
 http://magento.example.com/js/another-file-that-is-not-there.js

will use the web server's configured 404 page. This means if you want to ensure all 404 pages have the same experience, you still need to configure a custom 404 page via your web server.

That's the first 404 page you need to be aware of in a Magento system.
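If the rewrite conditions are hard to keep in your head, it can help to model them as ordinary code. The PHP function below is only an illustration of the decision Apache makes; Apache evaluates the rules itself and never calls anything like this. The function name is made up, and it assumes the request URI is a bare path with no query string.

```php
<?php
// Illustrative model of the four RewriteCond lines plus the RewriteRule.
// Returns true when the request would be rewritten to index.php (Magento),
// false when Apache keeps the request for itself.
function isHandedToMagento(string $requestUri, string $docRoot): bool
{
    // RewriteCond %{REQUEST_URI} !^/(media|skin|js)/
    if (preg_match('#^/(media|skin|js)/#', $requestUri)) {
        return false; // Apache serves the file, or its own 404 page
    }

    // RewriteCond %{REQUEST_FILENAME} !-f, !-d, !-l
    $file = rtrim($docRoot, '/') . $requestUri;
    if (is_file($file) || is_dir($file) || is_link($file)) {
        return false; // existing static resource, served directly
    }

    // RewriteRule .* index.php [L]
    return true;
}
```

Under this model, a missing /media/file.jpg never reaches Magento, while a missing /some/controller/route does.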

Magento's Outer Shell

Magento's index.php bootstrap is relatively simple. A few environmental variables are set and checked, and then the following static method is called

Mage::run($mageRunCode, $mageRunType);

The run method is on the Mage class located in app/Mage.php. On the surface this run method is relatively simple.

public static function run($code = '', $type = 'store', $options=array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();
        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));           
        Varien_Profiler::stop('mage');
    //...  
}

Outside of the profiler lines, all that's involved in starting up a Magento system is five lines of code

  1. First, the root file path for the application is stored for later retrieval and path creation (self::setRoot();)

  2. Then, an "application" domain model object is instantiated (self::$_app = new Mage_Core_Model_App())

  3. Then, an event collection is instantiated (self::$_events = new Varien_Event_Collection();)

  4. Then, a configuration object is instantiated (self::$_config = new Mage_Core_Model_Config();)

  5. Finally, the run method of the application domain model object is called (self::$_app->run(...);)

Each of the objects instantiated here gets assigned as a static property of the Mage class, and will be referenced later during the processing of the request. You'll notice this entire bit of code is enclosed in a try block. Let's take a look at the exception catching to see what happens if an exception bubbles up to this top layer

...
    Varien_Profiler::stop('mage');
 
} catch (Mage_Core_Model_Session_Exception $e) {
    header('Location: ' . self::getBaseUrl());
    die();
} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
} catch (Exception $e) {
    if (self::isInstalled() || self::$_isDownloader) {
        self::printException($e);
        exit();
    }
    try {
        self::dispatchEvent('mage_run_exception', array('exception' => $e));
        if (!headers_sent()) {
            header('Location:' . self::getUrl('install'));
        } else {
            self::printException($e);
        }
    } catch (Exception $ne) {
        self::printException($ne, $e->getMessage());
    }
}

Here we can see there are three catch blocks. First Magento looks for its custom exceptions (Mage_Core_Model_Session_Exception, Mage_Core_Model_Store_Exception), and then the last block is a catch-all for any other exception type. The session and generic exception blocks are worth exploring, but that's for another article. It's the store exception we're interested in.

} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
}

If a Mage_Core_Model_Store_Exception is thrown anywhere in the system and is uncaught, Magento will catch it up here. When a store exception is caught, Magento will require in the following file.

 errors/404.php 

This is the second Magento 404 handler. It handles page not found states for requests that don't quite make it to the controller dispatch stage. Let's take a look at what's going on in 404.php

Error Processor

If you take a look at 404.php you'll see the following code.

require_once 'processor.php';  
$processor = new Error_Processor();
$processor->process404();

This code bootstraps a mini error processing system inside Magento. (If you've spent any time with Magento you'll find that it's the Mandelbrot set of software systems.) The end result of process404 being called is the rendering of the following phtml template

 errors/default/page.phtml 

In turn, this phtml template will include the following inner-template

 errors/default/404.phtml 

If you had called $processor->process503(); then 503.phtml would have been rendered instead, with page.phtml remaining the outer template. If you're interested in tracing how this happens, then check out the definition of the Error_Processor class in

 errors/processor.php 

Customizing the Store Exception 404 Page

Chances are you're going to want to customize this 404 page. You could just edit page.phtml and 404.phtml with your desired style and content. However, like any Magento core hack, you run the risk of your changes being overwritten during an upgrade, and the general scorn of the Magento development community.

Fortunately, Magento provides a mechanism for creating a custom skin folder for your error pages. Take a look at the following file

 errors/local.xml.sample 

This is a sample error configuration override file. If you rename it to

 errors/local.xml 

the Error_Processor class will load this file and use its values rather than the defaults hard coded in the class (for legacy reasons Magento will also look for a design.xml file). Take a look at the skin node in this file

 <config>
     <skin>default</skin>
     <!-- ... -->
 </config>

This is the value that controls which folder the Error_Processor object looks for its phtml files in. Let's change that to something like

 <config>
     <skin>our_custom_skin</skin>
     <!-- ... -->
 </config>

Error skin names must be composed of letters, numbers, and the underscore character. A folder created with any other characters will be ignored.
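The naming rule, and the fact that an unusable skin value is simply ignored, can be sketched as a small standalone PHP function. This is a rough approximation for illustration only: the function name resolveErrorSkin is invented, and the real logic in errors/processor.php may differ in its details.

```php
<?php
// Sketch of the skin lookup: use the requested skin only if its name is
// made of letters, numbers, and underscores AND its folder exists under
// the errors directory. Otherwise fall back to the default skin.
function resolveErrorSkin(string $requested, string $errorsDir, string $default = 'default'): string
{
    $validName = preg_match('/^[a-zA-Z0-9_]+$/', $requested) === 1;
    $skinDir   = rtrim($errorsDir, '/') . '/' . $requested;

    return ($validName && is_dir($skinDir)) ? $requested : $default;
}
```

With this shape, a value like our_custom_skin only takes effect once errors/our_custom_skin actually exists, and a folder named something like our-custom-skin would never be picked up.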

To test our custom 404 we'll need to trigger a Magento store exception. The simplest way to do that is to temporarily add one to the run method in Mage.php

#File: app/Mage.php
public static function run($code = '', $type = 'store', $options=array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();
 
        #our new exception            
        throw new Mage_Core_Model_Store_Exception('');
 
        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));           

If you reload your development environment with the above in place, you'll see a 404 page something like

Creating the Custom Skin

When we jiggered our system to throw that Exception, Magento ignored the custom value in the <skin/> node because it didn't find an errors/our_custom_skin folder. Let's change that now. Copy the existing errors/default to create a new errors/our_custom_skin

 cp -r errors/default errors/our_custom_skin 

and then let's edit the text in our_custom_skin/404.phtml. Replace the contents of that file with the following.

 #File: errors/our_custom_skin/404.phtml
 <div id="main" class="col-main">
 <!-- [start] content -->
     <div class="page-title">
         <h1>404 error: Page not found.</h1>
         <p>
             <em>we're sorry that you / had to see this four o four / it is what it is</em>
         </p>
     </div>
 <!-- [end] content -->
 </div>

Reload the page and you should now see your own custom 404 page.

Important: The entire default folder will need to be copied over to make this work. There isn't a robust "look in my custom folder, then look in default" fallback system in place as there is in other parts of the Magento system.

Before we continue, you'll want to restore app/Mage.php by removing the custom exception we dropped into place.

#our new exception            
##throw new Mage_Core_Model_Store_Exception('');
 
self::$_app->run(array(
    'scope_code' => $code,
    'scope_type' => $type,
    'options'    => $options,
));

No Route 404

So far we've covered two of Magento's 404 errors. The first was the Apache 404 issued when requesting nonexistent files in the media/skin/js folders. The second was the store exception 404. The third, and the most common yet most complex, is the no route 404. You can see this 404 page by browsing to the following URL

 http://magento.example.com/not/a/file 

In a default install you should see a page that looks something like this

Magento is an MVC system. If you're not sure what that means now might be a good time to review the Magento for PHP MVC Developers series. We'll be here when you get back.

Similar to other web MVC systems, when Magento encounters a URL like /not/a/file, it searches the configuration for a frontName with the name of not. If it finds one, next it will look for a controller in the associated module(s) named something like

class Packagename_Modulename_AController

If it finds the controller, it will look for an action method in that controller named

public function fileAction()

If any of the above steps fail, Magento will search the database for a CMS page with the identifier

 not/a/file 

If it finds a CMS page, that page will be rendered. If none of the above result in a match, Magento will need to create a 404 page to let the user know their resource wasn't found. In a default installation, Magento does this by manually setting the controller on the request object to the CMS Index controller, and the action to use for the request to noRoute (the noRouteAction method).
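The lookup order described above (module controller action, then CMS page, then no route) can be sketched as a standalone PHP function. Everything here is a simplified stand-in: the $modules and $cmsPages arrays, and the splitPath and matchRequest functions, are invented for the illustration and don't correspond to real Magento classes or configuration.

```php
<?php
// Hypothetical stand-ins for Magento's module configuration and CMS table.
$modules = [
    // frontName => [controller class suffix => [action methods]]
    'catalog' => ['ProductController' => ['viewAction']],
];
$cmsPages = ['about-us', 'no-route'];

// Split "/not/a/file" into the three names the router goes looking for.
function splitPath(string $path): array
{
    $parts = explode('/', trim($path, '/'));

    return [
        'frontName'  => $parts[0] ?? '',
        'controller' => ucfirst($parts[1] ?? '') . 'Controller', // e.g. AController
        'action'     => ($parts[2] ?? '') . 'Action',            // e.g. fileAction
    ];
}

// Walk the same three steps in order and report which one matched.
function matchRequest(string $path, array $modules, array $cmsPages): string
{
    $route   = splitPath($path);
    $actions = $modules[$route['frontName']][$route['controller']] ?? [];

    if (in_array($route['action'], $actions, true)) {
        return 'controller';
    }
    if (in_array(trim($path, '/'), $cmsPages, true)) {
        return 'cms page';
    }

    return 'noRoute';
}

echo matchRequest('/not/a/file', $modules, $cmsPages), PHP_EOL; // prints "noRoute"
```

Note that a request can fail at any of the three name lookups (frontName, controller, or action) and still fall through to the CMS check before ending up at no route.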

When the noRoute action is dispatched, the following code runs

#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function noRouteAction($coreRoute = null)
{
    $this->getResponse()->setHeader('HTTP/1.1','404 Not Found');
    $this->getResponse()->setHeader('Status','404 File not found');
 
    $pageId = Mage::getStoreConfig(Mage_Cms_Helper_Page::XML_PATH_NO_ROUTE_PAGE);
    if (!Mage::helper('cms/page')->renderPage($this, $pageId)) {
        $this->_forward('defaultNoRoute');
    }
}

In a default installation, this code looks for a CMS page named no-route, and if it finds one, the CMS page will be rendered. Magento ships with a default CMS page named no-route, which is the "Whoops, our bad…" page you've probably seen too much of.

If this CMS page has been deleted or renamed, Magento will forward the request on to the defaultNoRoute controller action,

$this->_forward('defaultNoRoute');

which looks like this

#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function defaultNoRouteAction()
{
    $this->getResponse()->setHeader('HTTP/1.1','404 Not Found');
    $this->getResponse()->setHeader('Status','404 File not found');
 
    $this->loadLayout();
    $this->renderLayout();
}

resulting in a page like this

Here, Magento is simply setting the correct headers for a 404, and then loading and rendering the layout. This results in a layout handle of cms_index_defaultnoroute being issued, which (again, in a default installation), results in the following Layout Update XML being applied

 <cms_index_defaultnoroute>
     <remove name="right"/>
     <remove name="left"/>

     <reference name="root">
         <action method="setTemplate"><template>page/1column.phtml</template></action>
     </reference>
     <reference name="content">
         <block type="core/template" name="default_no_route" template="cms/default/no-route.phtml"/>
     </reference>
 </cms_index_defaultnoroute>

In layman's terms, this removes the left and right content blocks, sets the root template to page/1column.phtml and then adds a content block that renders the following theme template

 cms/default/no-route.phtml       

If you're getting tripped up on layout concepts, reviewing this article or (better yet!) purchasing No Frills Magento Layout should set you straight.

Customizing No Route 404

You'll notice the above paragraphs were peppered with a phrase something like "in a default install". Out in the wild, there's a huge number of ways the no route page might be customized. If you're working for a variety of clients, or on a team with a number of headstrong developers, you'll probably run into some combination of the following. Neither the community nor Magento Inc. has much guidance on "the right" way to do this, so your best bet is to be aware of each possible customization point and learn to debug them quickly. Let's take a look.

Default Pages

If you open up the Admin Console's system configuration at

 System -> Configuration -> Web -> Default Pages 

you'll see there are several ways you might configure the behavior of the no route 404. Above we mentioned that Magento will attempt to load a page with the CMS identifier of no-route. The page that Magento attempts to load is actually controlled by the CMS No Route Page setting. If you wanted to hand over management of the CMS Page to some folks from marketing, this is your best bet.

Using a Different Controller Action

The CMS no route page works because there's code in Magento that will override the controller and action used for a request if no real route to a controller is detected. By default, that's the CMS controller and the noRouteAction method. However, using the Default No-route URL System Configuration, a system owner can change which controller action is dispatched for a no route state. By default, this value is

 cms/index/noRoute 

The format of this string is

 frontname/controller/action-name 

You might do this if you were creating a custom module to run a significant amount of logic before (or after) displaying the 404 page.

If it wasn't obvious from the above, if you're using a custom controller action for your 404 page, you lose the ability to set a custom CMS page with CMS No Route Page.

Controller 404 via Layout XML

The no route 404 page is rendered using the Magento layout XML system. That means its appearance may be customized by adding custom Layout XML Updates to the handles cms_index_noroute and cms_index_defaultnoroute. This could happen via local.xml, a custom layout XML file, or by editing/replacing one of the existing layout XML files in the design package.

Finally, even if no custom Layout Update XML has been added, it's possible that a new no-route.phtml template has been added to the current theme, or that someone has modified the no-route.phtml template in the base folder.

Wrap Up

A good 404 page is an important part of any website's user experience, and a Magento store is no exception. We've shown you the various places where Magento will detect and render a 404 into its system, as well as the various ways that the experience may have been customized. With these tools in hand, you'll be ready to conquer any 404 related challenges that the fates (or your boss!) throw at you.
