Friday, April 11, 2014

Sitecore: Custom Item Crawler

Background

An Item Crawler crawls the child nodes within a given parent node and determines whether each item is fit to be added to a Lucene index.  Let's start by looking at the configuration file that defines the indexes for the master database:

Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config

All the indexes defined in this file use the standard crawler:
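As a rough sketch, a crawler definition inside an index's configuration typically looks like the fragment below. The "Root" value shown here is an assumption and may differ in your copy of the file.

```xml
<!-- Sketch of a standard crawler definition; the Root value is an assumption -->
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>master</Database>
    <Root>/sitecore</Root>
  </crawler>
</locations>
```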



This is the built-in crawler for general purposes.  It includes methods like "IsExcludedFromIndex" to determine if an item should be added to the index or not.


Requirements

Now let's say you have a "/sitecore/content" node with hundreds of child nodes.  This is a likely scenario for single-instance, multi-site solutions at companies that have hundreds of sub-brands.  If we index the "/sitecore/content" node, the standard crawler indexes every node beneath it, so the resulting index is huge and full of items that we don't care about.  What if we only want to index items that are based on a specific template?  We would have to build a custom crawler to do that.  The resulting crawler would be something like this:
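A minimal sketch of such a crawler definition follows. The type name "MyProject.Search.TemplateItemCrawler", the assembly name "MyProject", and the "TemplateId" element are all hypothetical names chosen for illustration, and the GUID is a placeholder for your own template ID. The sketch assumes the configuration factory assigns the "TemplateId" child element to a public property of the same name, in the same way it handles "Database" and "Root".

```xml
<!-- Hypothetical custom crawler; type, assembly and TemplateId are illustrative names -->
<locations hint="list:AddCrawler">
  <crawler type="MyProject.Search.TemplateItemCrawler, MyProject">
    <Database>master</Database>
    <Root>/sitecore/content</Root>
    <!-- Placeholder GUID: replace with the ID of the template you want indexed -->
    <TemplateId>{00000000-0000-0000-0000-000000000000}</TemplateId>
  </crawler>
</locations>
```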



Step-by-Step

Let's create a new class that inherits from the standard crawler:
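A bare-bones sketch of such a class might look like this. The namespace, class name, and "TemplateId" property are hypothetical; only the base class "SitecoreItemCrawler" comes from Sitecore itself.

```csharp
using Sitecore.ContentSearch;

namespace MyProject.Search // hypothetical namespace
{
    // Inherits all behavior of the standard crawler; nothing is customized yet.
    public class TemplateItemCrawler : SitecoreItemCrawler
    {
        // Hypothetical property, assumed to be set from a <TemplateId>
        // element in the crawler's configuration.
        public string TemplateId { get; set; }
    }
}
```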



If we inspect the code for the standard crawler, everything is fine except for the "IsExcludedFromIndex" method.  We can override this method to keep the standard exclusions and additionally exclude any item that is not based on the template we want to index.
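The override could be sketched as follows. This is not a definitive implementation: the method signature is assumed from Sitecore 7.2 and may differ in other versions, and the class name, namespace, and "TemplateId" property are hypothetical. Note that the comparison only matches an item's direct template, not templates it inherits from.

```csharp
using Sitecore.ContentSearch;
using Sitecore.Data;

namespace MyProject.Search // hypothetical namespace
{
    public class TemplateItemCrawler : SitecoreItemCrawler
    {
        // Hypothetical property, assumed to be set from a <TemplateId>
        // element in the crawler's configuration.
        public string TemplateId { get; set; }

        // Signature assumed from Sitecore 7.2; check it against your version.
        protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
        {
            // Keep whatever the standard crawler already excludes.
            if (base.IsExcludedFromIndex(indexable, checkLocation))
            {
                return true;
            }

            // Without a valid template ID, fall back to the standard behavior.
            ID requiredTemplate;
            if (!ID.TryParse(TemplateId, out requiredTemplate))
            {
                return false;
            }

            // Exclude every item whose direct template is not the configured one.
            return indexable.Item.TemplateID != requiredTemplate;
        }
    }
}
```

Calling the base method first means the custom crawler can only narrow what the standard crawler would index, never widen it.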




Summary

This again illustrates how flexible Sitecore is.  If you don't like the out-of-the-box behavior, modify it to suit your needs.  Modify the config file a little, create a new class that inherits from an existing class and you are good to go.
