Friday, October 28, 2016

Coveo Indexes and Search Errors in Sitecore

Background

If you have worked with Coveo for Sitecore, you know that Coveo creates its own counterparts of the master, core, and web indexes, along with any other custom indexes you have defined in your configuration files.  Defining a new Coveo index is very easy and is almost, if not exactly, the same as defining a Lucene index.  What you may not know is that some hidden problems can arise from using your own custom Coveo indexes.


Problem

Defining a Coveo or Lucene index is simple.  Drop an <index> element into //sitecore/contentSearch/configuration/indexes in the XML config files.  Give it an id, assign a crawler (most likely the default SitecoreItemCrawler), specify the content database, and define the root, which is where the crawler begins to crawl.  The crawler will crawl the root and all of its descendants, but not its ancestors.  By default, all Coveo index definitions list the root as "/sitecore", the root item of the content tree, so everything is crawled and indexed.  But what if we don't care about all the noise?  We may not care about templates and layout items.  What if we just want the content items that belong to the website?  You would set the root to something like "/sitecore/content/home", and this would yield the desired results.  Your index shrinks considerably, and only page-level website content items are included.
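As a rough illustration of the definition described above, a narrowed index might look something like the sketch below.  The index id "custom_website_index" is a made-up name, the type on <index> is a placeholder because it depends on the search provider (Coveo or Lucene) and its version, and the database value is only illustrative.  The other child elements a real definition needs (parameters, strategies, and so on) are left out; copy those from the stock index definitions that ship with your installation.

```xml
<!-- Sketch only: a fragment of a config file, not a complete index definition. -->
<configuration>
  <sitecore>
    <contentSearch>
      <configuration>
        <indexes>
          <!-- "custom_website_index" is a hypothetical id; replace the type
               placeholder with your provider's index type. -->
          <index id="custom_website_index" type="PROVIDER_INDEX_TYPE, PROVIDER_ASSEMBLY">
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <!-- Crawling starts here: this item and its descendants only. -->
                <Root>/sitecore/content/home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```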

The problem arises when you try to perform a search in the Sitecore admin UI.  There are two places to search: the search bar above the content tree in the Content Editor, and the search bar in the Windows-style desktop taskbar.  Searching in the Content Editor yields the desired results, but searching in the taskbar produces an error.  You will not get any results even though the index is fully functional.  Investigating the log files reveals that no index contains the root "/sitecore" content item.  Apparently, the taskbar search looks through all of your indexes, and at least one of them has to contain the root item.  But adding the root item by pointing a crawler at "/sitecore" would bring all the noise back into your indexes, which is not an acceptable solution.


Solution

A simple solution in this scenario is to create your own crawler.  The crawler can derive from the default SitecoreItemCrawler, so it functions the same way for the most part.  SitecoreItemCrawler also has a boolean IsExcludedFromIndex method that you can override to decide what is excluded.  By doing this, we can use the exact same XML structure to define the content database and root item.  This is what the crawler looks like:


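using Sitecore.ContentSearch;

public class SingleItemCrawler : SitecoreItemCrawler
{
   // Excludes every item except the crawler's configured root item.
   protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
   {
      // Anything that is not the root is excluded outright; the root itself
      // still goes through the base class's exclusion rules.
      return indexable.AbsolutePath != RootItem.Paths.ContentPath
         || base.IsExcludedFromIndex(indexable, checkLocation);
   }
}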


A single-expression override solved the problem.  The overridden IsExcludedFromIndex method compares the path of the item being crawled with the path of the root item configured in the XML, and excludes the item if the two don't match.  This ensures that the crawler contributes a single item.  Use this single-item crawler in conjunction with the default crawler for the page-level content items, and the resulting index will contain all the desired page-level items plus the required root item of the content tree.
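Combining the two crawlers in one index could be sketched roughly as follows.  This shows only the crawler list of an index definition; the namespace and assembly "MyProject.Search" / "MyProject" are hypothetical names for wherever SingleItemCrawler is compiled, and the database value is only illustrative.

```xml
<!-- Sketch only: the crawler list inside an <index> element. -->
<locations hint="list:AddCrawler">
  <!-- Default crawler: page-level content under the site root. -->
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>web</Database>
    <Root>/sitecore/content/home</Root>
  </crawler>
  <!-- Custom crawler: contributes only the "/sitecore" item itself.
       Namespace and assembly names are hypothetical. -->
  <crawler type="MyProject.Search.SingleItemCrawler, MyProject">
    <Database>web</Database>
    <Root>/sitecore</Root>
  </crawler>
</locations>
```

Because the custom crawler reads the same Database and Root settings as the default one, changing which single item it indexes should only require editing its root value.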