Thursday, July 10, 2014

Sitecore Rant: Lucene Index of System Fields


On a current project, I need to index the system language fields. By default, an index called "system_index" takes care of every item under "/sitecore/system". That is far more than I want; I just need a list of all the languages. So I created a new index and added it to the "Sitecore.ContentSearch.Lucene.Indexes.Sharded.Master.config" file. Simple enough. I also have to define the fields that I want indexed, which is done in the "Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config" file.




Notice that there are two kinds of fields in this list: the default fields on top and three custom fields at the bottom. Custom fields added post-install keep their names as is. Most system fields have underscores either at the beginning of the name or in the middle, and sometimes two leading underscores, which indicates system fields that are part of the standard template. "code_page", "regional_iso_code" and "worldlingo_language_identifier" are default system fields, so it would make sense for their names to have underscores connecting the words. I also took those names from the section of excluded fields further down in the same file:



Note: Unless you remove the entry from the list of excluded fields, the field will not get indexed, even if you add it to the list of fields to index.

The regional ISO code is a good example to explain my rant. Its entry in the excluded list basically tells the indexer to ignore the field with that ID. If you inspect the template for the system language, you can indeed see that the ID matches the field, so you would naturally believe the field name is also correct. WRONG! If you run the indexer, the value for this field will always be null because there is no field with this name. After debugging and looping through all the fields using item.Fields, we can see that the field name is simply "Regional Iso Code" and not "Regional_iso_code".
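For anyone who wants to repeat that check, here is a rough sketch of the kind of loop I mean. It is not the exact code from my project: the class name FieldNameDump, the "master" database and the item path in the usage line are assumptions for illustration, and it only runs inside a Sitecore site with Sitecore.Kernel referenced.

```csharp
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

// Hypothetical helper (not part of Sitecore): dumps the real field names of an item.
public static class FieldNameDump
{
    public static void LogFieldNames(string itemPath)
    {
        // Assumption: the item lives in the "master" database.
        Database master = Factory.GetDatabase("master");
        Item item = master.GetItem(itemPath);
        if (item == null)
        {
            Log.Warn("Item not found: " + itemPath, typeof(FieldNameDump));
            return;
        }

        // Load every field defined by the item's templates, including
        // standard-template fields that are not loaded by default.
        item.Fields.ReadAll();

        foreach (Field field in item.Fields)
        {
            // Writes one "ID = Name" line per field to the Sitecore log.
            Log.Info(string.Format("{0} = {1}", field.ID, field.Name), typeof(FieldNameDump));
        }
    }
}
```

Calling something like FieldNameDump.LogFieldNames("/sitecore/system/Languages/en") (the path is only an example) writes one line per field to the Sitecore log, so you can compare the IDs against the entries in the config file and read off the real names.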

Why would Sitecore decide to use this kind of notation for this field and others while some default fields look like this?



I know that there cannot be spaces in XML element names, but if you use underscores for some fields and camel case for others, it leads us to believe that the underscores are deliberate and indicate the real field names. If not, then why not just use camel case for all field definitions in the excluded list?

Also, there is no way to know what the real field names are unless you iterate through item.Fields. If you rely on the config file above, you will be misled very often, at least in the section of excluded fields. The correct index entry for all the fields would be:
