Search & Indexing

You now have pages, assets, and Content Fragments. The moment you build a navigation, a "related articles" list, or a search page, you run a query -- and a query that is not backed by an index can quietly bring an instance to its knees. This chapter is the beginner's view; the Search & Indexing reference goes deeper.

Ways to query

| API | Looks like | Use it for |
| --- | --- | --- |
| QueryBuilder | A map of predicates | Most application code -- readable, composable, paginated |
| JCR-SQL2 | SQL-ish text | Complex conditions, scripts, the Groovy Console |
| XPath | XPath 2.0 | Legacy; avoid in new code |

All three run on the same Oak query engine and use the same indexes. QueryBuilder is the default in AEM application code:

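import java.util.HashMap;
import java.util.Map;
import javax.jcr.Session;
import com.day.cq.search.PredicateGroup;
import com.day.cq.search.result.SearchResult;

// Assumes queryBuilder (an injected QueryBuilder) and resourceResolver are already in scope.
Map<String, String> params = new HashMap<>();
params.put("path", "/content/mysite");                          // search root
params.put("type", "cq:Page");                                  // node type
params.put("property", "jcr:content/cq:template");              // filter property
params.put("property.value", "/conf/mysite/settings/wcm/templates/article");
params.put("orderby", "@jcr:content/cq:lastModified");          // sort property
params.put("orderby.sort", "desc");                             // newest first
params.put("p.limit", "10");                                    // page size

SearchResult result = queryBuilder
    .createQuery(PredicateGroup.create(params), resourceResolver.adaptTo(Session.class))
    .getResult();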
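To page through results rather than only fetching the first ten, the same predicate map can carry `p.offset` alongside `p.limit`. The sketch below is one possible way to build that map; the class name `ArticleQueryParams` and the method `forPage` are hypothetical helpers, not part of the AEM API, and the paths are the example values used above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds QueryBuilder predicates for one page of article results. */
final class ArticleQueryParams {

    private ArticleQueryParams() { }

    /**
     * @param pageIndex zero-based page number
     * @param pageSize  number of hits per page
     */
    static Map<String, String> forPage(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("path", "/content/mysite");
        params.put("type", "cq:Page");
        params.put("property", "jcr:content/cq:template");
        params.put("property.value", "/conf/mysite/settings/wcm/templates/article");
        params.put("orderby", "@jcr:content/cq:lastModified");
        params.put("orderby.sort", "desc");
        // p.offset skips the hits of earlier pages; p.limit caps the size of this page.
        params.put("p.offset", String.valueOf((long) pageIndex * pageSize));
        params.put("p.limit", String.valueOf(pageSize));
        return params;
    }
}
```

The returned map can be passed to `PredicateGroup.create(...)` exactly like the hand-built map above; for example, `forPage(2, 10)` asks for hits 21 to 30.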

The equivalent JCR-SQL2 (see The JCR & Sling for syntax):

SELECT * FROM [cq:Page] AS page
WHERE ISDESCENDANTNODE(page, '/content/mysite')
AND page.[jcr:content/cq:template] = '/conf/mysite/settings/wcm/templates/article'
ORDER BY page.[jcr:content/cq:lastModified] DESC
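When the statement is assembled in Java rather than typed into a console, it helps to keep the literal quoting in one place. The following is a minimal sketch that rebuilds the statement above from a root path and a template path; `ArticleSql2`, `quote`, and `byTemplate` are hypothetical names introduced for illustration.

```java
/** Hypothetical helper: assembles the JCR-SQL2 statement shown above. */
final class ArticleSql2 {

    private ArticleSql2() { }

    /** Wraps a value in single quotes, doubling any embedded single quote. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Pages below rootPath that use templatePath, newest first. */
    static String byTemplate(String rootPath, String templatePath) {
        return "SELECT * FROM [cq:Page] AS page"
            + " WHERE ISDESCENDANTNODE(page, " + quote(rootPath) + ")"
            + " AND page.[jcr:content/cq:template] = " + quote(templatePath)
            + " ORDER BY page.[jcr:content/cq:lastModified] DESC";
    }
}
```

The resulting string can be handed to the standard JCR `QueryManager.createQuery(statement, Query.JCR_SQL2)` call.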

Why unindexed queries are dangerous

Oak does not index every property. If no index matches your query, Oak traverses the content tree node by node and logs:

Traversed 10000 nodes ... consider creating an index or changing the query

On a large repository this is slow on author and can take down publish under traffic. Treat the warning as a bug to fix, not noise.
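One way to treat the warning as a bug is to look for it deliberately, for example in a test run or a log review. The sketch below pulls the node counts out of log lines; it assumes only that the message contains the text `Traversed <number> nodes` as quoted above, and the class name `TraversalWarnings` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts traversal warnings from log lines. */
final class TraversalWarnings {

    // Assumption: the warning contains "Traversed <number> nodes", as in the message above.
    private static final Pattern TRAVERSED = Pattern.compile("Traversed (\\d{1,18}) nodes");

    private TraversalWarnings() { }

    /** Returns the node count of every traversal warning found, in log order. */
    static List<Long> traversedCounts(List<String> logLines) {
        List<Long> counts = new ArrayList<>();
        for (String line : logLines) {
            Matcher matcher = TRAVERSED.matcher(line);
            if (matcher.find()) {
                counts.add(Long.parseLong(matcher.group(1)));
            }
        }
        return counts;
    }
}
```

An empty result means no traversal warning was logged in the lines that were checked; it does not prove that every query is index-backed.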

:::danger Never ship a traversal to production
A query that traverses thousands of nodes per request will not survive real traffic. Always confirm a query is index-backed before you ship it.
:::

What an index is

An Oak index is a definition under /oak:index that tells the query engine how to look up nodes by certain properties without scanning the tree. The two you will use:

  • Property index -- fast exact-match on one or few properties.
  • Lucene index -- the standard custom index in AEM; supports exact match, sorting, and full-text.
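To make the idea concrete, here is a toy model of "look up by property without scanning the tree". It is not how Oak stores content or indexes internally; `ToyRepository` and everything in it is invented for illustration. It only shows why a lookup structure keyed by property value reads far fewer nodes than a full scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: NOT Oak's storage or index format. */
final class ToyRepository {

    private final Map<String, String> templateByPath = new HashMap<>();        // the "content tree"
    private final Map<String, List<String>> pathsByTemplate = new HashMap<>(); // the "index"
    private int nodesRead;

    /** Stores a page and keeps the index in step with the content. */
    void addPage(String path, String template) {
        templateByPath.put(path, template);
        pathsByTemplate.computeIfAbsent(template, t -> new ArrayList<>()).add(path);
    }

    /** No index: every node is read and compared. */
    List<String> findByTraversal(String template) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : templateByPath.entrySet()) {
            nodesRead++;
            if (entry.getValue().equals(template)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    /** With an index: only the matching nodes are read. */
    List<String> findByIndex(String template) {
        List<String> hits = new ArrayList<>(
            pathsByTemplate.getOrDefault(template, Collections.emptyList()));
        nodesRead += hits.size();
        return hits;
    }

    int nodesRead() { return nodesRead; }

    void resetCounter() { nodesRead = 0; }
}
```

With 1000 pages of which 3 use the article template, the traversal reads all 1000 nodes while the indexed lookup reads 3. The trade-off in the toy is the same as in a real index: extra storage, and extra work on every write to keep the index current.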

A simple property index

ui.apps/.../jcr_root/_oak_index/mysiteTemplate/.content.xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0"
    jcr:primaryType="oak:QueryIndexDefinition"
    type="property"
    propertyNames="{Name}[cq:template]"
    reindex="{Boolean}true"/>

A Lucene index (property + sort)

ui.apps/.../jcr_root/_oak_index/mysiteArticle-custom-1/.content.xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:cq="http://www.day.com/jcr/cq/1.0"
    jcr:primaryType="oak:QueryIndexDefinition"
    type="lucene"
    async="[async]"
    compatVersion="{Long}2"
    evaluatePathRestrictions="{Boolean}true">
    <indexRules jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured">
            <properties jcr:primaryType="nt:unstructured">
                <template jcr:primaryType="nt:unstructured"
                    name="jcr:content/cq:template"
                    propertyIndex="{Boolean}true"/>
                <lastModified jcr:primaryType="nt:unstructured"
                    name="jcr:content/cq:lastModified"
                    ordered="{Boolean}true"
                    type="Date"/>
            </properties>
        </cq:Page>
    </indexRules>
</jcr:root>

Commit index definitions to Git under ui.apps and deploy them like any other code -- never hand-create them in CRXDE on a real environment (they will drift and be lost on deploy). On AEM as a Cloud Service, custom indexes must follow the <name>-custom-<N> naming convention.
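Because the naming convention is easy to get wrong by hand, a small check in a unit test or build step can catch it early. The sketch below mirrors only the `<name>-custom-<N>` shape stated above and makes no claim about any further rules a given AEM release may enforce; `IndexNames` and `looksCustom` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks the <name>-custom-<N> shape of an index node name. */
final class IndexNames {

    // Assumption: <name> is any non-empty text and <N> is one or more digits.
    private static final Pattern CUSTOM = Pattern.compile(".+-custom-\\d+");

    private IndexNames() { }

    static boolean looksCustom(String indexName) {
        return indexName != null && CUSTOM.matcher(indexName).matches();
    }
}
```

Under this check `mysiteArticle-custom-1` passes, while `mysiteTemplate` does not.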

Check your query with Explain Query

Before shipping, run the query through the Explain Query tool to confirm it uses an index and is not traversing:

/libs/granite/operations/content/diagnosistools/queryExplainTool.html

Paste your query; the tool reports which index is used (or warns about a traversal) and the estimated cost.

Best practices

  • Constrain by path and type -- the tightest scope is the cheapest query.
  • Index for the exact filter + sort your code runs, then verify with Explain Query.
  • Paginate with p.limit; never load an unbounded result set.
  • Reindex off-peak -- it is I/O heavy.
  • Prefer one well-designed Lucene index over many tiny property indexes.

Summary

You learned:

  • The three query options (QueryBuilder, JCR-SQL2, XPath) and when to use each
  • Why an unindexed query traverses the repository and is dangerous in production
  • The difference between a property index and a Lucene index, with deployable examples
  • How to verify a query with the Explain Query tool
  • Indexing best practices and the AEMaaCS naming convention

Next up: Multi-Site Manager & i18n - Blueprints, Live Copies, language copies, and the translation framework.