Search and Contentful

Hi! I’m looking to add search functionality to my site and I’m wondering what options are available. I’ve heard something about Algolia…

CDA search

I’m assuming you’ve got requirements that cannot be met by Contentful’s built-in search. For other readers, I want to start by mentioning that our Content Delivery API (CDA) allows you to query your data with a set of predefined mechanisms and filters: https://www.contentful.com/developers/docs/references/content-delivery-api/#/reference/search-parameters.
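
To give a feel for the built-in search, here is a minimal Python sketch that builds a full-text query URL for the CDA. The space ID, token and content type ID are placeholders, and the helper name build_search_url is made up for this post; the `query`, `content_type` and `limit` parameters are described in the docs linked above.

```python
from urllib.parse import urlencode

CDA_BASE = "https://cdn.contentful.com"


def build_search_url(space_id, access_token, text,
                     content_type=None, environment="master", limit=10):
    """Build a CDA URL that runs a full-text search over entries."""
    params = {
        "access_token": access_token,  # a CDA (read-only) key
        "query": text,                 # full-text search across fields
        "limit": limit,
    }
    if content_type:
        # Restrict the search to a single content type
        params["content_type"] = content_type
    return (
        f"{CDA_BASE}/spaces/{space_id}/environments/{environment}"
        f"/entries?{urlencode(params)}"
    )


# Placeholder values, replace with your own space ID and key
url = build_search_url("my_space_id", "my_cda_token", "search engine",
                       content_type="blogPost")
print(url)
```

Fetching that URL with any HTTP client returns a JSON payload with the matching entries in its `items` array.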

The CDA is read-only, and CDA API keys can be safely used in public apps (unless your space contains published entries that have to stay secret).

Implementing search

When implementing search functionality for your application or website you have two options to choose from:

  1. Use a Search-as-a-Service product. Algolia is the most prominent representative of this group.
  2. Index your content on your own and issue search queries against this index.

Algolia

For the former solution you could use a community (not officially supported) library for syncing content to Algolia: https://github.com/drublic/contentful-to-algolia. The flow would look as follows:

  1. A webhook configured for publication events calls a “sync” endpoint, located either in your infrastructure or hosted with a serverless provider (with AWS Lambda + API Gateway being the most popular combination).
  2. The “sync” endpoint uses the library to sync your content.
  3. Your application or website uses Algolia’s SDK to query the index you’ve synced your entries to in (2).
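
As a rough sketch of step 1, the “sync” endpoint can check the X-Contentful-Topic header that Contentful sends with each webhook call and only trigger a sync for the events it cares about. The function name and the decision to also react to unpublish events are my assumptions; adjust the set of topics to your webhook configuration.

```python
# Topics that should trigger a sync. Reacting to "unpublish" as well is an
# assumption of this sketch, so removed entries disappear from the index too.
SYNC_TOPICS = {
    "ContentManagement.Entry.publish",
    "ContentManagement.Entry.unpublish",
}


def should_sync(headers):
    """Return True if the incoming webhook call should trigger a sync."""
    # HTTP header names are case-insensitive, so normalise before the lookup
    normalised = {name.lower(): value for name, value in headers.items()}
    return normalised.get("x-contentful-topic") in SYNC_TOPICS


print(should_sync({"X-Contentful-Topic": "ContentManagement.Entry.publish"}))
print(should_sync({"X-Contentful-Topic": "ContentManagement.Entry.save"}))
```

The actual syncing (step 2) is left to the library; this only shows the filtering part of the endpoint.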

There are two important implementation details to keep in mind:

  1. contentful-to-algolia doesn’t support Contentful’s Sync API, so the syncing process can take a while and generate redundant API calls to both Contentful and Algolia.
  2. There is a limit on record size (10kB per record) in Algolia: https://www.algolia.com/doc/faq/basics/is-there-a-size-limit-for-my-index-records/. This means that some data processing may be necessary before syncing.
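
One possible kind of data processing is trimming a long text field until the record fits. The sketch below is only an approximation: it measures the size of the JSON-encoded record in bytes, which may not be exactly how Algolia counts, and the names fit_record and MAX_RECORD_BYTES are made up for this example.

```python
import json

# The 10kB limit mentioned above; check the limit that applies to your plan
MAX_RECORD_BYTES = 10_000


def record_size(record):
    """Approximate record size as the byte length of its JSON encoding."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def fit_record(record, field, limit=MAX_RECORD_BYTES):
    """Return a copy of record with `field` truncated to fit under limit."""
    trimmed = dict(record)
    text = trimmed.get(field, "")
    # Binary search for the longest prefix of the field that still fits
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        trimmed[field] = text[:mid]
        if record_size(trimmed) <= limit:
            lo = mid
        else:
            hi = mid - 1
    trimmed[field] = text[:lo]
    if record_size(trimmed) > limit:
        raise ValueError("record is too large even with an empty field")
    return trimmed


entry = {"objectID": "abc123", "title": "Long article", "body": "x" * 20_000}
small = fit_record(entry, "body")
print(record_size(entry), record_size(small))
```

Splitting a long entry into several smaller records is another option, at the cost of having to de-duplicate results at query time.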

Indexing on your own

Small/medium apps

For small and medium webapps/websites you can use libraries like https://lunrjs.com/ or http://elasticlunr.com/. The integration flow is quite similar to the Algolia flow:

  1. A webhook for publication events calls an endpoint hosted somewhere
  2. This endpoint generates a search index with a library of your choice and puts it on some CDN
  3. Your application or website downloads the index and queries it with the library

Important things to mention:

  1. How do you determine if your project is small/medium/big? I’d use index size as a metric - you’ll have to deliver your index to end users. It’s up to you to decide how many bytes your infrastructure and your wallet are comfortable with.
  2. Always look for libraries that:
    • Generate fairly compact indexes
    • Perform stop word removal and stemming (https://en.wikipedia.org/wiki/Stemming), and allow field-level boosting
    The two libraries mentioned above are both of good quality.
  3. Always gzip your index. Indexes compress extremely well.

I think our documentation is a good example of the approach: https://www.contentful.com/developers/docs/. We’re using Elasticlunr to generate an index containing all pieces of information in our documentation, including tutorials and API references. The index size is around 1MB, and less than 100kB gzipped.
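
If you want to check how well your own index compresses before putting it on a CDN, something along these lines works. The index here is a made-up, highly repetitive stand-in (not a real Lunr or Elasticlunr index), so the ratio it prints says nothing about your data.

```python
import gzip
import json


def gzip_index(index):
    """Serialise an index to compact JSON and gzip it."""
    raw = json.dumps(index, separators=(",", ":")).encode("utf-8")
    return raw, gzip.compress(raw, compresslevel=9)


# Toy stand-in for a generated search index: term -> list of postings
toy_index = {
    f"term{i}": [{"doc": doc_id, "tf": 1} for doc_id in range(20)]
    for i in range(200)
}

raw, packed = gzip_index(toy_index)
print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Most CDNs and web servers can also gzip on the fly, in which case you only need to make sure compression is enabled for the index file’s content type.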

Big scale

If the size of the index generated in the previous section is not acceptable then some self-hosted search engine has to be used. The most popular option is https://www.elastic.co/products/elasticsearch.

The principle is still the same: a webhook “feeds” a search engine with updated data. Of course this option requires bigger administrative/infrastructural effort and cannot be easily wrapped up - you’ll have to do your own research.
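
Just to illustrate the “feeding” part: Elasticsearch’s bulk API accepts newline-delimited JSON, where each document is preceded by an action line. The sketch below turns Contentful entries (with their `sys.id` and `fields`) into such a body. The index name, the sample entries and the helper name are assumptions, and sending the body to your cluster is left out.

```python
import json


def build_bulk_body(entries, index_name):
    """Build a newline-delimited body for Elasticsearch's _bulk endpoint."""
    lines = []
    for entry in entries:
        # Action line: index this document under the Contentful entry ID
        action = {"index": {"_index": index_name, "_id": entry["sys"]["id"]}}
        lines.append(json.dumps(action))
        # Source line: the entry's fields become the indexed document
        lines.append(json.dumps(entry["fields"]))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


# Made-up entries in the shape the CDA returns them
sample_entries = [
    {"sys": {"id": "abc"}, "fields": {"title": "Hello", "body": "First post"}},
    {"sys": {"id": "def"}, "fields": {"title": "Search", "body": "Second post"}},
]

body = build_bulk_body(sample_entries, "articles")
print(body)
```

The body would then be sent with a POST request to the `_bulk` endpoint using the `application/x-ndjson` content type.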
