What's the purpose of sync, or should I use contentful-import instead?

Hi @felipe.cardozo, in the company I work for we had a similar issue, and since we worked on it for a few weeks I think it could be useful to share my findings:

  • We evaluated the export/import function, but the problem is that the tool practically exports all the data, and in a JSON format that is not easy to manipulate. That means we would need to use ‘import’ to bring the data back in, but also, as in your case, we didn’t want all the entries re-synced, just the ‘updated’ or ‘new’ ones.
  • Similarly, from what I understood, the Sync API use case is mostly for apps, so they can download the ‘delta’ and keep the latest content offline on whatever client. I didn’t feel comfortable using that approach either, because I wanted something repeatable and scalable.

Having said that, we decided to use the standard CMA (Content Management API) to both read from our source environment and write to our destination one, via the contentful-management SDK. What did we come up with?
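
The setup part is just the standard client pointing at the two environments, something like this (a rough sketch; the token, space id and environment names are placeholders):

// Rough sketch of the CMA setup (ids/tokens/env names are placeholders)
const contentful = require('contentful-management');

const client = contentful.createClient({ accessToken: process.env.CMA_TOKEN });

async function getEnvironments() {
  const space = await client.getSpace(process.env.SPACE_ID);
  const sourceEnv = await space.getEnvironment('X-env');  // the source environment
  const destEnv = await space.getEnvironment('master');   // the destination environment
  return { sourceEnv, destEnv };
}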

  • First of all, we realised that we would need to read the entries from both the source and the destination environment, to then decide which entries need to be inserted or updated. To do that we decided to use a simple SQLite database to store all the info we needed from each entry, like the updated_at, the version (if published), the sys.id (entry-id), the content fields (as JSON), and so on. The table is very simple: for each field of the entry response we keep a column for both the source and the destination environment (there’s a sketch of the table right after this list). For SQLite we are using this package: better-sqlite3 - npm (it’s really great and the documentation is amazing)
  • The second step is to read the source environment. Luckily, with getEntries we can paginate 1000 entries per API call, which is quite convenient. We take all the entries and save the relevant data into the SQLite database, including the JSON with the entry data. This way, if an entry needs to be inserted/updated in the destination, we already have the payload (saving a lot of API calls). See the pagination sketch after this list.
  • As a third step, we read the entries in the destination environment in the same way. Two considerations: 1) there will be entries that exist in the source but not in the destination (aka new entries), and 2) there will be entries that may need updating, because they have the same entry-id and the source updated_at (or published_at) is more recent. We also did a little ‘trick’ here: if a content-type has a unique and required field, we treat it as a sort of primary key, so we can match the ‘same’ entry even when it was created independently in the two environments. To put it simply, if you have a Blog post entry with the same slug but a different entry-id, you can still ‘match’ that entry.
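
To give an idea of the first point, this is more or less the shape of our local table (a minimal sketch with better-sqlite3; the table and column names here are just an example, adapt them to whatever you store per entry):

// Sketch of the local store with better-sqlite3 (table/columns are an example, not our exact schema)
const Database = require('better-sqlite3');
const db = new Database('sync.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS entries (
    entry_id            TEXT PRIMARY KEY,   -- sys.id
    content_type        TEXT,
    source_updated_at   TEXT,               -- sys.updatedAt in the source environment
    source_published_at TEXT,
    source_fields       TEXT,               -- the entry fields as JSON (the payload we re-use)
    dest_updated_at     TEXT,               -- same info for the destination environment
    dest_published_at   TEXT,
    dest_fields         TEXT
  )
`);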
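
And this is the idea behind reading one environment page by page and saving each entry (a sketch assuming the table above and the environments from the earlier snippet; which sys fields you keep is up to you):

// Sketch: paginate getEntries (max 1000 per call) and store the relevant data locally
async function loadEnvironment(environment, prefix /* 'source' or 'dest' */) {
  const upsert = db.prepare(`
    INSERT INTO entries (entry_id, content_type, ${prefix}_updated_at, ${prefix}_published_at, ${prefix}_fields)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT(entry_id) DO UPDATE SET
      ${prefix}_updated_at   = excluded.${prefix}_updated_at,
      ${prefix}_published_at = excluded.${prefix}_published_at,
      ${prefix}_fields       = excluded.${prefix}_fields
  `);

  let skip = 0;
  let total = 0;
  do {
    const page = await environment.getEntries({ limit: 1000, skip });
    total = page.total;
    for (const entry of page.items) {
      upsert.run(
        entry.sys.id,
        entry.sys.contentType.sys.id,
        entry.sys.updatedAt,
        entry.sys.publishedAt || null,
        JSON.stringify(entry.fields)
      );
    }
    skip += page.items.length;
  } while (skip < total);
}

// await loadEnvironment(sourceEnv, 'source');
// await loadEnvironment(destEnv, 'dest');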

Still there? :smiley:

Ok, here's the fun part. Since the data is now in your local database, all you need to do is come up with your own strategy for what needs to be synced. You practically end up with entries that need to be created in the destination environment and entries with the same entry-id (or slug) that need to be updated. In addition, you can/should consider whether you want to sync only published entries (but then you should check that the linked entries are also published), or (as we do) sync every entry that is more recently updated (it’s the simplest and most efficient strategy).
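
As an example, with the table above the ‘who needs what’ question boils down to a couple of queries (just a sketch of our simple strategy, assuming the columns from the previous snippets; it ignores the slug/unique-key matching for brevity):

// Sketch: compute what to insert and what to update from the local table
const toInsert = db.prepare(`
  SELECT entry_id, content_type, source_fields
  FROM entries
  WHERE source_fields IS NOT NULL AND dest_fields IS NULL
`).all();

// ISO timestamps compare correctly as strings, so a plain > is enough here
const toUpdate = db.prepare(`
  SELECT entry_id, content_type, source_fields
  FROM entries
  WHERE source_fields IS NOT NULL AND dest_fields IS NOT NULL
    AND source_updated_at > dest_updated_at
`).all();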

A few tips also on the insert/update:

  • When inserting a new entry from source to destination, we use the function createEntryWithId, so the newly created entry has the same entry-id as in the source environment. That way, on the next run of the sync, they match perfectly;
  • When updating, as said, we first look for the same entry-id in both environments, and then fall back to the same slug/unique key. For that we use a simple config where we write down which field is the ‘primary key’ for each content-type.
  • The first pass inserts/updates the entries (using the source JSON we saved), and once all of that is done, we do a second pass to publish the ones that are published in the source environment. This reduces the risk of unresolvable linked entries (still, remember to print the errors so you can fix them manually). There’s a sketch of both passes right after this list.
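
Putting those three points together, the write side looks roughly like this (error handling trimmed, and again it skips the slug-matching fallback; entryIdsToPublish is simply whatever you collect during the first pass):

// Sketch of the two passes: 1) create/update everything, 2) publish what is published in the source
async function firstPass(destEnv) {
  for (const row of toInsert) {
    // createEntryWithId keeps the same sys.id as the source, so the next run matches perfectly
    await destEnv.createEntryWithId(row.content_type, row.entry_id, {
      fields: JSON.parse(row.source_fields),
    });
  }
  for (const row of toUpdate) {
    const entry = await destEnv.getEntry(row.entry_id);
    entry.fields = JSON.parse(row.source_fields);
    await entry.update();
  }
}

async function secondPass(destEnv, entryIdsToPublish) {
  for (const id of entryIdsToPublish) {
    const entry = await destEnv.getEntry(id); // fresh copy, see the version tip below
    await entry.publish();
  }
}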

A tip that will save you a lot of time: you can’t update an entry and then immediately publish the object you were holding, because the API will reject it since the entry version has changed; you need to update it, do a getEntry again, and then publish it :wink:
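
Concretely, that pattern is just this (entryId and newFields stand for whatever entry you are processing):

// Works: re-fetch the entry after update() so its version is current before publishing
let entry = await destEnv.getEntry(entryId);
entry.fields = newFields; // the payload from the local database
await entry.update();

entry = await destEnv.getEntry(entryId); // fresh version after the update
await entry.publish();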

A sneak peek of a run looks like this:

##INFO: From Environment: X-env
##INFO: To Environment: master
##INFO: Entry type: All (published, drafts, archived)
##INFO: Verbose Output: true
##INFO: Dry run: true
#########################################################
##LOG: Creating the Database to store Sync data
##LOG: Database created successfully!
##LOG: Getting the Content-types from Contentful
##LOG: Checks and Save which Content-types have Linked entries
##LOG: Retrieving the Content-types list to query for Entries
#########################################################
##LOG: Retrieved xxx Entries of type: blog - environment: X-env
##LOG: Retrieved yyy Entries of type: category - environment: X-env
##LOG: Retrieved zzz Entries of type: page - environment: X-env
#########################################################
##LOG: Retrieved xxx Entries of type: blog - environment: master
##LOG: Retrieved yyy Entries of type: category - environment: master
##LOG: Retrieved zzz Entries of type: page - environment: master
#########################################################
##LOG: Computing entries to update/insert
##LOG: Printing summary of entries to be inserted/updated
##DRY-RUN: Will update xxx entries of Content-type: blog
##DRY-RUN: Will update yyy entries of Content-type: category
##DRY-RUN: Will update zzz entries of Content-type: page
##DRY-RUN: Will insert xxx entries of Content-type: blog
##DRY-RUN: Will insert yyy entries of Content-type: category
##DRY-RUN: Will insert zzz entries of Content-type: page
#########################################################

Hope it helps :tada: