Last year, Nuvalence had the opportunity to work with a large university to develop a COVID symptom assessment and testing platform.  The platform was initially built to support a scientific study run by the university, with the goal of expanding to back-to-work and back-to-school monitoring scenarios.  It was built to be hosted on Microsoft Azure.  To support both the large anticipated scale of data and the rapidly evolving data schema, the team decided to use Cosmos DB as the primary data store.

If you are not familiar with Cosmos DB, it is Microsoft’s globally distributed, fully managed NoSQL database, offering not just its own SQL interface but also MongoDB, Cassandra, Azure Table, and Gremlin APIs.  Over the course of the project, we learned several lessons that we want to share with our clients and fellow developers.

Terminology

  • Database Account – determines the API and may contain one or more databases

  • Database – manages users and permissions, and can provide a pool of shared RUs to its collections

  • Collection – a group of documents (also called a container); user-defined functions are defined at this level

  • Document – a single JSON document

  • RU (Request Unit) – the unit of throughput billing and scale

  • Change Feed – a stream of events from a collection reporting all inserts and updates to its documents

Backups and Restores

By default, Cosmos DB backs up your data every 4 hours and keeps the last 8 hours of backups (meaning the last 2 backups are retained).  You can choose to back up more frequently and keep backups for a longer period of time.  This of course costs extra, but the first 2 backups are free.  As an initial set of defaults, this works reasonably well.

Unfortunately, that is the extent of the built-in backup and restore capability.  If you want to create a backup before a big deployment or upgrade, you have to open a support ticket.  If you need to restore, you will also need to open a support ticket.  Be sure to extend your retention period before filing the ticket, because you may lose the restore point you want by the time the ticket is handled.

Alternatively, you can build your own backup and restore functionality by using the Change Feed, Azure Data Factory, or Azure Synapse to build a replica of your database elsewhere.  Doing that is beyond the scope of this blog post, and I would not recommend it unless the SLAs for a restore are unacceptable to the business.

Thankfully, Microsoft is improving this experience and has Point-in-Time Restore (PITR) in public preview.  This capability is already present in AWS’s DynamoDB, and it will be a great addition to Cosmos DB.

Don’t Over-Allocate RUs

When creating a collection in Cosmos DB, you can grant it either a static pool of RUs to consume, or set it to automatically scale up to a maximum throughput.  At first glance, it may seem wise to set the maximum throughput very high so you never need to worry about running out of RUs.  That is a mistake, and one with implications that are not immediately apparent, especially if you are provisioning with Terraform or another Infrastructure as Code tool.

The first issue is cost.  With autoscale, Azure scales the collection between 10% and 100% of the maximum you set, so you are always consuming and paying for at least 10% of that very high threshold.  If you set a maximum of 500,000 RU/s, the collection runs at anywhere from 50,000 to 500,000 RU/s, and your monthly bill (at the time of writing) will be anywhere from $4,380 to $43,800 for that single collection.
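To make that arithmetic concrete, here is a small sketch of the cost band in Python.  The rate is an assumption derived from the figures above: roughly $0.012 per 100 RU/s per hour for single-region autoscale at the time of writing; check current Azure pricing for your region.

```python
# Rough monthly cost for an autoscale collection.
# Assumed rate: ~$0.012 per 100 RU/s per hour (single-region autoscale,
# at the time of writing). Verify against current Azure pricing.
HOURS_PER_MONTH = 730
RATE_PER_100_RUS_PER_HOUR = 0.012

def monthly_cost(rus_per_second: int) -> float:
    return rus_per_second / 100 * RATE_PER_100_RUS_PER_HOUR * HOURS_PER_MONTH

print(monthly_cost(50_000))   # ~$4,380  (the 10% floor of a 500,000 RU/s maximum)
print(monthly_cost(500_000))  # ~$43,800 (the collection fully scaled up)
```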

The next issue is how Azure horizontally partitions collections.  Each physical partition can provide up to 10,000 RU/s of throughput and up to 50 GB of storage, so allocating a maximum of 500,000 RU/s means your collection will automatically have 50 physical partitions.  Logical partitions are then assigned to these physical partitions, and Azure manages that assignment automatically.  Depending on how you have chosen to logically partition your data, this can leave some physical partitions overused and others barely used at all.

For example, let’s assume we have a system where we’ve chosen to partition the container by state (shown below).  In this scenario, some physical partitions hold a single state and others hold several.  The best we could possibly achieve with this partition key is a true one-to-one mapping of state to physical partition, and only if we had records in every state.  Even then, utilization of the physical partitions would vary greatly: the California and New York partitions will very likely receive much more traffic than Wyoming and Alaska.

[Figure: states (logical partitions) mapped unevenly across physical partitions]
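To illustrate, here is a minimal sketch of provisioning such a container with the Python azure-cosmos SDK.  The endpoint, key, and database/container names are placeholders, and the example uses standard (manual) throughput; newer SDK versions can also accept an autoscale maximum here.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint, key, and names, for illustration only.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="covid-study")

# Partitioned by /state, as in the diagram above. Keeping provisioned throughput
# at or below 10,000 RU/s lets the container start on a single physical partition;
# a 500,000 RU/s maximum would force ~50 physical partitions up front.
container = database.create_container_if_not_exists(
    id="assessments",
    partition_key=PartitionKey(path="/state"),
    offer_throughput=10_000,  # newer SDKs also accept an autoscale ThroughputProperties here
)
```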

A large number of physical partitions also has ramifications for the Change Feed.  The change feed allows developers to react to changes in the data, and Azure Functions can consume this feed by checkpointing where in the history they are.  However, that checkpoint must be maintained per physical partition.  So if you are using the change feed to respond to events in our example system, the change feed functions end up consuming roughly 50x the RUs, because they have to monitor and checkpoint all 50 partitions.
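As a rough illustration, the same per-partition behavior shows up if you read the change feed directly with the SDK (shown below with placeholder names; the exact keyword arguments may vary by SDK version).  Each physical partition is read and checkpointed on its own, which is what an Azure Function does for you behind the scenes with its lease collection.

```python
from azure.cosmos import CosmosClient

# Placeholder endpoint, key, and names, for illustration only.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("covid-study").get_container_client("assessments")

# Read the change feed from the beginning of the container's history.
# The feed is served per physical partition, which is why a consumer must keep
# a lease/checkpoint for every physical partition it monitors: 50 partitions
# means 50 checkpoints to maintain and poll.
for change in container.query_items_change_feed(is_start_from_beginning=True):
    print(change["id"])
```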

Unfortunately, if you make this error, there is no going back.  Cosmos DB is built to scale up; it is not built as well to scale back down.  It will not (yet) merge and coalesce physical partitions, so even if you scale the collection back under 10,000 RU/s, you will still have 50 physical partitions.

Some better practices when dealing with Cosmos DB:

  1. If you don’t know how much throughput you’ll need, start with a maximum of 10,000 RU/s or less.

  2. Build retry logic into your code.  Cosmos DB returns HTTP 429 errors when you exceed your throughput allocation, and these can be handled with an exponential backoff policy (see the sketch after this list).

  3. Monitor 429 errors.  An occasional 429 response is fine; it means you are actually using the RUs you are paying for.  A lot of them probably means it’s time to scale up.

  4. Make sure your partition keys can scale out as your storage and throughput needs increase.
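As referenced in item 2, here is a minimal sketch of the backoff pattern using the Python azure-cosmos SDK (the Polly link under Further Reading covers the equivalent approach for .NET).  Note that the Azure SDKs already retry throttled requests a configurable number of times, so explicit handling like this only matters where you manage retries yourself.

```python
import time
from azure.cosmos import exceptions

def upsert_with_backoff(container, document, max_attempts=5):
    """Upsert a document, retrying HTTP 429 (throttled) responses with exponential backoff."""
    delay = 0.1  # seconds
    for attempt in range(max_attempts):
        try:
            return container.upsert_item(document)
        except exceptions.CosmosHttpResponseError as err:
            # 429 means the request was throttled; anything else is a real error.
            if err.status_code != 429 or attempt == max_attempts - 1:
                raise
            # In production, also honor the retry-after interval Cosmos returns.
            time.sleep(delay)
            delay *= 2
```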

Key Takeaways

  1. Plan ahead and be very careful about large data migrations, because developers and administrators cannot perform a backup or a restore on demand.

  2. Plan your partition keys carefully, in a way that can grow and be redistributed over time.

  3. Start low on provisioned throughput and scale as you grow.  Massive over-allocation cannot be undone, short of deleting the container and starting again.

  4. If you like what you’ve read, and enjoy solving tough problems, check out our open positions.  We are growing quickly, and we’d love to hear from you!

Further Reading

  1. Cosmos DB Request Units – https://docs.microsoft.com/en-us/azure/cosmos-db/request-units

  2. Cosmos DB Partitioning – https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview

  3. Cosmos DB Change Feed – https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed

  4. Cosmos DB Capacity Calculator – https://cosmos.azure.com/capacitycalculator/

  5. Exponential Backoff Policies with Polly – https://docs.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/implement-http-call-retries-exponential-backoff-polly