Cockroach Labs Database

cockroach-free Header Image
Left Arrow
Jeff Bower | 2024-10-12

Setting up a database is pretty easy, it’s just part of the LAMP model (Linux, Apache, MySQL, Perl/PHP/Python). Setting up a database cluster for resiliency is a bit harder. Setting up a distributed cluster such that you’ve got geographic redundancy using a consumer-grade ISP is actually very hard. And that’s before you take into account health monitoring which often needs to be hand-crafted. Many database engines still rely on fixed IPv4 addresses or inflexible port definitions. Enter Cockroach Labs.

In fairness, as a New Yorker the name resonated with me and I instantly understood why they used it. CockroachDB is a PostgreSQL-based solution that has an impressive customer list and was dead-simple to setup. After a few days of arguing with a variety of self-hosted FOSS solutions, Cockroach got me up and running with PHP and Python access in an hour.

Professionally, I used Cockroach as the core of an internal application with ~1M rows of data and we never dinged past the free tier. Personally, my data needs are much lower and using Cockroach is a no-brainer. In fact, my Cockroach-powered personal data warehouse was the inspiration for the version I put into production at ActiveDraft.

Setup

Setup with Cockroach is easy enough. They have a Serverless Cloud option with both AWS and Google Cloud hosting options along with 4-9s reliability. More importantly, they offer the ability to signup using Google, Microsoft, Github, or a standard email login. This is an incredible reduction in friction that cannot be underestimated, the moment you need a user to create a new account with yet another password to manage you lose a ton of customers. If it’s clicking on the “Sign Up with Google” link a ton of friction is removed - this was one of the failures I’ve encountered building my own products.

Once done, you have access to a web-based management platform that allows you to configure multiple databases, users, and perform hourly backup management. You can even setup fake SAML protections by whitelisting domains that can authenticate (if you use MSFT or Google to authenticate your domain, you can restrict to just your own domain).

PHP

Much-maligned, PHP is a useful tool for server-side authenticated access. Much of my read access is via the CockroachDB PHP Libraries which can pull the data and display it in an meaningful way. Additionally, I’ve built many webhook handlers to insert the data, using authenticated access based on the source. My goal with PHP-based solutions is to provide realtime access into the data, but some of the data may be limited in consistency if a webhook fails to trigger or there are race conditions (a user is changed in HubSpot but the data is pulled before the webhook is triggered and resulting actions executed, for example).

Personally, I’ve always hated direct database integrations. Instead I used the PHP libraries to create a web-based mediation layer. The goal here is that database engines change all the time. If I need to migrate to something beyond Cockroach, I don’t want to have to touch a bunch of libraries to make sure the data is constant, so instead I created a web-based API call that translates a generic SQL query into a database-specific set of commands. This allows me to build a series of libraries in other languages that call this API without needing to worry whether it’s PGSQL, MySQL, or some NoSQL variant. Put the smarts into a static API call and you can migrate reads and writes separately allowing for seamless database migrations when needed. Feel free to reach out if you’d like a deep dive on this API abstraction layer sooner rather than later.

Python

I’ve never been very happy with Python’s direct database access. Python for me is there for periodic data syncs, pulling data via an API and pushing it to the database to make it more inline with the actual data. While you can build a CockroachDB application with psycopg2, I much prefer my option as it decouples the database from subtending libraries by abstracting the database call to a RESTFUL API using a simple requests.post() mechanism which allows me to pass not only a complex SQL query but also an API key via a simple import requests inclusion instead of installing a database-specific driver.

JavaScript

What’s a database without a GUI? Well, it’s probably most databases… Similar to Python, I use a simple fetch() call for accessing the same API. SAML data provides me with some default access permissions allowing me to pre-position some generic or user-specific API keys. But, just like Python, I can switch database providers or architecture on the back end without needing to change the queries people know and love. Using my own server as an intermediary layer can even hide the actual API key from the user, instead calling an API on the internal server and using the group-based data to pull the API key at the PHP user without exposing it to the end user at all.

Data Sovereignty

It’s hard to give up your data to a third party. Cockroach’s service poses risks, that can’t be denied. However, we need to assess the customer base as well as their security posture. Cockroach’s customer list is pretty impressive, but that’s a far cry from the list of customers trusting a cloud-based, multi-tenant solution vs. a dedicated local deployment. I can only state that Cockroach has done me well, while I’ve run into a few issues around pushing data (likely during a software upgrade or local ISP issues - nothing frequent enough for me to even troubleshoot), I’ve never felt that my data was compromised nor that I was able to compromise other users’ data.

Their free tier is multi-tenant meaning that my data is technically mixed with data from other users, but their authentication and segregation of data seems reasonable. And the data that would be exposed is no worse than any other cloud provider, we’re not storing passwords (even hashed) in Cockroach nor do I believe Cockroach is a larger target than HubSpot, Pendo, or SalesForce when it comes to customer PII like email addresses. For my personal use, access to my Cockroach DB would give you a list of MAC and IP addresses for the devices I know about across two locations as well as a bunch of stuff that may tell you what software version some of my devices are on. If the data I held included a password database (even hashed), or something where the loss was more serious, I would certainly work on a front-end to an existing self-hosted solution to automate the needful things.

Summary

If you’re not a DB Admin by preference and need a database, Cockroach is a fantastic solution. I’ve pushed hundreds of thousands of rows into it without issue from either a technical or billing perspective. The setup is simple, and uses your pre-existing Google, MSFT, or GitHub credentials so even if you find it’s not for you there’s no real downside. Performance for my needs are great, and reliability follows standard rules of “if it fails, try again later.” That said, the management is much more SQL-based than using specific PostGres which for me is a boon, but for others may be a learning curve. Personally, my implementation with a RESTful API as an intermediary, makes Cockroach a perfectly viable solution I will not be deviating from in the future.