Written by Andris Reinman
These days, there are a number of popular platforms and methods for deploying software and managing infrastructure, from Kubernetes and containers to Docker and observability.
At Outfunnel, we use none of these.
This article takes a look at how we at Outfunnel have built up our infrastructure, from the early days of an MVP to an increasingly complex web app serving 1000+ customers.
Every system is a victim of its past.
When we initially started our software as a service platform, it used so few machine resources that we managed to host everything in a single $20/mo DigitalOcean droplet. We didn’t have the time, staffing or need to think about infrastructure and deployments. So we made it as automated as possible within that $20 machine.
All deployments were automatic. Anything in the master branch on GitHub immediately ended up in production thanks to a CI/CD service. Our data model was simple, and our primary database, MongoDB, managed to handle everything without much oversight. So much so that we kind of forgot that infrastructure is even a thing to consider. Everything was working smoothly.
Gradually, the system grew, until at one point we could no longer fit all of that into a single machine. At first, we moved our database out and immediately turned it into a replicated cluster instead of using a single node.
Next, we had to separate our log management system. The application itself outgrew the machine’s resources. We split it up to run on multiple servers while not making any actual improvements in the infrastructure, just copying the exact solution to every new machine.
We had an opportunity to change our system when we received free credits from AWS, and started to move over from DigitalOcean. As it was a new platform, we had to set up all machines from the start.
However, our system was not provider-specific but server-specific, and you can run Ubuntu pretty much anywhere, regardless of the provider. So we just went with the existing system.
Today, we are running around 60 EC2 instances and still using the same system we started with on that single VPS: a system based mainly on a collection of bash scripts that we run on every new server we provision to prepare it to run our application.
You may be wondering what our AWS costs look like with a setup like this. Here’s a monthly overview of that, for July-November 2021:
We got it right for a single server. For a larger number of servers and a more complex application architecture, probably not.
Applications/services
All applications are Node.js applications running as SystemD services. Each application can run on a single server or on multiple servers, depending on the load that the service generates. Services are set up manually on each server (well, actually with a bash script, so not 100% manually).
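As an illustration, a minimal unit file for one of these services could look roughly like this; the service name, user, paths, and directives are hypothetical, not our actual setup:

```ini
# /etc/systemd/system/outfunnel-api.service (hypothetical example)
[Unit]
Description=Outfunnel API service
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/opt/outfunnel-api
ExecStart=/usr/bin/node server.js
Restart=always
Environment=NODE_ENV=production
# A unique identifier for syslog, used later for log routing
SyslogIdentifier=outfunnel-api

[Install]
WantedBy=multi-user.target
```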
Pushing code to the server
All master branch commits are automatically deployed to production with the Codeship CI/CD service as follows:
- Codeship receives a webhook from GitHub about the new commit.
- The deploy script then:
  - checks out the code,
  - installs npm packages,
  - compresses the result and
  - uploads it to each production server for that service.
- Then it remotely calls the installation script over ssh to run on these servers.
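On the CI side, a sketch of such a deploy step might look like this; the server addresses, user, and paths are made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the CI-side deploy step; servers, user and paths are made up.
set -euo pipefail

SERVICE="outfunnel-api"
SERVERS=("10.0.1.10" "10.0.1.11")   # production servers running this service

# The CI runner has already checked out the code; install dependencies and package it
npm ci
tar -czf "/tmp/${SERVICE}.tar.gz" .

# Upload the archive to each production server and trigger the remote install script
for server in "${SERVERS[@]}"; do
  scp "/tmp/${SERVICE}.tar.gz" "deploy@${server}:/tmp/"
  ssh "deploy@${server}" "bash /opt/deploy/install.sh ${SERVICE}"
done
```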
Next, the remote installation script:
- unpacks the application code,
- replaces the existing application folder with the new one and
- restarts the SystemD service.
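Put together, a sketch of that remote install script might look something like this; service names and the directory layout are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the remote install script; service names and paths are made up.
set -euo pipefail

SERVICE="$1"                        # e.g. outfunnel-api
RELEASE="/tmp/${SERVICE}.tar.gz"
APP_DIR="/opt/${SERVICE}"
NEW_DIR="${APP_DIR}.new"

# Unpack the uploaded archive next to the current application folder
rm -rf "$NEW_DIR"
mkdir -p "$NEW_DIR"
tar -xzf "$RELEASE" -C "$NEW_DIR"

# Swap the new folder in place of the old one
rm -rf "${APP_DIR}.old"
[ ! -d "$APP_DIR" ] || mv "$APP_DIR" "${APP_DIR}.old"
mv "$NEW_DIR" "$APP_DIR"

# The only privileged step: restarting the SystemD service (allowed via a sudoers exception)
sudo systemctl restart "$SERVICE"
```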
The install script always runs as an unprivileged user. It has a single exception set up that allows it to restart the service as root.
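That exception could be expressed as a single sudoers rule, roughly along these lines (the user and service name are hypothetical):

```
# /etc/sudoers.d/deploy (hypothetical example)
deploy ALL=(root) NOPASSWD: /usr/bin/systemctl restart outfunnel-api
```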
Configuration
All applications use the same configuration setup. The application looks for a default.toml file in the ./config folder in the application root, and then for a production config file whose path is provided by an environment variable.
Both files are merged so that the production config overrides keys in the default config. This way, we can keep most of the configuration in default.toml, and only production-specific keys have to be set in the production config file.
Production configuration files are updated manually. A production config usually includes service-specific secrets (e.g., the Pipedrive OAuth client secret to access Pipedrive’s API) and increased worker counts (for example, the default would be to run a single worker process, while in production the same service runs ten worker processes).
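To illustrate the idea, the two files might look something like this; the keys and values are hypothetical, not our actual configuration:

```toml
# config/default.toml (checked into the repository)
[workers]
count = 1

[pipedrive]
client_id = ""
client_secret = ""
```

```toml
# production config file, pointed to by an environment variable and edited manually
[workers]
count = 10

[pipedrive]
client_id = "<oauth-client-id>"
client_secret = "<oauth-client-secret>"
```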
Communication between services
The config file on the production server is edited manually, and the API root is given for a specific service:
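For example, a service that calls another service’s API might get a section along these lines in its production config; the section name and URL are hypothetical:

```toml
# production config (hypothetical): where to find the API of another internal service
[services.contacts_api]
url = "http://10.0.1.20:3000"
```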
Logging
Syslog
All applications run as SystemD services and have a unique log id set in the service file.
Log output written to stdout is Winston JSON, prefixed with the _jsonrc: string.
Log entries end up in syslog, where the rsyslog daemon is configured to forward these rows to a central Graylog instance.
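A forwarding rule for that could look roughly like the following; the Graylog hostname, port, and program name prefix are made up:

```
# /etc/rsyslog.d/60-graylog.conf (hypothetical example)
# Forward entries from our services, matched by their log id, to Graylog over TCP
if $programname startswith 'outfunnel-' then @@graylog.internal:5514
& stop
```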
Graylog
Graylog has two extractors set up:
1. A Split & Index step to extract the JSON string from the log entry.
2. A step to parse that JSON string.
Monitoring
Each application has an HTTP health endpoint set up at /health and a Prometheus metrics endpoint at /metrics.
Health
/health is publicly accessible via our Nginx proxy at outfunnel.com
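On the Nginx side, exposing that endpoint might look roughly like this; the upstream address and port are hypothetical:

```nginx
# Hypothetical sketch of the proxy rule for the health endpoint
location /health {
    proxy_pass http://127.0.0.1:3000/health;
    proxy_set_header Host $host;
}
```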
The Freshping service checks the health endpoint and uses it to generate our public status page.
Metrics
Our central Prometheus server then periodically fetches metrics from the application’s /metrics endpoint.
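A scrape configuration for one such service could look something like this; the job name, port, and targets are made up:

```yaml
# prometheus.yml (hypothetical): scrape config for one service
scrape_configs:
  - job_name: outfunnel-api
    metrics_path: /metrics
    static_configs:
      - targets: ["10.0.1.10:3000", "10.0.1.11:3000"]
```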
Grafana uses these metrics for its dashboards that we display on the TVs in our office (or in a browser, if working remotely is your thing).
Backups
We have a backup machine outside our primary network with daily cron jobs to back up different parts of the stack. As all of the code is stored in GitHub anyway, we mainly focus on backing up the databases.
- MongoDB backup is generated by running mongodump.
- Redis backup is generated by running rsync and copying the latest Redis dump file from one of the replica servers.
- Our homepage runs on WordPress, so we dump the SQL data and copy WordPress files over as well.
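Put together, a daily cron job on the backup machine might boil down to something like this sketch; the hosts, credentials, and paths are made up:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the daily backup job; hosts, credentials and paths are made up.
set -euo pipefail

BACKUP_DIR="/backups/$(date +%F)"
mkdir -p "$BACKUP_DIR"

# MongoDB: full dump over the network with mongodump
mongodump --uri="mongodb://backup@mongo.internal:27017" --gzip --archive="${BACKUP_DIR}/mongo.gz"

# Redis: copy the latest dump file from one of the replica servers
rsync -az redis-replica.internal:/var/lib/redis/dump.rdb "${BACKUP_DIR}/redis-dump.rdb"

# WordPress: dump the SQL data and copy the site files
ssh wordpress.internal "mysqldump wordpress | gzip" > "${BACKUP_DIR}/wordpress.sql.gz"
rsync -az wordpress.internal:/var/www/html/ "${BACKUP_DIR}/wordpress-files/"
```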
So, in short, that’s our system for deploying software and managing infrastructure: the system we got right at first but then failed to outgrow. Do you have better ideas for solving the same issues? Come work with us!