This document will cover how to run an FlowMatrix cluster on Linux VMs or bare-metal
server, like EC2 instances. This requires a good understanding of the FlowMatrix
architecture. For an easier approach to running a production-quality FlowMatrix
cluster, see the docs for running on top of Kubernetes.
Before starting this guide, follow the common setup steps in the deployment overview guide.
There are two options for running FlowMatrix on VMs; you may run the flowmatrix binary directly,
or use Docker.
Binaries for release versions are available for Linux (x86 and ARM) and MacOS (M1) on the
Github Releases page, or you can build your own binaries
by following the dev guide.
If building your own binary, make sure to use the
--release flag when calling cargo build. You may also want to build on a machine with the same CPU
as you plan to deploy with and use the env var RUSTFLAGS='-C target-cpu=native' to get the best performance
at the cost of portability to CPUs with different micro-architectures.
Alternatively, you can run the flowmatrix docker image, ghcr.io/flowmatrixsystems/flowmatrix:latest.
Running the migrations
FlowMatrix supports two databases for its configuration store: Sqlite
(the default) and Postgres. The chosen database can be configured via the database.type config option or via
FLOWMATRIX__DATABASE__TYPE env var.
If you are using Postgres you will need to run the database migrations on your
database before starting the cluster.
By default, FlowMatrix will expect a database called flowmatrix, a user account flowmatrix with password flowmatrix
at localhost:5432. These can be configured via the database config options.
# Add other env vars as appropriate; this example will run migrations
# on the host computer in Docker for Mac
$ docker run -e FLOWMATRIX__DATABASE__POSTGRES__HOST=host.docker.internal \
ghcr.io/flowmatrixsystems/flowmatrix:latest migrate
Running the cluster
The FlowMatrix cluster can run in two modes on VMs; either as a single-node cluster using the process scheduler,
or as a distributed cluster using the node scheduler. The former is simpler, but cannot scale horizontally.
Additionally, you may decide to run all FlowMatrix services together in a single
process, or as a separate processes for high-availability. This guide will only
cover the former; for guidance on more complex deployments please reach out to
the dev team on Discord or at
support@flowmatrix.systems.
Configuration
In addition to the database configs described in the migration section, there are several
other configuration options that you may wish to set via environment variables, including
CHECKPOINT_URL and ARTIFACT_URL. Note that for a distributed
cluster, those must be set to remote storage that is accessible by all nodes in the cluster.
FlowMatrix services
The entire FlowMatrix control plane can be run as a single process with the cluster subcommand:
$ docker run ghcr.io/flowmatrixsystems/flowmatrix:latest cluster
For high-availability, this should be managed by a process manager like systemd.
Schedulers
FlowMatrix ships with two schedulers that can be used for
VM deployments: process and node. The scheduler is selected via the controller.scheduler
config option or the FLOWMATRIX__CONTROLLER__SCHEDULER environment variable.
Process scheduler
The process scheduler is the default. It runs pipelines by spawning new processes on the same host
as the control plane. This is great for simple, single-node deployments as no other infrastructure
is required.
To use the process scheduler, run the control plane with controller.scheduler=process or with no
scheduler configuration.
Node scheduler
The node scheduler supports running a distributed FlowMatrix cluster without
requiring Kubernetes or another complex distributed runtime. An FlowMatrix node cluster is
made up on some number of hosts running the node process, which are able to schedule work,
and a control plane running with controller.scheduler=node.
A node can be run via the flowmatrix binary or Docker image:
$ FLOWMATRIX__CONTROLLER_ENDPOINT=http://localhost:9190 flowmatrix node
$ docker run \
-e FLOWMATRIX__CONTROLLER_ENDPOINT=http://localhost:9190 \
ghcr.io/flowmatrixsystems/flowmatrix:latest cluster
Replace the FLOWMATRIX__CONTROLLER_ENDPOINT configuration with the host and port that the controller is running on.
Note that the node should always be run within a process manager that restarts
it, like systemd; nodes are designed to restart when they lose connection to the controller.
Nodes can be configured with a given number of slots, via the node.tasks-slots
config. This controls how many parallel subtasks can run on that node;
typically, you would want to set this to the number of CPUs but this can be somewhat
hardware and workload dependent.