
gw: implement PROXY protocol#361

Open
kvinwang wants to merge 15 commits into master from pp

Conversation


@kvinwang kvinwang commented Sep 30, 2025

Summary

Add PROXY protocol (v1/v2) support to dstack-gateway, with the decision of whether to send a PP header to a backend made per (instance, port) — not per-client, not per-gateway.

Background & security

An earlier revision of this PR encoded PP as a p suffix in the SNI subdomain (e.g. app-8080p.domain.com). That's client-controlled: a client could connect to a PP-expecting port without the suffix, the gateway would skip writing the PP header, and the backend would fall back to the raw TCP peer address — the gateway's WireGuard IP — as the source. Effectively a source-address spoof.

PP must therefore be declared by the app itself and delivered to the gateway through channels that clients cannot forge.

Design

1. AppCompose.ports (dstack-types)

Apps declare per-port attributes in app-compose.json:

{
  "ports": [{ "port": 8080, "pp": true }]
}

Because it's part of app-compose, the declaration is measured into compose_hash and covered by attestation.

2. Reported at registration (RegisterCvmRequest.port_attrs)

New CVMs include their port_attrs in the existing WireGuard registration RPC. The field is wrapped in optional PortAttrsList so the gateway can distinguish "not reported" (old dstack-util) from "reported empty" (new CVM with no PP-enabled port).
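The "not reported" vs "reported empty" distinction maps naturally onto an `Option` around the list. A minimal std-only sketch of the three cases (the type and function names here are illustrative, not the actual generated protobuf types):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PortAttrs {
    pub port: u16,
    pub pp: bool,
}

/// `None`          -> legacy dstack-util that never sent the field
/// `Some(vec![])`  -> new CVM that reported "no PP-enabled ports"
/// `Some(v)`       -> new CVM with explicit per-port attributes
pub fn describe(port_attrs: &Option<Vec<PortAttrs>>) -> &'static str {
    match port_attrs {
        None => "not reported (legacy, eligible for lazy fetch)",
        Some(v) if v.is_empty() => "reported empty (no PP ports)",
        Some(_) => "reported",
    }
}
```

Without the wrapper, a legacy CVM and a new CVM with no PP ports would both serialize to an empty list, and the gateway could not tell which instances still need the lazy fetch.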

3. Stored per-instance, synced across gateway nodes

InstanceInfo/InstanceData grow a port_attrs map and a port_attrs_hash (the compose_hash it was learned against). Both are persisted in the existing WaveKV inst/{instance_id} record, so per-instance decisions survive gateway restarts and propagate across the cluster without extra keys.

Different instances of the same app may legitimately run different compose hashes (rolling upgrades), so caching is keyed by instance_id, not app_id.

4. Per-connection decision

AddressInfo carries the instance_id and connect_multiple_hosts returns the winner's id. should_send_pp(state, instance_id, port) consults the cached port_attrs:

  • Hit → send PP header (or not) as declared.
  • Miss → lazy-fetch via the agent's Info() RPC (see next section), cache, and use the result.
  • Any failure → default pp=false.
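The hit/miss/default logic above can be sketched with a plain in-memory cache; `State`, `should_send_pp`, and the queue field are simplified stand-ins for the gateway's real types:

```rust
use std::collections::HashMap;

// instance_id -> (port -> pp flag); a miss enqueues the instance for the
// lazy fetch described in the next section rather than blocking.
pub struct State {
    port_attrs: HashMap<String, HashMap<u16, bool>>,
    fetch_queue: Vec<String>,
}

impl State {
    pub fn new() -> Self {
        Self { port_attrs: HashMap::new(), fetch_queue: Vec::new() }
    }

    /// Hit -> declared value; miss -> enqueue a lazy fetch, default false.
    pub fn should_send_pp(&mut self, instance_id: &str, port: u16) -> bool {
        match self.port_attrs.get(instance_id) {
            Some(ports) => ports.get(&port).copied().unwrap_or(false),
            None => {
                self.fetch_queue.push(instance_id.to_string());
                false
            }
        }
    }
}
```

Note the failure mode is deliberately closed: an undeclared port, an unknown instance, or a failed fetch all resolve to `pp = false`, never to "trust whatever the client did".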

5. Backward compatibility

Legacy CVMs that don't yet ship port_attrs at registration:

  • Registration stores port_attrs: None but records the attested compose_hash.
  • On the first proxied connection, the gateway calls http://{cvm_ip}:{agent_port}/prpc Info(), parses tcb_info.app_compose, extracts ports, and writes the result back to WaveKV.
  • Subsequent connections hit the cache.

Subtleties:

  • Re-registration with port_attrs=None does not wipe previously cached attrs (avoids a redundant lazy fetch every 3 minutes when dstack-util is old).
  • Re-registration with a different compose_hash (app upgraded in place — typical for KMS-provisioned CVMs that reuse their disk) does invalidate the cache so stale PP flags don't outlive the upgrade.
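The two subtleties above amount to one conditional on re-registration. A sketch under assumed names (`InstanceData`, `on_register` are illustrative; the real fields live on the WaveKV record):

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct InstanceData {
    pub port_attrs: Option<HashMap<u16, bool>>,
    pub port_attrs_hash: Option<String>, // compose_hash the attrs were learned against
}

pub fn on_register(
    inst: &mut InstanceData,
    reported: Option<HashMap<u16, bool>>,
    compose_hash: &str,
) {
    if let Some(attrs) = reported {
        // New dstack-util: trust the attested report.
        inst.port_attrs = Some(attrs);
        inst.port_attrs_hash = Some(compose_hash.to_string());
    } else if inst.port_attrs_hash.as_deref() != Some(compose_hash) {
        // Legacy-style report after an in-place upgrade: cache is stale.
        inst.port_attrs = None;
        inst.port_attrs_hash = None;
    }
    // Legacy report with an unchanged compose_hash: keep the cache.
}
```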

6. Inbound PP

inbound_pp_enabled (server config) tells the gateway to read a PP header from the inbound TCP stream — used when the gateway sits behind a PP-aware LB like Cloudflare. When disabled, the gateway synthesises a PP header from the real TCP peer. Either way, the resulting header is what gets forwarded to the backend (when enabled).
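For the synthesis path, the v1 text form from the HAProxy PROXY protocol spec is enough to illustrate the idea (the in-repo `gateway/src/pp.rs` also handles binary v2; this sketch covers v1 only):

```rust
use std::net::SocketAddr;

/// Build a PROXY protocol v1 header line from the real TCP peer (`src`)
/// and local (`dst`) addresses: "PROXY TCP4 <src> <dst> <sport> <dport>\r\n".
pub fn synth_pp_v1(src: SocketAddr, dst: SocketAddr) -> String {
    let family = match (src, dst) {
        (SocketAddr::V4(_), SocketAddr::V4(_)) => "TCP4",
        (SocketAddr::V6(_), SocketAddr::V6(_)) => "TCP6",
        // Mixed families: the spec's explicit "don't know" form.
        _ => return "PROXY UNKNOWN\r\n".to_string(),
    };
    format!(
        "PROXY {family} {} {} {} {}\r\n",
        src.ip(),
        dst.ip(),
        src.port(),
        dst.port()
    )
}
```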

Config surface

In gateway.toml under [core.proxy]:

agent_port = 8090          # Guest-agent port inside each CVM (used for Info() fetch)
inbound_pp_enabled = false # Read PP header from upstream (e.g. Cloudflare)

[core.proxy.timeouts]
pp_header = "5s"           # Timeout for reading inbound PP header

There is no global outbound_pp_enabled — per-port control comes from app-compose.json.

Files changed

  • dstack-types/src/lib.rs — AppCompose.ports, PortAttrs
  • gateway/rpc/proto/gateway_rpc.proto — PortAttrs, PortAttrsList, RegisterCvmRequest.port_attrs
  • dstack-util/src/system_setup.rs — CVM reports port_attrs during registration
  • gateway/src/pp.rs (new) — PP v1/v2 header parse + synthesis
  • gateway/src/proxy/port_attrs.rs (new) — should_send_pp + lazy fetch
  • gateway/src/{config,main_service,models,debug_service}.rs, gateway/src/kv/mod.rs, gateway/src/proxy/{proxy,tls_passthough,tls_terminate}.rs — wiring

Test plan

  • cargo check --workspace
  • cargo test -p dstack-gateway (8 tests pass, snapshots updated)
  • cargo fmt --all
  • Manual: deploy a CVM with ports: [{port: 8080, pp: true}], confirm PP header is received at the backend
  • Manual: deploy a legacy CVM (no ports field), confirm no PP header is sent
  • Manual: rolling-upgrade a KMS-provisioned CVM from pp: true to pp: false, confirm the cache is invalidated on the first re-registration with the new compose_hash
  • Manual: behind Cloudflare with inbound_pp_enabled = true, confirm client IP is propagated end-to-end

Add PROXY protocol support to the gateway with two server-side config
options instead of client-controlled SNI suffixes:

- inbound_pp_enabled: read PP headers from upstream load balancers
- outbound_pp_enabled: send PP headers to backend apps

The original PR#361 used a 'p' suffix in the SNI subdomain to toggle
outbound PP per-connection. This is a security flaw: a client could
connect to a PP-expecting port without sending PP headers, allowing
source address spoofing. Both flags are now server-side config only.

Replace the global outbound_pp_enabled switch with a per-(instance, port)
lookup so different ports of the same backend can have different PP
behaviour. PP is declared by the app and reported to the gateway through
authenticated channels — never by client SNI.

Pipeline:

1. dstack-types::AppCompose grows a "ports" array. Each entry carries a
   port number and a "pp" flag. Because it's part of app-compose.json it
   is measured into compose_hash and attested.

2. RegisterCvmRequest grows an optional PortAttrsList. New CVMs include
   their port_attrs at WireGuard registration time. The optional wrapper
   lets the gateway distinguish "not reported" (legacy CVM) from
   "reported empty" (new CVM with no PP-enabled port).

3. The gateway stores port_attrs on InstanceInfo and persists/syncs it
   via WaveKV (InstanceData), keyed by instance_id (different instances
   of the same app may run different code).

4. AddressInfo now carries instance_id, and connect_multiple_hosts
   returns the winner's instance_id. The proxy looks up that instance's
   port_attrs to decide whether to send a PROXY header.

5. Backward compat: if an instance has no port_attrs (legacy CVM), the
   gateway lazily fetches them via the agent's Info() RPC, parses
   tcb_info.app_compose, and caches the result in WaveKV.

PROXY protocol module is unchanged; only the *decision* of whether to
send a header moves from a global config to a per-instance lookup.

A re-registration from a legacy CVM carries port_attrs=None, which
previously wiped any value learned at an earlier registration or lazy
fetch. Gateway restart + CVM re-register would then force a redundant
Info() fetch. Keep cached attrs unless the caller actively reports new
ones; same instance_id implies same compose_hash, so the cache cannot
go stale.

Same instance_id with a different compose_hash means the app was
upgraded in place (typical for KMS-provisioned CVMs that reuse their
disk). Previously, a legacy-style re-registration (port_attrs=None)
would preserve stale cached attrs across such upgrades because the
gateway assumed instance_id ↔ compose_hash was stable.

Track the compose_hash each cached port_attrs was learned against
(taken directly from the attested AppInfo, not from client input).
Mismatch clears the cache so the lazy Info() fetch runs again.
@kvinwang changed the title from "gw: Implement proxy protocol" to "gw: implement PROXY protocol with per-instance control" Apr 16, 2026
@kvinwang changed the title from "gw: implement PROXY protocol with per-instance control" to "gw: implement PROXY protocol" Apr 16, 2026
@kvinwang
Collaborator Author

End-to-end test on tdxlab

Deployed an nginx app with pp=true on port 8080 and pp=false on port 8081, exercised both forward and reverse paths.

Test endpoints

App ID a1586902ff3d0860ea024f3ebd87e91d508ed675, instance ID dc7f9c87d80e4bf47bc9dc7ab27f371cf9455144. CVM uses dstack-0.5.8 (legacy image — does not yet ship port_attrs at registration), so port_attrs is populated via the lazy Info() fetch path. Backward-compat path verified end to end.

Outbound PP (per-port, declared in app-compose.json)

{ "ports": [{"port": 8080, "pp": true}, {"port": 8081, "pp": false}] }
Port  pp     Backend sees
8080  true   proxy_protocol_addr=107.131.79.101 (real client)
8081  false  remote_addr=10.8.42.1 (gateway WG IP — client IP lost, as expected)

Inbound PP (gateway behind a PP-aware LB)

Set inbound_pp_enabled = true, moved gateway listen to :13006, fronted with haproxy on :13004 using send-proxy-v2:

client (107.131.79.101) → haproxy:13004 → [PP v2] → gateway:13006 → [PP v2] → backend

Result on pp=true port: origin addr = 107.131.79.101 — the real client IP propagates through both hops.

@kvinwang kvinwang marked this pull request as ready for review April 16, 2026 07:29
Copilot AI review requested due to automatic review settings April 16, 2026 07:29

Copilot AI left a comment


Pull request overview

Adds end-to-end PROXY protocol (v1/v2) support in dstack-gateway, with per-(instance_id, port) outbound decisioning sourced from attested app-compose metadata (and lazily fetched for legacy CVMs), plus optional inbound PP parsing when the gateway is behind a PP-aware LB.

Changes:

  • Introduce AppCompose.ports / PortAttrs and propagate port attributes through CVM registration (protobuf + dstack-util).
  • Add gateway-side PP header read/synthesis and conditional outbound PP header injection per selected backend instance/port.
  • Persist per-instance port attributes in WaveKV with compose-hash invalidation and legacy lazy fetch via guest-agent Info().

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

Summary per file:
guest-agent/src/rpc_service.rs Updates test fixture for new AppCompose.ports field.
gateway/src/proxy/tls_terminate.rs Injects outbound PP header (when enabled) before bridging to backend.
gateway/src/proxy/tls_passthough.rs Carries PP header through passthrough flow; returns winning instance_id from racing connect; injects PP header per port.
gateway/src/proxy/port_attrs.rs New: per-instance/port lookup with legacy lazy fetch via agent Info().
gateway/src/proxy.rs Reads/synthesizes inbound PP header before SNI extraction; passes header through proxy paths.
gateway/src/pp.rs New: inbound PROXY protocol v1/v2 parse + synthesized header creation + display helper.
gateway/src/models.rs Extends InstanceInfo with port_attrs and port_attrs_hash.
gateway/src/main_service/tests.rs Adjusts test calls for updated registration/new_client signatures.
gateway/src/main_service/snapshots/dstack_gateway__main_service__tests__config.snap Snapshot update for new InstanceInfo fields.
gateway/src/main_service/snapshots/dstack_gateway__main_service__tests__config-2.snap Snapshot update for new InstanceInfo fields.
gateway/src/main_service.rs Wires registration to store port attrs + compose hash; persists/invalidates cache; exposes instance lookup/update helpers; threads instance_id through address selection.
gateway/src/main.rs Registers new pp module.
gateway/src/kv/mod.rs Persists port_attrs and port_attrs_hash in InstanceData; defines PortFlags.
gateway/src/debug_service.rs Updates debug registration call signature.
gateway/src/config.rs Adds agent_port, inbound_pp_enabled, and timeouts.pp_header.
gateway/rpc/proto/gateway_rpc.proto Adds PortAttrs / PortAttrsList; extends RegisterCvmRequest with optional port_attrs.
gateway/gateway.toml Adds inbound_pp_enabled and pp_header timeout configuration.
gateway/Cargo.toml Adds proxy-protocol dependency.
dstack-util/src/system_setup.rs Sends port_attrs during registration based on app-compose ports.
dstack-types/src/lib.rs Adds AppCompose.ports and PortAttrs schema.
Cargo.toml Adds workspace dependency pin for proxy-protocol.
Cargo.lock Locks new dependency graph for proxy-protocol and transitive deps.


Comment thread gateway/src/main_service.rs
Comment thread gateway/src/proxy/port_attrs.rs Outdated
Comment thread gateway/src/pp.rs

Three fixes from review:

1. Treat the wire-format `port: uint32` as out-of-range when it can't fit
   in u16 (instead of silently truncating to a different valid port). Use
   `u16::try_from` and skip invalid entries.

2. Move the legacy `Info()` lazy fetch off the connection critical path:
   - `should_send_pp` is now sync. On a cache hit it returns the declared
     value; on a miss it enqueues the instance for the background worker
     and returns `pp = false` immediately, so a slow/missing CVM agent
     never blocks a proxied connection.
   - A single background task (`spawn_fetcher`) drains the queue, dedupes
     in-flight instance ids via a HashSet, applies a configurable
     timeout (`timeouts.port_attrs_fetch`, default 10s), and writes the
     result back to WaveKV.

3. Add unit tests in `pp.rs` for the inbound PROXY parser: v1/v2 IPv4
   happy paths, no-prefix rejection, v1 missing terminator, v2
   over-length cap, and the address synthesis/Display helpers.
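Fix 1 is a one-liner worth spelling out: rejecting out-of-range ports instead of truncating them. A sketch where `WirePort` stands in for the generated protobuf message:

```rust
/// Stand-in for the generated message: protobuf has no u16, so the wire
/// carries the port as u32.
pub struct WirePort {
    pub port: u32,
    pub pp: bool,
}

/// Keep only entries whose port fits in u16. `as u16` would instead
/// silently truncate e.g. 70000 to 4464 — a different, valid port.
pub fn valid_ports(wire: &[WirePort]) -> Vec<(u16, bool)> {
    wire.iter()
        .filter_map(|w| u16::try_from(w.port).ok().map(|p| (p, w.pp)))
        .collect()
}
```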

When a CVM registers without port_attrs (legacy CVM, or compose_hash
mismatch invalidated the cache), enqueue a background fetch right away
instead of waiting for the first proxied connection to discover the
miss. Reduces the window during which the fast path returns a wrong
`pp = false` because the cache hasn't been populated yet.

The fetcher dedupes in-flight ids, so this is safe to enqueue on every
registration that ends up without cached attrs.

Right after registration, the WireGuard handshake hasn't completed yet
and the agent's TCP port isn't reachable. The previous one-shot fetch
would fail and leave the cache empty, falling back to pp=false until
the next connection (which would itself eat one more failed fetch).

Move the timeout/retry policy into a dedicated config block so it can
be tuned per deployment:

  [core.proxy.port_attrs_fetch]
  timeout = "10s"          # per-attempt Info() RPC timeout
  max_retries = 5          # extra attempts after the initial try
  backoff_initial = "1s"   # doubles each retry up to backoff_max
  backoff_max = "30s"

Worst-case 1+2+4+8+16+30 ≈ 1 min covers a reasonable WG warmup window.
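The capped-doubling schedule those knobs describe is easy to state in code; a sketch with the defaults above (the function name is illustrative):

```rust
use std::time::Duration;

/// Delays slept between attempts: start at `initial`, double each retry,
/// never exceeding `max`.
pub fn backoff_schedule(initial: Duration, max: Duration, retries: u32) -> Vec<Duration> {
    let mut next = initial;
    (0..retries)
        .map(|_| {
            let d = next.min(max);
            next = next.saturating_mul(2);
            d
        })
        .collect()
}
```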

Bail out early when the instance is no longer in state (recycled while
queued) — the unknown-instance error chain is the signal.

Don't waste a 1-minute retry budget on errors that can't recover. Two
classes:

- Transient → retry: TCP/RPC failure, Info() timeout. The CVM may just
  be warming up.
- Permanent → bail: instance was recycled (no longer in state), tcb_info
  isn't valid JSON, missing app_compose key, or app_compose itself
  fails to parse. Same input each retry, same failure.

`tcb_info` empty (public_tcbinfo=false) still goes through the success
path with an empty map cached, as before — that's not a fetch failure.
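The transient/permanent split reduces to one predicate over an error enum; the variant names here are assumptions for illustration:

```rust
#[derive(Debug, PartialEq)]
pub enum FetchError {
    RpcFailure,        // TCP/RPC failure or Info() timeout: CVM may be warming up
    InstanceRecycled,  // instance no longer in gateway state
    BadTcbInfoJson,    // tcb_info isn't valid JSON
    MissingAppCompose, // app_compose key absent from tcb_info
    BadAppCompose,     // app_compose itself fails to parse
}

/// Only RPC-level failures are worth retrying; everything else would see
/// the same input — and the same failure — on every attempt.
pub fn is_transient(err: &FetchError) -> bool {
    matches!(err, FetchError::RpcFailure)
}
```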

Thread the new gateway config knobs through the dstack-app deployment:

- .env / .app_env gains `INBOUND_PP_ENABLED` (default false). Set to
  true only when the gateway runs behind a PP-aware L4 LB; otherwise
  every connection would be rejected because the parser would try to
  read a PP header that isn't there.

- docker-compose.yaml forwards the new env vars plus the retry/backoff
  knobs for the background port_attrs fetcher and the pp_header read
  timeout.

- entrypoint.sh writes the corresponding fields into gateway.toml,
  including the new [core.proxy.port_attrs_fetch] section.

Defaults match the in-repo gateway.toml so existing deployments
continue to work without any .env changes.

Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.



Comment thread gateway/dstack-app/builder/entrypoint.sh
Comment thread gateway/src/proxy/port_attrs.rs
Comment thread gateway/src/proxy/port_attrs.rs
Comment thread gateway/src/main_service.rs Outdated
kvinwang and others added 3 commits April 16, 2026 01:24
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

The pre-existing script had three latent issues that weren't checked
because the file hadn't been touched. Modifying it for the PP rollout
brings it into the prek diff, so fix them now:

- SC1091: `source .env` — explicitly mark dynamic include
- SC2002: replace `cat … | tr …` with `tr … < file` redirect
- SC2086: quote $WG_ADDR in the cut pipeline