fix(coderd): bump workspace autostop deadline on chat heartbeat#23314
fix(coderd): bump workspace autostop deadline on chat heartbeat#23314
Conversation
trackWorkspaceUsage previously only called usageTracker.Add (dormancy protection). It now also calls workspacestats.ActivityBumpWorkspace so the workspace build deadline is extended each time the chat heartbeat fires, preventing mid-conversation autostop.
a8cb2d2 to
c2fa11f
Compare
|
Addressing both review comments: RE: DB thrashing concern (chatd.go) Fires every heartbeat (~30s, RE: Merge test into existing one Done — rolled the deadline assertion into
|
mafredri
left a comment
There was a problem hiding this comment.
🤖 mafredri sent me to poke holes in this. I brought friends.
Good call wiring the deadline bump into the heartbeat, this is a real gap for chat-only workspaces. The implementation is clean: synchronous call in a serial goroutine, correct context derivation, and the SQL guards make concurrent bumps idempotent.
One blocker (P0), one suggestion (P2), two nits (P3), and a few observations across 4 inline comments.
🤖 Six reviewers walked in, one finding walked out alive. The rest were no-ops, like most of these heartbeat bumps.
| // so no prebuild guard is needed (unlike reporter.go). | ||
| // | ||
| // This fires every heartbeat (~30s) but the SQL only | ||
| // performs an UPDATE when 5% of the deadline has elapsed, |
There was a problem hiding this comment.
P0 ActivityBumpWorkspace will be denied by dbauthz in production (Gopher, Breaker, DBA)
The chatd subject (dbauthz.go:698-717) grants only policy.ActionRead on ResourceWorkspace, but querier.ActivityBumpWorkspace (dbauthz.go:1554-1558) calls update(), which enforces policy.ActionUpdate. Every heartbeat tick will:
- Run
GetWorkspaceByID(theworkspaces_expandedview, 4-way JOIN + 2 correlated subqueries) - Fail the authz check
- Log at error level via
activitybump.go:49-53
The test doesn't catch this because dbtestutil.NewDB(t) returns a raw database.Store without dbauthz wrapping.
At 1000 concurrent chats, that's ~2000 wasted view queries/min plus error-level log spam.
Fix options:
- Add
policy.ActionUpdatetoResourceWorkspacein the chatd role. Check what else this unlocks. - Use
dbauthz.AsSystemRestricted(ctx)scoped to just this call, with a//nolint:gocriticexplaining why. - Add a narrower permission if the RBAC system supports it.
Whichever route, adding a dbauthz-aware integration test (or at least a unit test with the wrapped store) would catch regressions.
There was a problem hiding this comment.
This is a good catch. I wonder if it's worth wrapping with dbauthz by default.
| // TemplateScheduleStore here. The activity bump logic | ||
| // defaults to the template's activity_bump duration | ||
| // (typically 1 hour). Chat workspaces are never prebuilds, | ||
| // so no prebuild guard is needed (unlike reporter.go). |
There was a problem hiding this comment.
P2 "no-op SELECTs" overstates it (DBA, Scribe)
The 5% guard prevents the write (no row lock, no WAL entry), but the CTE still runs a 4-table join (workspace_builds → provisioner_jobs → workspaces → templates) with interval arithmetic on every call. It's fast (index scan + 3 PK lookups), but calling it a "no-op SELECT" is misleading.
Suggestion: something like "most calls perform a read-only CTE lookup with no UPDATE" would be more accurate.
PS. Not a performance concern at current scale, just a documentation accuracy thing.
| Valid: true, | ||
| Time: dbtime.Now().Add(-30 * time.Minute), | ||
| }, | ||
| }) |
There was a problem hiding this comment.
P3 WorkspaceResource creation is unnecessary (Tester)
The activitybump.sql query joins only workspace_builds → provisioner_jobs → workspaces → templates. No join on workspace_resources. This line creates a row nothing reads. Harmless but cargo-culted, could confuse future readers into thinking it's a precondition.
| }) | ||
| ws := dbgen.Workspace(t, db, database.WorkspaceTable{ | ||
| OwnerID: user.ID, | ||
| OrganizationID: org.ID, |
There was a problem hiding this comment.
[obs] Template activity_bump relies on implicit DB column default (Tester)
The test never explicitly sets activity_bump on the template. It works because the column default is 3600000000000 (1 hour, from migration 000190). Compare with activitybump_test.go which explicitly calls UpdateTemplateScheduleByID. Not a bug, just an implicit coupling that could silently break if the default ever changed.
…y bump Address review feedback from @mafredri: - P0: Add policy.ActionUpdate to the chatd RBAC role so ActivityBumpWorkspace passes dbauthz checks in production. - P2: Reword comment — the CTE still runs joins on every call, only the UPDATE is guarded by the 5% threshold. - P3: Remove unnecessary WorkspaceResource creation in test. - obs: Explicitly set template activity_bump instead of relying on implicit DB column default.
|
Pushed f05deea addressing all of @mafredri's feedback: P0 — dbauthz denial: Added P2 — comment wording: Reworded to "most calls perform a read-only CTE lookup with no UPDATE" — no longer claims "no-op SELECTs". P3 — WorkspaceResource: Removed. The obs — implicit activity_bump: Added explicit
|
Pass a dbauthz-wrapped store to the chatd server so the AsChatd role is enforced on every query, matching production. Confirmed this catches the missing ActionUpdate permission: reverting it locally causes the test to fail with rbac: forbidden on every heartbeat tick.
mafredri
left a comment
There was a problem hiding this comment.
🤖 mafredri's neural network weights are satisfied.
All four findings addressed:
- P0 authz: chatd role now grants
ActionUpdateonResourceWorkspace, test wraps the DB with dbauthz and usesAsChatdcontext. This is the broadest fix (chatd can now alsoUpdateWorkspace,FavoriteWorkspace, etc. from an RBAC perspective), but the chatd code itself only callsActivityBumpWorkspace, so the practical scope is fine. - P2 comment: reworded to "writes when 5% of the deadline has elapsed — most calls perform a read-only CTE lookup with no UPDATE."
- P3
WorkspaceResource: removed. - [obs] implicit
activity_bump: now explicitly set viaUpdateTemplateScheduleByID.
🤖 Hyvää työtä. Four findings in, four fixes out, nothing left to argue about.
There was a problem hiding this comment.
Pull request overview
This PR prevents chat-associated workspaces from auto-stopping mid-conversation by extending the workspace build deadline on each chat heartbeat, reusing the existing workspace activity-bump mechanism.
Changes:
- Call
workspacestats.ActivityBumpWorkspacefromchatd.Server.trackWorkspaceUsage(invoked on heartbeat) to extend the autostop deadline. - Expand the
chatddbauthz subject permissions to includeworkspace:update, whichActivityBumpWorkspacerequires. - Extend the existing chat heartbeat test to validate both
last_used_atbumping and autostop deadline bumping, using a dbauthz-wrapped DB for production-like enforcement.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
coderd/chatd/chatd.go |
Adds autostop-deadline bumping to the heartbeat usage tracking path. |
coderd/chatd/chatd_test.go |
Enhances heartbeat test coverage to assert workspace build deadline extension; wraps DB with dbauthz to enforce AsChatd permissions. |
coderd/database/dbauthz/dbauthz.go |
Grants chatd subject workspace:update to authorize activity bump updates. |
coderd/database/dbauthz/dbauthz_test.go |
Updates AsChatd authorization tests to reflect the new allowed/denied workspace actions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // | ||
| // Scaling note: for 10,000 active chats, this could lead to | ||
| // approx. 333 CTE queries/second. A cheap fix for this could | ||
| // be to heartbeat every Nth query. Leaving as potential future | ||
| // low-hanging fruit if needed. | ||
| workspacestats.ActivityBumpWorkspace(ctx, logger.Named("activity_bump"), p.db, wsID.UUID, time.Time{}) |
There was a problem hiding this comment.
trackWorkspaceUsage now calls ActivityBumpWorkspace on every heartbeat, which adds a DB query per active chat tick even when no UPDATE is needed. If this becomes noisy at scale, an easy follow-up is to add a small in-memory throttle (per-workspace last bump attempt) so the CTE runs less frequently.
workspacestats.ActivityBumpWorkspaceintotrackWorkspaceUsageso the workspace build deadline is extended each time the chat heartbeat firesTestHeartbeatBumpsWorkspaceUsageverifying the deadline bump