Skip to content

fix(coderd): bump workspace autostop deadline on chat heartbeat#23314

Merged
johnstcn merged 4 commits intomainfrom
cian/chat-heartbeat-bump-deadline
Mar 19, 2026
Merged

fix(coderd): bump workspace autostop deadline on chat heartbeat#23314
johnstcn merged 4 commits intomainfrom
cian/chat-heartbeat-bump-deadline

Conversation

@johnstcn
Copy link
Copy Markdown
Member

@johnstcn johnstcn commented Mar 19, 2026

  • Wire workspacestats.ActivityBumpWorkspace into trackWorkspaceUsage so the workspace build deadline is extended each time the chat heartbeat fires
  • Prevents mid-conversation autostop for chat workspaces
  • Updates TestHeartbeatBumpsWorkspaceUsage verifying the deadline bump

🤖 This PR was created with the help of Coder Agents, and will be reviewed by my human. 🧑‍💻

@johnstcn johnstcn self-assigned this Mar 19, 2026
Comment thread coderd/chatd/chatd.go Outdated
Comment thread coderd/chatd/chatd_test.go Outdated
trackWorkspaceUsage previously only called usageTracker.Add (dormancy
protection). It now also calls workspacestats.ActivityBumpWorkspace so
the workspace build deadline is extended each time the chat heartbeat
fires, preventing mid-conversation autostop.
@johnstcn johnstcn force-pushed the cian/chat-heartbeat-bump-deadline branch from a8cb2d2 to c2fa11f Compare March 19, 2026 18:54
Copy link
Copy Markdown
Member Author

Addressing both review comments:

RE: DB thrashing concern (chatd.go)

Fires every heartbeat (~30s, DefaultChatHeartbeatInterval). The SQL has a built-in guard — it only performs an actual UPDATE when 5% of the deadline has elapsed (activitybump.sql:83-84), so for a 1-hour bump the write only happens in the last ~3 minutes. The rest of the time it's a cheap no-op SELECT. This is the same cadence as the agent stats reporter. Added a comment on the call site noting this.

RE: Merge test into existing one

Done — rolled the deadline assertion into TestHeartbeatBumpsWorkspaceUsage. Enriched the workspace setup with a full build chain (template version, completed provisioner job, workspace build with near-expired deadline) so both last_used_at and build deadline bumps are verified in a single test.

🤖 This response was generated by Coder Agents.

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 mafredri sent me to poke holes in this. I brought friends.

Good call wiring the deadline bump into the heartbeat, this is a real gap for chat-only workspaces. The implementation is clean: synchronous call in a serial goroutine, correct context derivation, and the SQL guards make concurrent bumps idempotent.

One blocker (P0), one suggestion (P2), two nits (P3), and a few observations across 4 inline comments.

🤖 Six reviewers walked in, one finding walked out alive. The rest were no-ops, like most of these heartbeat bumps.

Comment thread coderd/chatd/chatd.go Outdated
// so no prebuild guard is needed (unlike reporter.go).
//
// This fires every heartbeat (~30s) but the SQL only
// performs an UPDATE when 5% of the deadline has elapsed,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 ActivityBumpWorkspace will be denied by dbauthz in production (Gopher, Breaker, DBA)

The chatd subject (dbauthz.go:698-717) grants only policy.ActionRead on ResourceWorkspace, but querier.ActivityBumpWorkspace (dbauthz.go:1554-1558) calls update(), which enforces policy.ActionUpdate. Every heartbeat tick will:

  1. Run GetWorkspaceByID (the workspaces_expanded view, 4-way JOIN + 2 correlated subqueries)
  2. Fail the authz check
  3. Log at error level via activitybump.go:49-53

The test doesn't catch this because dbtestutil.NewDB(t) returns a raw database.Store without dbauthz wrapping.

At 1000 concurrent chats, that's ~2000 wasted view queries/min plus error-level log spam.

Fix options:

  1. Add policy.ActionUpdate to ResourceWorkspace in the chatd role. Check what else this unlocks.
  2. Use dbauthz.AsSystemRestricted(ctx) scoped to just this call, with a //nolint:gocritic explaining why.
  3. Add a narrower permission if the RBAC system supports it.

Whichever route, adding a dbauthz-aware integration test (or at least a unit test with the wrapped store) would catch regressions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch. I wonder if it's worth wrapping with dbauthz by default.

Comment thread coderd/chatd/chatd.go
// TemplateScheduleStore here. The activity bump logic
// defaults to the template's activity_bump duration
// (typically 1 hour). Chat workspaces are never prebuilds,
// so no prebuild guard is needed (unlike reporter.go).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 "no-op SELECTs" overstates it (DBA, Scribe)

The 5% guard prevents the write (no row lock, no WAL entry), but the CTE still runs a 4-table join (workspace_builds → provisioner_jobs → workspaces → templates) with interval arithmetic on every call. It's fast (index scan + 3 PK lookups), but calling it a "no-op SELECT" is misleading.

Suggestion: something like "most calls perform a read-only CTE lookup with no UPDATE" would be more accurate.

PS. Not a performance concern at current scale, just a documentation accuracy thing.

Valid: true,
Time: dbtime.Now().Add(-30 * time.Minute),
},
})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 WorkspaceResource creation is unnecessary (Tester)

The activitybump.sql query joins only workspace_builds → provisioner_jobs → workspaces → templates. No join on workspace_resources. This line creates a row nothing reads. Harmless but cargo-culted, could confuse future readers into thinking it's a precondition.

})
ws := dbgen.Workspace(t, db, database.WorkspaceTable{
OwnerID: user.ID,
OrganizationID: org.ID,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[obs] Template activity_bump relies on implicit DB column default (Tester)

The test never explicitly sets activity_bump on the template. It works because the column default is 3600000000000 (1 hour, from migration 000190). Compare with activitybump_test.go which explicitly calls UpdateTemplateScheduleByID. Not a bug, just an implicit coupling that could silently break if the default ever changed.

…y bump

Address review feedback from @mafredri:
- P0: Add policy.ActionUpdate to the chatd RBAC role so
  ActivityBumpWorkspace passes dbauthz checks in production.
- P2: Reword comment — the CTE still runs joins on every call,
  only the UPDATE is guarded by the 5% threshold.
- P3: Remove unnecessary WorkspaceResource creation in test.
- obs: Explicitly set template activity_bump instead of relying
  on implicit DB column default.
Copy link
Copy Markdown
Member Author

Pushed f05deea addressing all of @mafredri's feedback:

P0 — dbauthz denial: Added policy.ActionUpdate to the chatd RBAC role for ResourceWorkspace (not AsSystemRestricted). Updated TestAsChatd to expect Update allowed / Delete still denied.

P2 — comment wording: Reworded to "most calls perform a read-only CTE lookup with no UPDATE" — no longer claims "no-op SELECTs".

P3 — WorkspaceResource: Removed. The activitybump.sql query doesn't join workspace_resources.

obs — implicit activity_bump: Added explicit UpdateTemplateScheduleByID with ActivityBump: int64(time.Hour) instead of relying on the migration default.

🤖 This response was generated by Coder Agents.

Pass a dbauthz-wrapped store to the chatd server so the AsChatd
role is enforced on every query, matching production. Confirmed
this catches the missing ActionUpdate permission: reverting it
locally causes the test to fail with rbac: forbidden on every
heartbeat tick.
@johnstcn johnstcn changed the title fix(coderd/chatd): bump workspace autostop deadline on chat heartbeat fix(coderd): bump workspace autostop deadline on chat heartbeat Mar 19, 2026
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 mafredri's neural network weights are satisfied.

All four findings addressed:

  • P0 authz: chatd role now grants ActionUpdate on ResourceWorkspace, test wraps the DB with dbauthz and uses AsChatd context. This is the broadest fix (chatd can now also UpdateWorkspace, FavoriteWorkspace, etc. from an RBAC perspective), but the chatd code itself only calls ActivityBumpWorkspace, so the practical scope is fine.
  • P2 comment: reworded to "writes when 5% of the deadline has elapsed — most calls perform a read-only CTE lookup with no UPDATE."
  • P3 WorkspaceResource: removed.
  • [obs] implicit activity_bump: now explicitly set via UpdateTemplateScheduleByID.

🤖 Hyvää työtä. Four findings in, four fixes out, nothing left to argue about.

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Human approved, too!

@johnstcn johnstcn marked this pull request as ready for review March 19, 2026 21:52
Copilot AI review requested due to automatic review settings March 19, 2026 21:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents chat-associated workspaces from auto-stopping mid-conversation by extending the workspace build deadline on each chat heartbeat, reusing the existing workspace activity-bump mechanism.

Changes:

  • Call workspacestats.ActivityBumpWorkspace from chatd.Server.trackWorkspaceUsage (invoked on heartbeat) to extend the autostop deadline.
  • Expand the chatd dbauthz subject permissions to include workspace:update, which ActivityBumpWorkspace requires.
  • Extend the existing chat heartbeat test to validate both last_used_at bumping and autostop deadline bumping, using a dbauthz-wrapped DB for production-like enforcement.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
coderd/chatd/chatd.go Adds autostop-deadline bumping to the heartbeat usage tracking path.
coderd/chatd/chatd_test.go Enhances heartbeat test coverage to assert workspace build deadline extension; wraps DB with dbauthz to enforce AsChatd permissions.
coderd/database/dbauthz/dbauthz.go Grants chatd subject workspace:update to authorize activity bump updates.
coderd/database/dbauthz/dbauthz_test.go Updates AsChatd authorization tests to reflect the new allowed/denied workspace actions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread coderd/chatd/chatd_test.go
Comment thread coderd/chatd/chatd.go
Comment on lines +2389 to +2394
//
// Scaling note: for 10,000 active chats, this could lead to
// approx. 333 CTE queries/second. A cheap fix for this could
// be to heartbeat every Nth query. Leaving as potential future
// low-hanging fruit if needed.
workspacestats.ActivityBumpWorkspace(ctx, logger.Named("activity_bump"), p.db, wsID.UUID, time.Time{})
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trackWorkspaceUsage now calls ActivityBumpWorkspace on every heartbeat, which adds a DB query per active chat tick even when no UPDATE is needed. If this becomes noisy at scale, an easy follow-up is to add a small in-memory throttle (per-workspace last bump attempt) so the CTE runs less frequently.

Copilot uses AI. Check for mistakes.
@johnstcn johnstcn merged commit 2f50e89 into main Mar 19, 2026
39 checks passed
@johnstcn johnstcn deleted the cian/chat-heartbeat-bump-deadline branch March 19, 2026 22:07
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants