Add nightly CI for optional-dependency testing (PyTorch, numba-cuda) #1987

Draft
leofang wants to merge 9 commits into NVIDIA:main from leofang:ci-nightly-optdeps

Conversation

@leofang
Member

@leofang leofang commented Apr 29, 2026

Add a nightly CI pipeline that tests cuda-python wheels against optional dependencies (PyTorch and numba-cuda) without rebuilding wheels. Wheels are downloaded from the latest successful CI run on main.

Design

  • ci-nightly.yml: New orchestrator workflow (2 AM UTC daily + workflow_dispatch for manual testing). Finds the latest successful CI run on main and passes its run-id to the existing test workflows.
  • test-wheel-linux/windows.yml: Extended with two new inputs:
    • run-id: enables actions/download-artifact to pull wheels from a different workflow run (defaults to github.run_id for backward compatibility)
    • test-mode: standard (default, current behavior), nightly-pytorch, or nightly-numba-cuda
  • test-matrix.yml: New nightly: entries with a MODE field. The orchestrator uses the existing matrix_filter input to select by mode.
  • run-tests: New nightly-install mode that installs all wheels without running standard tests.
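
The "find the latest successful CI run on main" step can be sketched in Python as below. This is a minimal illustration, not the workflow's actual implementation: the function name `pick_latest_successful` and the sample data are hypothetical, and the record fields (`databaseId`, `headSha`, `headBranch`, `status`, `conclusion`, `createdAt`) are assumed to follow the JSON field names offered by `gh run list --json`.

```python
from datetime import datetime


def pick_latest_successful(runs, branch="main"):
    """Return the newest completed, successful run on `branch`, or None.

    `runs` is a list of dicts shaped like `gh run list --json ...` output
    (an assumption; the real orchestrator may query the API differently).
    """
    candidates = [
        r for r in runs
        if r.get("headBranch") == branch
        and r.get("status") == "completed"
        and r.get("conclusion") == "success"
    ]
    if not candidates:
        return None
    # createdAt is an ISO 8601 timestamp with a trailing "Z" (UTC).
    return max(
        candidates,
        key=lambda r: datetime.fromisoformat(r["createdAt"].replace("Z", "+00:00")),
    )


# Hypothetical sample data: only runs 101 and 104 qualify, and 104 is newer.
runs = [
    {"databaseId": 101, "headSha": "aaa1111", "headBranch": "main",
     "status": "completed", "conclusion": "success",
     "createdAt": "2026-04-27T02:00:00Z"},
    {"databaseId": 102, "headSha": "bbb2222", "headBranch": "main",
     "status": "completed", "conclusion": "failure",
     "createdAt": "2026-04-28T02:00:00Z"},
    {"databaseId": 103, "headSha": "ccc3333", "headBranch": "feature",
     "status": "completed", "conclusion": "success",
     "createdAt": "2026-04-28T05:00:00Z"},
    {"databaseId": 104, "headSha": "ddd4444", "headBranch": "main",
     "status": "completed", "conclusion": "success",
     "createdAt": "2026-04-28T01:00:00Z"},
]

source = pick_latest_successful(runs)
run_id, sha = source["databaseId"], source["headSha"]
print(run_id, sha)  # 104 ddd4444
```

The selected `run_id` is what the orchestrator would hand to the test workflows' `run-id` input.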

Test matrix (14 jobs)

PyTorch (8 jobs: 4 linux-64 + 4 win-64)

| TORCH_VER | TORCH_CUDA | CUDA_VER | Platform |
|---|---|---|---|
| latest | cu126 | 12.9.1 | linux, windows |
| latest | cu130 | 13.2.1 | linux, windows |
| 2.9.1 | cu126 | 12.9.1 | linux, windows |
| 2.9.1 | cu130 | 13.2.1 | linux, windows |

Tests: cuda_core/tests/test_utils.py (SMV/DLPack interop) + cuda_core/tests/example_tests/ (pytorch_example)

numba-cuda (6 jobs: 2 linux-64 + 2 linux-aarch64 + 2 win-64)

| CUDA_VER | Platforms |
|---|---|
| 12.9.1 | linux-64, linux-aarch64, win-64 |
| 13.2.1 | linux-64, linux-aarch64, win-64 |

Tests: python -m numba_cuda.numba.cuda.tests (numba-cuda's bundled test suite)
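
As a cross-check of the job counts, the matrix can be enumerated in Python. This is only a sketch: the version and platform values are copied from the two tables in this description, the `MODE` values are assumed to equal the `test-mode` names, and the `PLATFORM` key and both helper functions are hypothetical (the real `test-matrix.yml` layout and `matrix_filter` expression may differ).

```python
from itertools import product

# Values copied from the tables in this PR description.
TORCH_VERSIONS = ["latest", "2.9.1"]
TORCH_CUDA_PAIRS = [("cu126", "12.9.1"), ("cu130", "13.2.1")]  # (TORCH_CUDA, CUDA_VER)
PYTORCH_PLATFORMS = ["linux-64", "win-64"]
NUMBA_CUDA_VERSIONS = ["12.9.1", "13.2.1"]
NUMBA_PLATFORMS = ["linux-64", "linux-aarch64", "win-64"]


def build_nightly_matrix():
    """Expand the nightly entries into one dict per job (hypothetical layout)."""
    entries = []
    for torch_ver, (torch_cuda, cuda_ver), platform in product(
        TORCH_VERSIONS, TORCH_CUDA_PAIRS, PYTORCH_PLATFORMS
    ):
        entries.append({
            "MODE": "nightly-pytorch",
            "TORCH_VER": torch_ver,
            "TORCH_CUDA": torch_cuda,
            "CUDA_VER": cuda_ver,
            "PLATFORM": platform,
        })
    for cuda_ver, platform in product(NUMBA_CUDA_VERSIONS, NUMBA_PLATFORMS):
        entries.append({
            "MODE": "nightly-numba-cuda",
            "CUDA_VER": cuda_ver,
            "PLATFORM": platform,
        })
    return entries


def select_by_mode(entries, mode):
    """Stand-in for filtering the matrix by its MODE field."""
    return [e for e in entries if e["MODE"] == mode]


matrix = build_nightly_matrix()
pytorch_jobs = select_by_mode(matrix, "nightly-pytorch")
numba_jobs = select_by_mode(matrix, "nightly-numba-cuda")
print(len(pytorch_jobs), len(numba_jobs), len(matrix))  # 8 6 14
```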

How to test this PR

  1. Merge or push to branch
  2. Go to Actions → "CI: Nightly optional-deps" → "Run workflow"
  3. Optionally supply a specific run-id from a recent successful CI run

Standard CI (ci.yml) is unaffected — test-mode defaults to standard and run-id defaults to github.run_id.

-- Leo's bot

…ba-cuda)

Add ci-nightly.yml that downloads wheels from the latest successful CI
run on main and tests them against PyTorch and numba-cuda, without
rebuilding.

Key changes:
- ci-nightly.yml: new orchestrator (schedule 2 AM UTC + workflow_dispatch)
- test-wheel-linux/windows.yml: add run-id input for cross-run artifact
  downloads, and test-mode input (standard/nightly-pytorch/nightly-numba-cuda)
  with conditional test steps
- ci/test-matrix.yml: add nightly entries with MODE field (4 pytorch +
  6 numba-cuda across linux-64, linux-aarch64, win-64)
- ci/tools/run-tests: add nightly-install mode that installs all wheels
  without running standard tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@copy-pr-bot
Contributor

copy-pr-bot Bot commented Apr 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the CI/CD CI/CD infrastructure label Apr 29, 2026
@leofang leofang self-assigned this Apr 29, 2026
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 4aadce2

leofang and others added 2 commits April 29, 2026 02:33
- Add concurrency group matching ci.yml's pattern
- Replace jq one-liner with explicit cancelled/failure checks per
  ci.yml's battle-tested pattern (see long comment there for rationale)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove before merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test ac5238c

leofang and others added 2 commits April 29, 2026 02:39
Full history is not needed — we only read ci/versions.yml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Artifact names embed the commit SHA from the build that created them.
When the nightly workflow downloads artifacts from a different CI run,
it must use that run's SHA — not github.sha (the nightly run's own
SHA) — to construct the correct artifact names.

- ci-nightly.yml: resolve head_sha from the source CI run via
  `gh run view --json headSha`, pass it to test workflows
- test-wheel-linux/windows.yml: add `sha` input (defaults to
  github.sha for backward compatibility), use it in env-vars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
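
The SHA mismatch described in the commit message above can be illustrated with a small Python sketch. Everything here is hypothetical: the `<prefix>-<sha>` naming scheme, the prefix string, the short SHAs, and both helper names are assumptions made for illustration. The real artifact names are built in the workflows' env-vars step.

```python
def resolve_sha(sha_input, github_sha):
    """Mimic a `sha` input that falls back to github.sha when left empty."""
    return sha_input or github_sha


def artifact_name(prefix, sha):
    """HYPOTHETICAL naming scheme: the real format is defined by the CI scripts."""
    return f"{prefix}-{sha}"


# Artifacts uploaded by the source CI run, which built commit ddd4444.
source_sha = "ddd4444"
uploaded = {artifact_name("cuda-core-linux-64", source_sha)}

# The nightly run itself executes at a different commit.
nightly_sha = "eee5555"

# Without the `sha` input, the name is derived from the nightly run's own SHA.
wrong = artifact_name("cuda-core-linux-64", resolve_sha("", nightly_sha))
# With the source run's head SHA passed in, the name matches what was uploaded.
right = artifact_name("cuda-core-linux-64", resolve_sha(source_sha, nightly_sha))

print(wrong in uploaded, right in uploaded)  # False True
```

Defaulting the input to the run's own SHA keeps standard CI unchanged, since there the build and test jobs share one commit.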
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 9286598

@leofang leofang force-pushed the ci-nightly-optdeps branch from a279179 to 8720de0 Compare April 29, 2026 03:26
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 8720de0

@leofang leofang force-pushed the ci-nightly-optdeps branch from 8720de0 to 6976f8a Compare April 29, 2026 03:47
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 6976f8a

- Install ALL wheels (pathfinder + bindings + core) and optional dep
  (torch/numba-cuda) in a single pip call so pip resolves everything
  together and avoids costly reinstall cycles from version conflicts
- Fix "Display structure" step: show only artifact files (cuda_python*.whl,
  cuda_pathfinder/) instead of ls -lahR . which lists the entire repo
- Fix numba-cuda test command: python -m numba.runtests numba.cuda.tests
- Install Visual C++ Redistributable on Windows before PyTorch
  (pytorch/pytorch#166628)
- run-tests now does pip list at the end of nightly installs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
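
The "single pip call" idea from the commit message above can be sketched as follows. The helper name and the wheel file names are hypothetical, and the command is only constructed, not executed. `--extra-index-url` is a standard pip option, and `https://download.pytorch.org/whl/cu126` is PyTorch's wheel index for CUDA 12.6 builds. Whether `run-tests` uses exactly these arguments is an assumption.

```python
import sys


def build_pip_install_cmd(wheels, optional_dep, extra_index_url=None):
    """Build ONE pip invocation covering the local wheels and the optional dep.

    Passing everything to a single `pip install` lets the resolver pick
    mutually compatible versions in one pass, instead of installing the
    wheels first and having a later install replace them.
    """
    cmd = [sys.executable, "-m", "pip", "install", *wheels, optional_dep]
    if extra_index_url:
        cmd += ["--extra-index-url", extra_index_url]
    return cmd


# Hypothetical wheel file names, for illustration only.
wheels = [
    "cuda_pathfinder-0.0.0-py3-none-any.whl",
    "cuda_bindings-0.0.0-cp312-cp312-linux_x86_64.whl",
    "cuda_core-0.0.0-cp312-cp312-linux_x86_64.whl",
]
cmd = build_pip_install_cmd(
    wheels, "torch==2.9.1",
    extra_index_url="https://download.pytorch.org/whl/cu126",
)
print(" ".join(cmd[1:]))
```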
@leofang leofang force-pushed the ci-nightly-optdeps branch from 6976f8a to 0b7cc50 Compare April 29, 2026 03:54
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test fc1dc5d

@leofang leofang force-pushed the ci-nightly-optdeps branch from fc1dc5d to 3653e7a Compare April 29, 2026 04:21
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 3653e7a

@leofang leofang force-pushed the ci-nightly-optdeps branch from 3653e7a to bf62c2b Compare April 29, 2026 05:06
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test bf62c2b

@leofang leofang force-pushed the ci-nightly-optdeps branch 3 times, most recently from 0c98a26 to 5d653ce Compare April 29, 2026 05:38
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 5d653ce

@leofang leofang force-pushed the ci-nightly-optdeps branch 3 times, most recently from 24ea333 to 8586cf7 Compare April 29, 2026 06:00
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 8586cf7

@leofang leofang force-pushed the ci-nightly-optdeps branch 2 times, most recently from 8586cf7 to 6953cdd Compare April 29, 2026 13:47
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 6953cdd

@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 294eee4

@leofang leofang force-pushed the ci-nightly-optdeps branch from 294eee4 to 4f409a7 Compare April 29, 2026 17:54
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test 4f409a7

CUDA_VER in the test environment should match TORCH_CUDA in
major.minor. BUILD_CUDA_VER (from build-ctk-ver input) is used
for artifact names, so CUDA_VER can differ.

- cu126 → CUDA_VER: 12.6.3 (was 12.9.1)
- cu130 → CUDA_VER: 13.0.2 (was 13.2.1)

For CUDA 12 entries, USE_BACKPORT_BINDINGS kicks in automatically
since BUILD_CUDA_MAJOR (13) != TEST_CUDA_MAJOR (12), pulling
bindings from the backport branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
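
The two rules in the commit message above (CUDA_VER matching TORCH_CUDA in major.minor, and backport bindings when the build and test CUDA majors differ) can be written as a short Python sketch. The function names are hypothetical, the `cu<major><minor>` parsing assumes a two-digit major version, and the build version `13.2.1` used in the example is an assumption based on "BUILD_CUDA_MAJOR (13)".

```python
def torch_cuda_to_major_minor(tag):
    """'cu126' -> (12, 6); 'cu130' -> (13, 0). Assumes a two-digit major."""
    digits = tag[2:]
    if not tag.startswith("cu") or not digits.isdigit() or len(digits) < 3:
        raise ValueError(f"unexpected TORCH_CUDA tag: {tag!r}")
    return int(digits[:2]), int(digits[2:])


def cuda_ver_matches(cuda_ver, torch_cuda):
    """True when CUDA_VER agrees with TORCH_CUDA in major.minor."""
    major, minor = (int(part) for part in cuda_ver.split(".")[:2])
    return (major, minor) == torch_cuda_to_major_minor(torch_cuda)


def use_backport_bindings(build_cuda_ver, test_cuda_ver):
    """True when the build and test CUDA major versions differ."""
    return build_cuda_ver.split(".")[0] != test_cuda_ver.split(".")[0]


# Old vs. new CUDA_VER values from the commit message.
print(cuda_ver_matches("12.9.1", "cu126"))  # False
print(cuda_ver_matches("12.6.3", "cu126"))  # True
print(cuda_ver_matches("13.0.2", "cu130"))  # True

# Assumed build CTK version 13.2.1: CUDA 12 test entries need the backport.
print(use_backport_bindings("13.2.1", "12.6.3"))  # True
print(use_backport_bindings("13.2.1", "13.0.2"))  # False
```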
@leofang leofang force-pushed the ci-nightly-optdeps branch from 4f409a7 to edeaa76 Compare April 29, 2026 18:53
@leofang
Member Author

leofang commented Apr 29, 2026

/ok to test edeaa76

@leofang
Member Author

leofang commented Apr 29, 2026

The failing numba-cuda tests will be fixed by the next numba-cuda release, which includes the fix from NVIDIA/numba-cuda#873.

The failing PyTorch tests will be fixed by #1988.

Comment thread .github/workflows/test-wheel-windows.yml Outdated
Comment thread .github/workflows/test-wheel-linux.yml Outdated
Comment thread .github/workflows/test-wheel-windows.yml Outdated
Comment thread ci/tools/patch-numba-cuda
Member Author

TODO: remove this script after a new numba-cuda is out

Labels

CI/CD CI/CD infrastructure

1 participant