Add Qwen 3.5 MoE to cuda-perf CI and add prefill throughput tracking by Gasoonjia · Pull Request #18903 · pytorch/executorch

Gasoonjia · 2026-04-15T08:02:52Z

- Add PyTorchObserver stats output (so that `cuda_benchmark.py` can parse it), a `--prompt_file` flag, and GPU memory stats to the `qwen3_5_moe` runner
- Add a `prefill_throughput` metric to `cuda_benchmark.py` (prefill tok/s alongside the existing decode tok/s)
- Add Qwen3.5-35B-A3B-HQQ-INT4 to `cuda-perf.yml`, with a >1000-token prompt and 512 output tokens, running on `linux.aws.a100`
- Align `cuda-perf.yml` triggers with `cuda.yml` (push to main/release, `ciflow/cuda` tags, and PRs touching `backends/cuda` and `backends/aoti`)
- Remove random model selection and the schedule trigger; always run all models when the workflow is triggered
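For illustration only, here is a minimal sketch of how prefill and decode throughput could be derived from a single `PyTorchObserver` stats line. It assumes the runner logs a line of the form `PyTorchObserver {json}` whose payload contains `prompt_tokens`, `generated_tokens`, `inference_start_ms`, `prompt_eval_end_ms`, `inference_end_ms`, and `SCALING_FACTOR_UNITS_PER_SECOND`. These field names are modeled on ExecuTorch's LLM runner stats and are not confirmed by this PR. `parse_observer_stats`, `compute_throughput`, and `Throughput` are hypothetical helpers, not the actual code in `cuda_benchmark.py`.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical log format (assumption): one line such as
#   PyTorchObserver {"prompt_tokens": 1024, "generated_tokens": 512, ...}
_OBSERVER_RE = re.compile(r"PyTorchObserver\s+(\{.*\})")


@dataclass
class Throughput:
    prefill_tok_s: float  # prompt tokens / prefill wall time
    decode_tok_s: float   # generated tokens / decode wall time


def parse_observer_stats(log: str) -> dict:
    """Return the JSON payload of the last PyTorchObserver line in `log`."""
    matches = _OBSERVER_RE.findall(log)
    if not matches:
        raise ValueError("no PyTorchObserver line found")
    return json.loads(matches[-1])


def compute_throughput(stats: dict) -> Throughput:
    """Split one run into prefill and decode phases (assumed field names)."""
    # Timestamp fields are divided by this factor to get seconds (default: ms).
    units = stats.get("SCALING_FACTOR_UNITS_PER_SECOND", 1000)
    prefill_s = (stats["prompt_eval_end_ms"] - stats["inference_start_ms"]) / units
    decode_s = (stats["inference_end_ms"] - stats["prompt_eval_end_ms"]) / units
    if prefill_s <= 0 or decode_s <= 0:
        # A zero or negative interval would yield a meaningless rate.
        raise ValueError("non-positive prefill or decode interval")
    return Throughput(
        prefill_tok_s=stats["prompt_tokens"] / prefill_s,
        decode_tok_s=stats["generated_tokens"] / decode_s,
    )


if __name__ == "__main__":
    # Synthetic log line, not a real benchmark result.
    sample = (
        'I 00:00:05 PyTorchObserver {"prompt_tokens": 1024, '
        '"generated_tokens": 512, "inference_start_ms": 1000, '
        '"prompt_eval_end_ms": 1500, "inference_end_ms": 5500, '
        '"SCALING_FACTOR_UNITS_PER_SECOND": 1000}'
    )
    tp = compute_throughput(parse_observer_stats(sample))
    print(f"prefill: {tp.prefill_tok_s:.1f} tok/s, decode: {tp.decode_tok_s:.1f} tok/s")
```

Reporting prefill separately matters for a >1000-token prompt. Prefill processes the whole prompt in one pass, so folding it into a single end-to-end rate would hide regressions in either phase.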

pytorch-bot · 2026-04-15T08:02:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18903

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 25 Pending, 1 Unrelated Failure

As of commit 7336416 with merge base 28c56fe:

NEW FAILURES - The following jobs have failed:

pull / android / run-emulator (gh)
The process '/usr/local/lib/android/sdk/platform-tools/adb' failed with exit code 224
trunk / test-torchao-huggingface-checkpoints (lfm2_5_1_2b, linux.arm64.2xlarge, executorch-ubuntu-22.04-g... / linux-job (gh)
RuntimeError: Command docker exec -t 13ba2a762a54b4ab0b39bc45b1d34ef72d70a86eae2221ebc18c40d409573f97 /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (phi_4_mini, linux.arm64.2xlarge, executorch-ubuntu-22.04-gc... / linux-job (gh)
RuntimeError: Command docker exec -t 95b0017e20b9e4ecc6ce1faa320d2d03c396f8c4fb553657fab4f574f3df43fb /exec failed with exit code 1
trunk / test-torchao-huggingface-checkpoints (qwen3_4b, linux.arm64.2xlarge, executorch-ubuntu-22.04-gcc1... / linux-job (gh)
RuntimeError: Command docker exec -t b02f1d8a2b04d3331eb654ef2bc0318964344cfeb83c22c0219816b090c02b92 /exec failed with exit code 1

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-04-15T08:03:38Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


meta-cla bot added the CLA Signed label Apr 15, 2026

Gasoonjia temporarily deployed to upload-benchmark-results April 15, 2026 11:33 with GitHub Actions (inactive)

Gasoonjia force-pushed the cuda-perf-qwen35-moe branch from a23b5b1 to 49d0aa1 April 15, 2026 18:46

Gasoonjia deployed to upload-benchmark-results April 15, 2026 21:00 with GitHub Actions (active)

fix 0 first token

36213f9

Gasoonjia force-pushed the cuda-perf-qwen35-moe branch from e50e3fa to 36213f9 April 15, 2026 22:15

Merge branch 'main' into cuda-perf-qwen35-moe

7336416

Gasoonjia marked this pull request as ready for review April 15, 2026 22:15

Gasoonjia requested a review from lucylq as a code owner April 15, 2026 22:15

Gasoonjia requested a review from digantdesai April 15, 2026 22:17

Gasoonjia wants to merge 3 commits into main from cuda-perf-qwen35-moe
