feat: heterogeneous Multi-Backend support (CUDA/ROCm) for Linux environments #545

Open
joanfgarcia wants to merge 5 commits into microsoft:main from joanfgarcia:main
@joanfgarcia
Abstract:
This PR extends the official bitnet.cpp inference framework to support full GPU acceleration on Linux via the llama.cpp backend. It stabilizes the interaction between the BitNet frontend and the underlying ggml kernels, enabling seamless, high-performance execution on NVIDIA and AMD hardware.

Highlights:

  • GPU Integration: Bridges the BitNet frontend with the new I2_S dequantization kernels provided by the llama.cpp submodule.
  • Linux Stability: Resolves path-masking and environment issues when deploying on Arch and Ubuntu-based systems.
  • ROCm Compatibility: Enables BitNet inference on AMD Instinct and Radeon hardware via HIP/ROCm 6.x.
  • Proven Reliability: Operational stability confirmed in a local sovereign AI project.

Testing & Validation:
Validation was conducted on mixed-GPU clusters covering a range of hardware. A local benchmark script is available at tests/bitnet_sovereign_bench.py for verification.

Acknowledgments:
Thank you for providing the community with these groundbreaking 1-bit LLM foundations. This contribution is offered with respect and professional commitment.

A Lannister always pays his debts.

This is a Red Pill project roadmap document, not a BitNet contribution.
Relocated to the parent project where it belongs.
Red Pill project benchmark, not a BitNet upstream contribution.
Summary of changes:
- Decoupled BitNet kernels from ggml.c into src/ggml-bitnet-mad.cpp
- Established Sovereign Bridge in include/ggml-bitnet.h
- Fixed CPU multi-threading telemetry (thread-aware debug)
- Resolved CPU SEGFAULT via static buffer and internal quantization
- Verified stable inference on Falcon3-10B-1.58b
@joanfgarcia
Author

@microsoft-github-policy-service agree