Pull requests: mistralai/mistral-inference
- fix: pass explicit device to attention mask creation in cache (#267, opened Mar 2, 2026 by abdelhadi703)
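The device fix described above can be sketched as follows. This is an illustrative snippet, not the repository's actual code: `causal_mask` is a hypothetical helper, and the point is only that the mask must be allocated on the same device as the attention inputs rather than on the default (CPU) device.

```python
import torch

def causal_mask(seqlen: int, device: torch.device) -> torch.Tensor:
    # Without an explicit device=, torch.full allocates on CPU, which fails
    # later when the mask is combined with GPU attention scores.
    mask = torch.full((seqlen, seqlen), float("-inf"), device=device)
    # Upper triangle above the diagonal stays -inf; the rest becomes 0.
    return torch.triu(mask, diagonal=1)

m = causal_mask(4, torch.device("cpu"))
print(m[0, 0].item(), m[0, 1].item())  # prints: 0.0 -inf
```

In real use the device would come from an existing tensor, e.g. `causal_mask(seqlen, q.device)`.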
- fix: move length_tensor to CUDA before NCCL broadcast in distributed inference (#266, opened Mar 2, 2026 by abdelhadi703)
- fix(readme): correct broken vLLM link in deploy section (#264, opened Mar 2, 2026 by abdelhadi703)
- Fix a broken Dockerfile and reduce the final image size using a multi-stage build (#263, opened Feb 21, 2026 by framsouza)
- Fix NCCL broadcast error on CPU tensors in distributed inference (#257, opened Oct 1, 2025 by Pratham-Nayak1)
- feat(model-service): add OpenAI-compatible wrapper (+ pm2 + env example) and update ignores (#254, opened Aug 27, 2025 by MCVelasquez45)
- Optimize main.py for inference efficiency and GPU throughput (torch.compile, memory tuning, warp alignment) (#253, opened Aug 3, 2025 by abdullatifcodes)
- Fix: Proper JSON chunk handling in streaming response (OpenRouter API) (#248, opened Jul 4, 2025 by ktdjiren)
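The failure mode behind this fix is that a network chunk can end in the middle of a JSON payload, so parsing each chunk directly breaks. A common remedy, sketched below under the assumption of an SSE-style `data:` stream (field names follow the OpenAI streaming format; `iter_sse_json` is an illustrative helper, not the PR's code), is to buffer until a complete event is available before calling `json.loads`.

```python
import json

def iter_sse_json(chunks):
    """Yield parsed JSON events from an SSE byte stream whose payloads
    may be split across arbitrary chunk boundaries."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A blank line terminates one SSE event; only parse complete events.
        while b"\n\n" in buffer:
            event, buffer = buffer.split(b"\n\n", 1)
            for line in event.splitlines():
                if line.startswith(b"data: "):
                    payload = line[len(b"data: "):]
                    if payload.strip() == b"[DONE]":
                        return
                    yield json.loads(payload)

# The JSON object is deliberately split mid-string across two chunks.
stream = [
    b'data: {"choices": [{"delta": {"content": "Hel',
    b'lo"}}]}\n\ndata: [DONE]\n\n',
]
texts = [e["choices"][0]["delta"]["content"] for e in iter_sse_json(stream)]
print(texts)  # prints: ['Hello']
```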
- [fix] Correctly pass mask in TransformerBlock.forward in transformer_layers.py (#218, opened Sep 18, 2024 by MarcSzafraniec)
- Fix device error when using cuda device other than cuda:0 (#216, opened Aug 28, 2024 by cornzz, Contributor)
- Correct grammatical error in markdown cells (#181, opened Jun 16, 2024 by CharlesCNorton, Contributor)
- fix: typo in ModelArgs dataclass definition (#177, opened Jun 6, 2024 by CharlesCNorton, Contributor)
- Fix: grammar in installation instructions (#170, opened Jun 4, 2024 by CharlesCNorton, Contributor)
- fix(README.md): correct verb agreement in model support statement (#166, opened May 30, 2024 by CharlesCNorton, Contributor)