Add Zvdota/Zvbdota draft by aswaterman · Pull Request #2618 · riscv/riscv-isa-manual

aswaterman · 2026-01-29T03:29:13Z

No description provided.

nibrunieAtSi5

LGTM

Regarding the suggested RTO behavior:

Rounding to Odd behavior in dot product mode

Rounding to odd (RTO) is not part of the IEEE-754 standard (at least not up to and including the 2019 revision).

The version used for the dot product operation diverges from the generally accepted definition in two ways:

When overflowing, an infinity result is returned (rather than the largest-magnitude normal number); see Overflow

A zero result is always positive (+0) whatever the sign of the actual zero term(s) of the dot product sum
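The two divergences listed above can be sketched in a few lines of Python. This is only an illustrative model, not the specification's pseudo-code: the function name `round_to_odd`, its parameters, and the `overflow_to_inf` switch are assumptions made for this example, subnormals are ignored, and the defaults correspond to an FP32 result (24-bit significand, maximum exponent 127).

```python
import math
from fractions import Fraction

def round_to_odd(value, precision=24, emax=127, overflow_to_inf=True):
    """Round an exact Fraction to `precision` significand bits with round-to-odd.

    Hypothetical helper for illustration; subnormals are not modelled.
    """
    if value == 0:
        return 0.0  # divergence 2: a zero result is always +0
    sign = -1.0 if value < 0 else 1.0
    mag = abs(value)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    if e > emax:
        if overflow_to_inf:
            return sign * math.inf  # divergence 1: overflow returns an infinity
        # Generally accepted RTO: saturate at the largest finite value.
        max_value = float((2 ** precision - 1) * Fraction(2) ** (emax - precision + 1))
        return sign * max_value
    ulp = Fraction(2) ** (e - precision + 1)
    q, r = divmod(mag, ulp)  # truncate toward zero
    if r != 0:
        q |= 1  # inexact result: force the last significand bit to 1 (odd)
    return sign * float(q * ulp)

print(round_to_odd(Fraction(1) + Fraction(1, 2**30)) == 1.0 + 2.0**-23)  # True
print(round_to_odd(Fraction(2) ** 128))                                   # inf
```

Passing `overflow_to_inf=False` gives the saturating behavior of the generally accepted definition, which makes the two variants easy to compare on the same input.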

Note on the suggested definition of RTO for the RVBNA: the IEEE-754 committee has started discussing the possible introduction of a binary roundTowardsOdd mode in the next revision. One of the proposals rounds overflows to maxValue (in line with the generally accepted definition of RTO), on the grounds that this makes it possible to distinguish arithmetic on infinities from overflow. This argument would apply more generally to any saturating arithmetic, but it applies to RTO in particular because maxValue is generally odd while infinity is "even".
I am not sure the argument holds for a large sum of products (as is the case for BDOT). Consider the rounding of the products: since it is done before the accumulation, rounding very large products to maxValue could lead to massive cancellation during the accumulation and lose the information about the overflow altogether. Rounding overflowing products to infinities would instead most likely lead to a NaN result when overflowing products of opposite signs are summed, or to an infinity if all overflowing products share the same sign.

I believe this is a desirable behavior, but wouldn't mind other people sharing their opinion.

The definition of RTO used here should make no difference (compared with the generally accepted definition) when the exponent range of the products is much smaller than that of the accumulation format (e.g. accumulation of [O]FP8 or FP16 products with an FP32 accumulator). The difference becomes visible when the products can overflow in the accumulation format (e.g. BF16 products with FP32 accumulation).
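The cancellation concern raised in this comment can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: `round_product` and `accumulate` are hypothetical helpers, only the overflow case of the product rounding is modelled (anything above the FP32 maximum is treated as an overflow), and the accumulation itself runs in Python's double precision rather than in FP32.

```python
import math

FP32_MAX = (2.0 - 2.0**-23) * 2.0**127  # largest finite binary32 value

def round_product(product, saturate):
    """Model only the overflow case of rounding one product to FP32."""
    if abs(product) > FP32_MAX:
        limit = FP32_MAX if saturate else math.inf
        return math.copysign(limit, product)
    return product

def accumulate(products, saturate):
    """Sum the rounded products (in double precision, for simplicity)."""
    total = 0.0
    for p in products:
        total += round_product(p, saturate)
    return total

# Two huge products of opposite sign plus a small one; the exact sum is 2**129 + 1.
mixed = [2.0**130, -2.0**129, 1.0]
same_sign = [2.0**130, 2.0**129, 1.0]

print(accumulate(mixed, saturate=True))       # 1.0  (overflow silently lost)
print(accumulate(mixed, saturate=False))      # nan  (overflow remains visible)
print(accumulate(same_sign, saturate=False))  # inf
```

With saturation, the two overflowing products cancel exactly and the result looks like an ordinary finite value; with overflow to infinity, the result is a NaN or an infinity, so the overflow cannot go unnoticed.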

aswaterman · 2026-01-31T22:40:54Z

It seems like an anti-feature that a negative overflow result and a positive overflow result sum to zero. The behavior we’ve defined hews more closely to what you’d get if you performed the arithmetic without bulk normalization, using more conventional rounding like nearest-even. Following the principle of least surprise is a good thing. Our deliberate divergence remains sound, IMO.

The 754 committee’s potential choice to make round-to-odd saturate at the maximum normal value makes sense to me, since it helps with successive conversion steps. E.g. if you’re trying to convert from double to half, using the dynamic rounding mode, using a single-precision intermediate, then the first step mustn’t generate an infinity, since the dynamic rounding mode could have been towards the opposite infinity.
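The successive-conversion argument can be sketched as follows. This is a rough model that covers only overflow, not a full conversion routine: the helper names are hypothetical, in-range values are passed through unchanged, and the mode strings borrow the RISC-V names RDN, RUP, and RTZ for the directed rounding modes.

```python
import math

FP32_MAX = (2.0 - 2.0**-23) * 2.0**127  # largest finite binary32 value
FP16_MAX = 65504.0                       # largest finite binary16 value

def double_to_single_rto(x, saturate):
    """First step (round-to-odd); only the overflow case is modelled."""
    if abs(x) > FP32_MAX:
        return math.copysign(FP32_MAX if saturate else math.inf, x)
    return x

def to_half_overflow(x, mode):
    """Second step under a directed mode ("RDN", "RUP" or "RTZ"); overflow only."""
    if math.isinf(x):
        return x  # an infinite input stays infinite in every rounding mode
    if abs(x) > FP16_MAX:
        away = (mode == "RUP" and x > 0) or (mode == "RDN" and x < 0)
        return math.copysign(math.inf if away else FP16_MAX, x)
    return x

x = 1e300  # far outside both the single and the half range
direct = to_half_overflow(x, "RDN")
via_saturating = to_half_overflow(double_to_single_rto(x, saturate=True), "RDN")
via_infinity = to_half_overflow(double_to_single_rto(x, saturate=False), "RDN")

print(direct, via_saturating, via_infinity)  # 65504.0 65504.0 inf
```

When rounding down, a positive overflow must produce the largest finite half-precision value. The saturating intermediate step preserves that outcome, while an intermediate infinity cannot be undone by the second step.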

… operands (#2767)

* [RVBNA] Generalizing the definition of RVBNA to non symmetric product operands

  RVBNA was initially described with the same format for both left hand side and right hand side products. This is not a requirement and RISC-V specifications (e.g. VME) actually require the support of non-uniform formats. This patch generalizes the definition to a different format for left hand side and right hand side.

* updating RVBNA figure with distinct p_l and p_r

* Apply suggestions from code review

  Co-authored-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
  Signed-off-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>

* linting (pre-commit run removed trailing whitespace including in SVG)

* Cleaning up power-of-two display number formatting

* Refactoring bias handling in RVBNA pseudo-code

* Incorporating asymmetric source bias

* Clarifying how result exponent is biased

* Clarifying leading digit bit of product (the leading digit bit of the product with the max exponent may not be the leading digit of any product in the dot product).

* Clarifying no-need for unbiasing to evaluate reference product exponent

* Giving hint as to which exponent value can be used for zero-value product

---------

Signed-off-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
Co-authored-by: Nicolas Brunie <nibrunie@gmail.com>

Xu-Dsus4 · 2026-03-24T08:17:40Z

src/zvdota.adoc

+----
+# Zvfwdota16bf BF16 dot-product instruction
+# altfmt=1
+vfwdota.vv vd, vs2, vs1, vm   # vd[0] += vs2 dot vs1


Hi @aswaterman,
I’d like to enquire whether, for this instruction, the intermediate result format is always BF16, except for the last step where the accumulated intermediate result is added to the FP32 scalar value from vd[0]. Is that correct?
In other words, based on my understanding, this instruction can be described with the following code:

    function clause execute (VFWDOTA_VV(vs2, vs1, vd, vm)) = {
      foreach (i from vstart to VL-1) {
        if (vm == 0b1) | (v0[i] == 0b1) then
          let (bfloat16) op1 = get_velem(vs1, SEW=16, i);
          let (bfloat16) op2 = get_velem(vs2, SEW=16, i);
          let (bfloat16) tmp_result += bf16_mul(op1, op2);
      }
      let (float32) fcvt_result = bf16_to_f32(tmp_result);
      let (float32) op3 = get_velem(vd, 32, 0);
      let (float32) result = f32_add(fcvt_result, op3);
      set_velem(vd, EEW=32, 0, result);
      // tail element handling follows VTA
      RETIRE_SUCCESS
    }

I’m working on the Spike implementation and would appreciate your confirmation. Thanks!

aswaterman · Mar 24, 2026

Hi @Xu-Dsus4,

There's already a Spike implementation of these instructions that has been merged into the master branch. It is just that the names are out of date, since the ARC requested a rename from bdot to bdota. If you want to contribute a PR to riscv-opcodes and then to riscv-isa-sim to fix the extension and instruction names, I would be very grateful.

To answer your semantics question, the intermediate result has full precision, and it is then either fed to RVBNA or accumulated in FP32 (or a combination of both). An implementation that rounds the products back to BF16 is not correct.
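To see why rounding the products back to BF16 gives a different answer, here is a small Python sketch. It is an illustration under assumptions, not the Spike or specification model: `to_bf16`, `dot_bf16_intermediate`, and `dot_full_precision` are hypothetical helpers, RVBNA is not modelled at all, and the final rounding of the exact sum to FP32 is left out.

```python
import struct
from fractions import Fraction

def to_bf16(x):
    """Round a finite, in-range float to the nearest BF16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot_bf16_intermediate(vs2, vs1, acc):
    """The reading from the question: every intermediate is rounded to BF16."""
    tmp = 0.0
    for a, b in zip(vs2, vs1):
        tmp = to_bf16(tmp + to_bf16(a * b))
    return tmp + acc

def dot_full_precision(vs2, vs1, acc):
    """Products kept exact; the final rounding to FP32 is omitted in this sketch."""
    total = Fraction(acc)
    for a, b in zip(vs2, vs1):
        total += Fraction(a) * Fraction(b)
    return total

vs2 = [1.0078125, 1.015625]  # 1 + 2**-7 and 1 + 2**-6, both exact in BF16
vs1 = [1.0078125, -1.0]

print(dot_bf16_intermediate(vs2, vs1, 0.0))      # 0.0
print(float(dot_full_precision(vs2, vs1, 0.0)))  # 6.103515625e-05 (2**-14)
```

The first product is 1 + 2**-6 + 2**-14, which BF16 rounds to 1 + 2**-6. After the second product cancels the leading terms, the BF16 variant returns zero, while the full-precision products preserve the 2**-14 remainder.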

Xu-Dsus4 · Mar 25, 2026

Thanks for your reply. I noticed the implementation with the old instruction names. I am glad to contribute to Spike, and I will review and update the Spike code related to the dot-product instructions.

aswaterman force-pushed the zvbdot branch from c81d994 to fa07a1b on January 29, 2026, 22:56

Add Zvdota/Zvbdota draft

03c4abb

aswaterman force-pushed the zvbdot branch from fa07a1b to 03c4abb on January 29, 2026, 23:04

nibrunieAtSi5 approved these changes Jan 31, 2026

nibrunieAtSi5 mentioned this pull request Mar 19, 2026

[RVBNA] Generalizing the definition of RVBNA to non symmetric product operands #2767

Merged

Xu-Dsus4 reviewed Mar 24, 2026

aswaterman wants to merge 2 commits into main from zvbdot

aswaterman Mar 24, 2026

Xu-Dsus4 Mar 25, 2026

3 participants
