[Opt](func) Improve AggFunc Percentile performance by linrrzqqq · Pull Request #62520 · apache/doris

linrrzqqq · 2026-04-15T07:18:27Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Key Optimizations

Reworked exact percentile state into a simpler values + PercentileLevels + inited_flag layout.
Removed the old Counts-based exact percentile path and replaced it with a lightweight PercentileLevels metadata structure.
Reduced add-path overhead by directly appending values into a contiguous buffer and initializing percentile levels only once.
Optimized multi-quantile finalize(percentile_array) by sorting quantiles through permutation once, then reusing incremental nth_element selection.

Performance

before:

Doris> select percentile(FUniqID, 0.91) from hits_100m;
+---------------------------+
| percentile(FUniqID, 0.91) |
+---------------------------+
|     8.749975206154981e+18 |
+---------------------------+
1 row in set (6.149 sec)

Doris> select percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56]) from hits_100m;
+------------------------------------------------------------------------------------------------+
| percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56])                                      |
+------------------------------------------------------------------------------------------------+
| [0, 5.097819197233503e+18, 5.68906185719365e+18, 6.281280528073601e+18, 6.869583632113269e+18] |
+------------------------------------------------------------------------------------------------+
1 row in set (28.984 sec)

Doris> select percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 0.91]) from hits_100m;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[E-3113]string column length is too large: total_length=7200000149, element_number=0, rows=0

now

Doris> select percentile(FUniqID, 0.91) from hits_100m;
+---------------------------+
| percentile(FUniqID, 0.91) |
+---------------------------+
|     8.749975206154981e+18 |
+---------------------------+
1 row in set (2.107 sec)

Doris> select percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56]) from hits_100m;
+------------------------------------------------------------------------------------------------+
| percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56])                                      |
+------------------------------------------------------------------------------------------------+
| [0, 5.097819197233503e+18, 5.68906185719365e+18, 6.281280528073601e+18, 6.869583632113269e+18] |
+------------------------------------------------------------------------------------------------+
1 row in set (5.903 sec)

Doris> select percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 0.91]) from hits_100m;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| percentile_array(FUniqID, [0.12, 0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 0.91])                                                                                                          |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [0, 5.097819197233503e+18, 5.68906185719365e+18, 6.281280528073601e+18, 6.869583632113269e+18, 7.459044216605808e+18, 8.044902013692362e+18, 8.640523572825436e+18, 8.749975206154981e+18] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (8.866 sec)

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2026-04-15T07:18:32Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

linrrzqqq · 2026-04-15T07:24:53Z

run buildall

hello-stephen · 2026-04-15T09:51:37Z

BE Regression && UT Coverage Report

Increment line coverage 82.76% (192/232) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.68% (27430/37227)
Line Coverage	57.31% (296122/516680)
Region Coverage	54.60% (246968/452351)
Branch Coverage	56.23% (106962/190230)

linrrzqqq · 2026-04-15T09:53:03Z

/review

github-actions

Requesting changes because this PR introduces a blocking compatibility regression in the serialized state format for percentile / percentile_array.

Critical checkpoint conclusions:

Goal of current task: Improve exact percentile aggregate performance. The new code likely improves the hot path by appending raw values and reusing ordered quantile selection, but the change is not safe to merge as-is because it breaks previously serialized aggregate states. The added unit tests cover only the new helper and do not prove compatibility or end-to-end correctness for persisted / exchanged agg states.
Scope and focus: The implementation is reasonably focused on the percentile aggregate, but it also changes the on-wire / on-disk representation of aggregate states by replacing Counts serialization with PercentileLevels + raw values serialization.
Concurrency: I did not find a new thread-safety issue here. This code runs on per-aggregate state and does not add shared mutable state or locking.
Lifecycle / static initialization: No special lifecycle or static-init issue found.
Configuration: No new configuration added.
Compatibility / storage format: Blocking issue. PercentileState::write/read now use a different binary layout, but there is no be_exec_version gate, fallback reader, or other compatibility handling. AggState columns persist aggregate bytes and later reconstruct them via deserialize(), so previously stored percentile states can no longer be read correctly. Mixed-version exchange/merge of serialized states is also at risk for the same reason.
Parallel code paths: The exact percentile path was updated consistently for both scalar and array forms; no separate unmodified exact-percentile path stood out.
Special conditional checks: Quantile validation and null checks are still present and understandable.
Test coverage: Insufficient for this change. The removed Counts test is not replaced with tests that exercise AggregateFunctionPercentile serialization/deserialization, agg-state persistence, or compatibility with old serialized bytes.
Test result changes: No result files involved.
Observability: No additional observability appears necessary for this perf optimization.
Transaction / persistence impact: Yes. This affects persisted agg-state bytes used by storage readers and block readers, and the compatibility path is missing.
Data writes / modifications: No table-write atomicity issue found, but persisted aggregate-state data correctness is affected by the format change above.
FE/BE variable passing: Not applicable.
Performance: The intended optimization is reasonable, but compatibility must be preserved.
Other issues: None beyond the blocking serialization compatibility gap.

Because this is a format-affecting change without backward-compatibility handling or tests, I cannot approve it in the current form.

github-actions · 2026-04-15T10:00:18Z

-            counts.serialize(buf);
+
+        levels.write(buf);
+        size_t size = values.size();


This changes the serialized layout of PercentileState in a backward-incompatible way.

Before this PR, the state was encoded as:
inited_flag + quantile list + serialized Counts objects

After this PR, it becomes:
inited_flag + quantile list + raw value count + raw values

percentile / percentile_array states are not just transient in-memory objects. They are also persisted and reloaded through AggState (DataTypeAggState -> storage reader -> deserialize()), and can be exchanged between BEs as serialized aggregate states. With the new read() implementation, any previously stored state bytes will interpret the old Counts payload as the new size_t size field and then read the wrong number of bytes into values.

I don't see any be_exec_version bump/gate, dual-format reader, or other compatibility path here, so this looks like a storage / wire-format break rather than a pure internal refactor. Please either preserve the old format, or add explicit versioned compatibility handling plus tests that cover reading old serialized percentile states.

linrrzqqq force-pushed the opt-percentile branch from 0744d49 to 809d8bc Compare April 15, 2026 07:22

[Opt](func) Improve AggFunc Percentile performance

a1a7e42

linrrzqqq force-pushed the opt-percentile branch from 809d8bc to a1a7e42 Compare April 15, 2026 09:48

github-actions bot requested changes Apr 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Opt](func) Improve AggFunc Percentile performance#62520

[Opt](func) Improve AggFunc Percentile performance#62520
linrrzqqq wants to merge 1 commit intoapache:masterfrom
linrrzqqq:opt-percentile

linrrzqqq commented Apr 15, 2026 •

edited

Loading

Uh oh!

Thearas commented Apr 15, 2026

Uh oh!

linrrzqqq commented Apr 15, 2026

Uh oh!

hello-stephen commented Apr 15, 2026

Uh oh!

linrrzqqq commented Apr 15, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

linrrzqqq commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Key Optimizations

Performance

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

Thearas commented Apr 15, 2026

Uh oh!

linrrzqqq commented Apr 15, 2026

Uh oh!

hello-stephen commented Apr 15, 2026

BE Regression && UT Coverage Report

Uh oh!

linrrzqqq commented Apr 15, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

linrrzqqq commented Apr 15, 2026 •

edited

Loading