Open
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
d9cc61f to
4a353e3
Compare
Contributor
Author
|
/review |
Contributor
Author
|
run buildall |
Contributor
There was a problem hiding this comment.
Overall, I found no blocking issues in this PR.
Critical checkpoints:
- Goal / correctness: The change aims to defer Iceberg Parquet position-delete / deletion-vector loading until a row group actually survives row-group filtering. The implementation matches that goal:
IcebergTableReader::init_row_filters()now defers only the Parquet position-delete and DV paths, andParquetReader::_next_row_group_reader()executes the loader immediately before building the first surviving row-group reader. - Scope / minimality: The change is small and focused. It only adds a lazy loader hook in
ParquetReaderplus the Iceberg-side wiring needed to preserve delete-awareness before the loader runs. - Concurrency: I did not find a new concurrency risk. The deferred loader is per-reader state, runs on the scan path before row-group initialization, and does not introduce new lock ordering or shared mutable state beyond the existing KV-cache usage.
- Lifecycle / initialization: No special lifecycle or static-initialization hazard was introduced. The deferred callback is owned by the reader instance and cleared after successful execution.
- Configuration: No new config items.
- Compatibility: No FE/BE protocol, storage-format, or rolling-upgrade compatibility issue in this change.
- Parallel code paths: Equality deletes remain eager, and ORC keeps the existing eager behavior. That split is appropriate because only the Parquet position-delete / DV path can safely benefit from row-group filtering before delete rows are needed.
- Special condition checks: The single-DV validation is preserved, and the new eager-vs-deferred branches are documented clearly enough to explain why the behavior differs by delete type / file format.
- Test coverage: This is the main gap. I do not see added regression or unit coverage for the new lazy-loading behavior, so the PR currently relies on reasoning rather than a test that proves both cases: all row groups filtered => delete files are not read, and at least one row group survives => deletes are still applied correctly.
- Test result updates: None in this PR.
- Observability: Existing delete-file / delete-row profile counters remain in place. No additional observability looks required for this small optimization.
- Transaction / persistence: Not applicable; this is a read-path change only.
- Data writes / atomicity: Not applicable; no write-side logic changed.
- FE-to-BE variable passing: Not applicable; no new transmitted fields.
- Performance: The change should reduce unnecessary delete-file I/O when Parquet row-group pruning eliminates the file, and I did not find an obvious hot-path regression.
- Other issues: No additional blocking issues found in the surrounding call chain.
Residual risk:
- The correctness-sensitive delete path now depends on deferred execution, so adding targeted coverage would make this change much safer to maintain.
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)