Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
skip buildall |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Findings
- : changed-file discovery now relies on . GitHub documents that endpoint as returning a maximum of 3000 files total, so and are truncated on very large PRs. In that case the incremental license config silently skips files after the cutoff, and the workflow can falsely pass unlicensed additions. The previous local path did not have this cap.
- : the previous implementation intentionally fell back to a full scan when changed-file discovery failed. Under the default Actions shell (), a transient failure now aborts the step before any fallback is written, turning a recoverable discovery failure into a hard CI failure.
Critical Checkpoint Conclusions
- Goal of the change: fix license-check failures by adding missing headers to Cloud , exempting non-source artifacts, and tightening PR changed-file detection. The header and ignore-list updates look correct for the touched files, but the workflow rewrite does not fully achieve the goal because it can miss changed files on large PRs and is less robust on discovery failures.
- Small, clear, focused: mostly yes, but the workflow change introduces a repo-wide CI behavior regression beyond the touched Cloud files.
- Concurrency: not applicable for these YAML, CMake, and config-file changes.
- Lifecycle and static initialization: not applicable.
- Configuration items: no runtime configuration added.
- Compatibility and incompatible changes: no FE/BE/storage compatibility impact.
- Parallel code paths: the push and manual paths still run the full scan; the PR path now becomes less complete than those paths.
- Special conditional checks: the changed-file filtering now depends on a REST API contract with a hard file-count limit and needs an uncapped or fallback approach.
- Test coverage: I did not see automated coverage or validation for the new changed-file enumeration path.
- Test result changes: none.
- Observability: sufficient for this scope.
- Transaction, persistence, data writes, and FE-BE variable passing: not applicable.
- Performance: the new API approach is lighter than fetching the base ref, but correctness regresses; correctness should take priority here.
- Other issues: I did not find problems in the added Cloud license headers or in the new ignore entries themselves.
There was a problem hiding this comment.
Blocking findings:
.github/workflows/license-eyes.yml: changed-file discovery now relies onGET /pulls/{number}/files, which GitHub documents as returning at most 3000 files total even with pagination. Large PRs will silently skip license checks on files past that cap, and the previous fallback to a full.licenserc.yamlscan on enumeration failure is also gone..licenserc.yaml:conf/ubsan_ignorelist.txtandextension/dbt-doris/dev-requirements.txtare Doris-owned text configs that support#comments. Ignoring them weakens header enforcement instead of fixing the missing headers in-place.
Critical checkpoints:
- Goal of current task: Partially achieved. The PR adds missing ASF headers to Cloud
CMakeLists.txtfiles and updates the workflow/config, but the new workflow regresses correctness for large PRs and the config broadens ignore coverage for first-party files. - Minimality/focus: Mostly focused on license-check remediation, but the two new ignores above are broader than necessary.
- Concurrency: N/A for these CMake/CI/config changes; no concurrency-sensitive runtime code is touched.
- Lifecycle/static initialization: N/A.
- Configuration changes: Yes. The workflow/config changes are not fully safe because changed-file enumeration can now be incomplete or fail hard.
- Compatibility/incompatible change: No runtime, storage, or FE/BE protocol compatibility impact.
- Parallel code paths: The same
pull_request_targetlicense-check path is used for future PRs, so the changed-file enumeration regression affects all large PRs on that path. - Special conditional checks: The
.licenserc.yamlshortcut itself is fine, but the underlying file list source is now bounded by the API limit. - Test coverage: No tests were added. For CI-only changes that can be acceptable, but there is no validation here for large-PR or API-failure behavior.
- Test result changes: None.
- Observability: No additional observability is needed; the main issue is correctness/reliability.
- Transaction/persistence/data writes/FE-BE variable passing: N/A.
- Performance: Acceptable; the concern is correctness and robustness, not runtime cost.
- Other issues: None beyond the two findings above.
Requesting changes.
| echo "config_file=.licenserc.yaml" >> "$GITHUB_OUTPUT" | ||
| exit 0 | ||
| fi | ||
| all=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files \ |
There was a problem hiding this comment.
Using GET /pulls/{number}/files here introduces a correctness regression. GitHub documents that this endpoint returns at most 3000 files total even with pagination, so on any larger PR all/acmr becomes incomplete and skywalking-eyes will silently skip license checking for files past that cap. It also removes the previous fallback to a full .licenserc.yaml scan when enumeration fails, because a transient gh api error now aborts the step under bash -e. Could we keep the local git diff "${base_sha}...HEAD" path (or another uncapped source) and only use the API as a fallback?
| @@ -96,15 +101,18 @@ header: | |||
| - "conf/mysql_ssl_default_certificate/client_certificate/ca.pem" | |||
There was a problem hiding this comment.
I do not think these first-party text configs should be excluded from header checking. Both conf/ubsan_ignorelist.txt and extension/dbt-doris/dev-requirements.txt accept # comments, and comparable files in-tree already carry ASF headers (pytest/requirements.txt, tools/cost_model_evaluate/requirements.txt, extension/dbt-doris/tox.ini, etc.). Adding them to paths-ignore weakens license coverage instead of fixing the missing headers in-place.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)