Skip to content

[ci](fix) license check#62534

Open
hello-stephen wants to merge 1 commit intoapache:masterfrom
hello-stephen:0415-license-check
Open

[ci](fix) license check#62534
hello-stephen wants to merge 1 commit intoapache:masterfrom
hello-stephen:0415-license-check

Conversation

@hello-stephen
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor Author

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Copy link
Copy Markdown
Contributor Author

/review

@hello-stephen
Copy link
Copy Markdown
Contributor Author

skip buildall

@zclllyybb
Copy link
Copy Markdown
Contributor

/review

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings

  1. : changed-file discovery now relies on . GitHub documents that endpoint as returning a maximum of 3000 files total, so and are truncated on very large PRs. In that case the incremental license config silently skips files after the cutoff, and the workflow can falsely pass unlicensed additions. The previous local path did not have this cap.
  2. : the previous implementation intentionally fell back to a full scan when changed-file discovery failed. Under the default Actions shell (), a transient failure now aborts the step before any fallback is written, turning a recoverable discovery failure into a hard CI failure.

Critical Checkpoint Conclusions

  • Goal of the change: fix license-check failures by adding missing headers to Cloud , exempting non-source artifacts, and tightening PR changed-file detection. The header and ignore-list updates look correct for the touched files, but the workflow rewrite does not fully achieve the goal because it can miss changed files on large PRs and is less robust on discovery failures.
  • Small, clear, focused: mostly yes, but the workflow change introduces a repo-wide CI behavior regression beyond the touched Cloud files.
  • Concurrency: not applicable for these YAML, CMake, and config-file changes.
  • Lifecycle and static initialization: not applicable.
  • Configuration items: no runtime configuration added.
  • Compatibility and incompatible changes: no FE/BE/storage compatibility impact.
  • Parallel code paths: the push and manual paths still run the full scan; the PR path now becomes less complete than those paths.
  • Special conditional checks: the changed-file filtering now depends on a REST API contract with a hard file-count limit and needs an uncapped or fallback approach.
  • Test coverage: I did not see automated coverage or validation for the new changed-file enumeration path.
  • Test result changes: none.
  • Observability: sufficient for this scope.
  • Transaction, persistence, data writes, and FE-BE variable passing: not applicable.
  • Performance: the new API approach is lighter than fetching the base ref, but correctness regresses; correctness should take priority here.
  • Other issues: I did not find problems in the added Cloud license headers or in the new ignore entries themselves.

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking findings:

  1. .github/workflows/license-eyes.yml: changed-file discovery now relies on GET /pulls/{number}/files, which GitHub documents as returning at most 3000 files total even with pagination. Large PRs will silently skip license checks on files past that cap, and the previous fallback to a full .licenserc.yaml scan on enumeration failure is also gone.
  2. .licenserc.yaml: conf/ubsan_ignorelist.txt and extension/dbt-doris/dev-requirements.txt are Doris-owned text configs that support # comments. Ignoring them weakens header enforcement instead of fixing the missing headers in-place.

Critical checkpoints:

  • Goal of current task: Partially achieved. The PR adds missing ASF headers to Cloud CMakeLists.txt files and updates the workflow/config, but the new workflow regresses correctness for large PRs and the config broadens ignore coverage for first-party files.
  • Minimality/focus: Mostly focused on license-check remediation, but the two new ignores above are broader than necessary.
  • Concurrency: N/A for these CMake/CI/config changes; no concurrency-sensitive runtime code is touched.
  • Lifecycle/static initialization: N/A.
  • Configuration changes: Yes. The workflow/config changes are not fully safe because changed-file enumeration can now be incomplete or fail hard.
  • Compatibility/incompatible change: No runtime, storage, or FE/BE protocol compatibility impact.
  • Parallel code paths: The same pull_request_target license-check path is used for future PRs, so the changed-file enumeration regression affects all large PRs on that path.
  • Special conditional checks: The .licenserc.yaml shortcut itself is fine, but the underlying file list source is now bounded by the API limit.
  • Test coverage: No tests were added. For CI-only changes that can be acceptable, but there is no validation here for large-PR or API-failure behavior.
  • Test result changes: None.
  • Observability: No additional observability is needed; the main issue is correctness/reliability.
  • Transaction/persistence/data writes/FE-BE variable passing: N/A.
  • Performance: Acceptable; the concern is correctness and robustness, not runtime cost.
  • Other issues: None beyond the two findings above.

Requesting changes.

echo "config_file=.licenserc.yaml" >> "$GITHUB_OUTPUT"
exit 0
fi
all=$(gh api repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using GET /pulls/{number}/files here introduces a correctness regression. GitHub documents that this endpoint returns at most 3000 files total even with pagination, so on any larger PR all/acmr becomes incomplete and skywalking-eyes will silently skip license checking for files past that cap. It also removes the previous fallback to a full .licenserc.yaml scan when enumeration fails, because a transient gh api error now aborts the step under bash -e. Could we keep the local git diff "${base_sha}...HEAD" path (or another uncapped source) and only use the API as a fallback?

Comment thread .licenserc.yaml
@@ -96,15 +101,18 @@ header:
- "conf/mysql_ssl_default_certificate/client_certificate/ca.pem"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think these first-party text configs should be excluded from header checking. Both conf/ubsan_ignorelist.txt and extension/dbt-doris/dev-requirements.txt accept # comments, and comparable files in-tree already carry ASF headers (pytest/requirements.txt, tools/cost_model_evaluate/requirements.txt, extension/dbt-doris/tox.ini, etc.). Adding them to paths-ignore weakens license coverage instead of fixing the missing headers in-place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants