Skip to content

[fix](column) Fix incorrect for-loop#62517

Open
Gabriel39 wants to merge 4 commits intoapache:masterfrom
Gabriel39:fix_0415
Open

[fix](column) Fix incorrect for-loop#62517
Gabriel39 wants to merge 4 commits intoapache:masterfrom
Gabriel39:fix_0415

Conversation

@Gabriel39
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Fix memory corruption in PredicateColumnType::insert_duplicate_fields for string types

Fixed a bug in the pointer arithmetic when inserting duplicate string values. The original code used dst += i * str.size() which incorrectly multiplied by the loop index, causing:

  • Iteration 0: no advance (correct by accident)
  • Iteration 1: advance by 1×size (correct)
  • Iteration 2: advance by 2×size (wrong - overwrites beyond allocated memory)

This resulted in memory corruption and incorrect string data when inserting more than 2 duplicates.

The fix changes to dst += str.size() to correctly advance the pointer by one string length per iteration.

Added unit tests covering:

  • String type with multiple duplicate insertions
  • Integer type duplicates
  • LargeInt (Int128) type duplicates

Introduced by #60530

==13605==ERROR: AddressSanitizer: use-after-poison on address 0x7d639d26bc30 at pc 0x55e8b71df91e bp 0x7b4e8fc1e510 sp 0x7b4e8fc1dcd0
WRITE of size 6 at 0x7d639d26bc30 thread T620 (ls_normal [work)
#0 0x55e8b71df91d in __asan_memcpy (/mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be+0x2205791d)
#1 0x55e8cce8b1ab in doris::PredicateColumnType<(doris::PrimitiveType)23>::insert_duplicate_fields(doris::Field const&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/column/predicate_column.h:327:17
#2 0x55e8b7db4e83 in doris::ColumnNullable::insert_duplicate_fields(doris::Field const&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/column/column_nullable.cpp:357:29
#3 0x55e8ccf8ba2c in doris::segment_v2::DefaultValueColumnIterator::_insert_many_default(doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.cpp:2167:14
#4 0x55e8ccf3c705 in doris::segment_v2::DefaultValueColumnIterator::next_batch(unsigned long*, doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&, bool*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.cpp:2152:5
#5 0x55e8cd4f9313 in doris::segment_v2::ColumnIterator::next_batch(unsigned long*, doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.h:330:16
#6 0x55e8cd4f9313 in doris::segment_v2::SegmentIterator::_read_columns_by_index(unsigned int, unsigned short&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2291:13
#7 0x55e8cd50621b in doris::segment_v2::SegmentIterator::_next_batch_internal(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2643:5
#8 0x55e8cd4a0b82 in doris::segment_v2::SegmentIterator::next_batch(doris::Block*)::$_0::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2519:9
#9 0x55e8cd4a0b82 in doris::segment_v2::SegmentIterator::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2518:19
#10 0x55e8cd37cca1 in doris::segment_v2::LazyInitSegmentIterator::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/lazy_init_segment_iterator.h:46:33
#11 0x55e8cccc0b05 in doris::Status doris::BetaRowsetReader::_next_batchdoris::Block(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/rowset/beta_rowset_reader.h:113:35
#12 0x55e8ccc98668 in doris::BetaRowsetReader::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/rowset/beta_rowset_reader.h:56:55
#13 0x55e8ca62ee8f in doris::VCollectIterator::Level0Iterator::_refresh() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.h
#14 0x55e8ca611036 in doris::VCollectIterator::Level0Iterator::refresh_current_row() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:523:24
#15 0x55e8ca601a08 in doris::VCollectIterator::Level0Iterator::ensure_first_row_ref() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:496:14
#16 0x55e8ca604ea2 in doris::VCollectIterator::Level1Iterator::ensure_first_row_ref() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:713:27
#17 0x55e8ca608b20 in doris::VCollectIterator::build_heap(std::vector<std::shared_ptrdoris::RowsetReader, std::allocator<std::shared_ptrdoris::RowsetReader>>&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:189:9
#18 0x55e8ca5dc1ae in doris::BlockReader::_init_collect_iter(doris::TabletReader::ReaderParams const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/block_reader.cpp:152:9
#19 0x55e8ca5d7b9d in doris::BlockReader::init(doris::TabletReader::ReaderParams const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/block_reader.cpp:226:19
#20 0x55e8d02df6d6 in doris::OlapScanner::_open_impl(doris::RuntimeState*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/olap_scanner.cpp:317:32
#21 0x55e8d02cd697 in doris::Scanner::open(doris::RuntimeState*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner.h:80:16
#22 0x55e8d02b576a in doris::ScannerScheduler::_scanner_scan(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:185:5
#23 0x55e8d02bf603 in doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:75:17
#24 0x55e8d02bf603 in doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:74:27
#25 0x55e8d02bf603 in bool std::__invoke_impl<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>(std::__invoke_other, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:63:14
#26 0x55e8d02bf603 in std::enable_if<is_invocable_r_v<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>, bool>::type std::__invoke_r<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>(doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:116:9
#27 0x55e8d02bf603 in std::_Function_handler<bool (), doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292:9
#28 0x55e8d02bed85 in std::function<bool ()>::operator()() const /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
#29 0x55e8d02bed85 in doris::ScannerSplitRunner::process_for(std::chrono::duration<long, std::ratio<1l, 1000000000l>>) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:425:25
#30 0x55e8d02aa844 in doris::PrioritizedSplitRunner::process() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/prioritized_split_runner.cpp:102:35
#31 0x55e8d0272674 in doris::TimeSharingTaskExecutor::_dispatch_thread() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp:556:77
#32 0x55e8d3fa8ad1 in std::function<void ()>::operator()() const /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
#33 0x55e8d3fa8ad1 in doris::Thread::supervise_thread(void*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/util/thread.cpp:460:5
#34 0x55e8b71ddd26 in asan_thread_start(void*) (/mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be+0x22055d26)
#35 0x7f539b835ac2 in start_thread nptl/pthread_create.c:442:8
#36 0x7f539b8c78cf misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

0x7d639d26bc30 is located 816 bytes inside of 4096-byte region [0x7d639d26b900,0x7d639d26c900)
allocated by thread T620 (ls_normal [work) here:
#0 0x55e8b71e1b34 in malloc (/mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be+0x22059b34)
#1 0x55e8b78d2140 in doris::DefaultMemoryAllocator::malloc(unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/allocator.h:90:55
#2 0x55e8b78d2140 in doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, true>::alloc(unsigned long, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/allocator.cpp:337:23
#3 0x55e8b7ccdfaf in doris::Arena::Chunk::Chunk(unsigned long, doris::Arena::Chunk*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/arena.h:59:63
#4 0x55e8b7ccdad9 in doris::Arena::_init_head_if_needed() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/arena.h:131:24
#5 0x55e8b7ccd758 in doris::Arena::alloc(unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/arena.h:148:9
#6 0x55e8cce8b118 in doris::PredicateColumnType<(doris::PrimitiveType)23>::insert_duplicate_fields(doris::Field const&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/column/predicate_column.h:325:32
#7 0x55e8b7db4e83 in doris::ColumnNullable::insert_duplicate_fields(doris::Field const&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/core/column/column_nullable.cpp:357:29
#8 0x55e8ccf8ba2c in doris::segment_v2::DefaultValueColumnIterator::_insert_many_default(doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&, unsigned long) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.cpp:2167:14
#9 0x55e8ccf3c705 in doris::segment_v2::DefaultValueColumnIterator::next_batch(unsigned long*, doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&, bool*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.cpp:2152:5
#10 0x55e8cd4f9313 in doris::segment_v2::ColumnIterator::next_batch(unsigned long*, doris::COWdoris::IColumn::mutable_ptrdoris::IColumn&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/column_reader.h:330:16
#11 0x55e8cd4f9313 in doris::segment_v2::SegmentIterator::_read_columns_by_index(unsigned int, unsigned short&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2291:13
#12 0x55e8cd50621b in doris::segment_v2::SegmentIterator::_next_batch_internal(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2643:5
#13 0x55e8cd4a0b82 in doris::segment_v2::SegmentIterator::next_batch(doris::Block*)::$_0::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2519:9
#14 0x55e8cd4a0b82 in doris::segment_v2::SegmentIterator::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/segment_iterator.cpp:2518:19
#15 0x55e8cd37cca1 in doris::segment_v2::LazyInitSegmentIterator::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/segment/lazy_init_segment_iterator.h:46:33
#16 0x55e8cccc0b05 in doris::Status doris::BetaRowsetReader::_next_batchdoris::Block(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/rowset/beta_rowset_reader.h:113:35
#17 0x55e8ccc98668 in doris::BetaRowsetReader::next_batch(doris::Block*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/rowset/beta_rowset_reader.h:56:55
#18 0x55e8ca62ee8f in doris::VCollectIterator::Level0Iterator::_refresh() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.h
#19 0x55e8ca611036 in doris::VCollectIterator::Level0Iterator::refresh_current_row() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:523:24
#20 0x55e8ca601a08 in doris::VCollectIterator::Level0Iterator::ensure_first_row_ref() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:496:14
#21 0x55e8ca604ea2 in doris::VCollectIterator::Level1Iterator::ensure_first_row_ref() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:713:27
#22 0x55e8ca608b20 in doris::VCollectIterator::build_heap(std::vector<std::shared_ptrdoris::RowsetReader, std::allocator<std::shared_ptrdoris::RowsetReader>>&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/vcollect_iterator.cpp:189:9
#23 0x55e8ca5dc1ae in doris::BlockReader::_init_collect_iter(doris::TabletReader::ReaderParams const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/block_reader.cpp:152:9
#24 0x55e8ca5d7b9d in doris::BlockReader::init(doris::TabletReader::ReaderParams const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/storage/iterator/block_reader.cpp:226:19
#25 0x55e8d02df6d6 in doris::OlapScanner::_open_impl(doris::RuntimeState*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/olap_scanner.cpp:317:32
#26 0x55e8d02cd697 in doris::Scanner::open(doris::RuntimeState*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner.h:80:16
#27 0x55e8d02b576a in doris::ScannerScheduler::_scanner_scan(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:185:5
#28 0x55e8d02bf603 in doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()::operator()() const::'lambda'()::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:75:17
#29 0x55e8d02bf603 in doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()::operator()() const /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:74:27
#30 0x55e8d02bf603 in bool std::__invoke_impl<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>(std::__invoke_other, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:63:14
#31 0x55e8d02bf603 in std::enable_if<is_invocable_r_v<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>, bool>::type std::__invoke_r<bool, doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&>(doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/invoke.h:116:9
#32 0x55e8d02bf603 in std::_Function_handler<bool (), doris::ScannerScheduler::submit(std::shared_ptrdoris::ScannerContext, std::shared_ptrdoris::ScanTask)::$_0::operator()() const::'lambda'()>::_M_invoke(std::_Any_data const&) /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:292:9
#33 0x55e8d02bed85 in std::function<bool ()>::operator()() const /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
#34 0x55e8d02bed85 in doris::ScannerSplitRunner::process_for(std::chrono::duration<long, std::ratio<1l, 1000000000l>>) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.cpp:425:25
#35 0x55e8d02aa844 in doris::PrioritizedSplitRunner::process() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/prioritized_split_runner.cpp:102:35
#36 0x55e8d0272674 in doris::TimeSharingTaskExecutor::_dispatch_thread() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp:556:77
#37 0x55e8d3fa8ad1 in std::function<void ()>::operator()() const /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:593:9
#38 0x55e8d3fa8ad1 in doris::Thread::supervise_thread(void*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/util/thread.cpp:460:5

Thread T620 (ls_normal [work) created by T0 here:
#0 0x55e8b71c5e91 in pthread_create (/mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be+0x2203de91)
#1 0x55e8d3fa7b8a in doris::Thread::start_thread(std::__cxx11::basic_string<char, std::char_traits, std::allocator> const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator> const&, std::function<void ()> const&, std::shared_ptrdoris::Thread) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/util/thread.cpp:412:15
#2 0x55e8d02849fd in doris::Status doris::Thread::create<void (doris::TimeSharingTaskExecutor::
)(), doris::TimeSharingTaskExecutor*>(std::__cxx11::basic_string<char, std::char_traits, std::allocator> const&, std::__cxx11::basic_string<char, std::char_traits, std::allocator> const&, void (doris::TimeSharingTaskExecutor::* const&)(), doris::TimeSharingTaskExecutor* const&, std::shared_ptrdoris::Thread) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/util/thread.h:47:16
#3 0x55e8d026e6b3 in doris::TimeSharingTaskExecutor::_create_thread() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp:671:12
#4 0x55e8d026b628 in doris::TimeSharingTaskExecutor::_try_create_thread(int, std::lock_guardstd::mutex&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp:360:25
#5 0x55e8d026967a in doris::TimeSharingTaskExecutor::init() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp:221:27
#6 0x55e8d025a85a in doris::TaskExecutorSimplifiedScanScheduler::start(int, int, int, int) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/exec/scan/scanner_scheduler.h:288:9
#7 0x55e8d368d649 in doris::WorkloadGroup::upsert_thread_pool_no_lock(doris::WorkloadGroupInfo
, std::shared_ptrdoris::CgroupCpuCtl) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/workload_group/workload_group.cpp:571:38
#8 0x55e8d368f14d in doris::WorkloadGroup::upsert_task_scheduler(doris::WorkloadGroupInfo*) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/workload_group/workload_group.cpp:667:12
#9 0x55e8d36bd223 in doris::WorkloadGroupMgr::create_internal_wg() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/workload_group/workload_group_manager.cpp:900:5
#10 0x55e8d32bbd3d in doris::ExecEnv::_create_internal_workload_group() /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/exec_env_init.cpp:475:5
#11 0x55e8d32b27fc in doris::ExecEnv::_init(std::vector<doris::StorePath, std::allocatordoris::StorePath> const&, std::vector<doris::StorePath, std::allocatordoris::StorePath> const&, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator>, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator>>> const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/exec_env_init.cpp:424:5
#12 0x55e8d32ad7ec in doris::ExecEnv::init(doris::ExecEnv*, std::vector<doris::StorePath, std::allocatordoris::StorePath> const&, std::vector<doris::StorePath, std::allocatordoris::StorePath> const&, std::set<std::__cxx11::basic_string<char, std::char_traits, std::allocator>, std::less<std::__cxx11::basic_string<char, std::char_traits, std::allocator>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator>>> const&) /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/runtime/exec_env_init.cpp:199:17
#13 0x55e8b722cc8b in main /mnt/disk3/pipeline/repo/selectdb-core_master/selectdb-core/be/src/service/doris_main.cpp:531:14
#14 0x7f539b7cad8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Critical checkpoint conclusions:

  • Goal and correctness: The PR fixes the string-path pointer advancement bug in , which previously advanced by \ and corrupted the arena-backed buffer after the second duplicate. The new \ matches the allocated layout and correctly supports callers such as \ and schema-change default-value writes. The added unit test covers repeated string duplication and validates stored contents across two insertion batches.
  • Scope/minimality: The code change is minimal and focused: one-line logic fix plus targeted unit tests.
  • Concurrency: No new concurrency is introduced. The modified code runs on a single column instance with local arena pointer arithmetic and does not add shared-state or lock interactions.
  • Lifecycle/static init: No lifecycle or static initialization changes.
  • Configuration: No config changes.
  • Compatibility: No API, serialization, or storage-format compatibility impact.
  • Parallel code paths: The same helper is used by nullable/default-value and schema-change flows that convert to predicate columns, so the fix applies to the relevant parallel callers.
  • Special conditionals: No new conditional logic was introduced.
  • Test coverage: The new unit tests cover string, int, and largeint duplicate insertion. For the bug described in the PR, the string case is the key coverage and is sufficient to demonstrate the repaired pointer progression.
  • Test result updates: No golden/result files involved.
  • Observability: No new observability needed for this localized memory-corruption fix.
  • Transaction/persistence: Not involved.
  • Data writes/atomicity: Not a transactional write path change; the fix improves in-memory correctness of column materialization.
  • FE/BE variable passing: Not involved.
  • Performance: The fix removes incorrect pointer progression without adding overhead. No new performance concerns found.
  • Other issues: None identified in the modified scope.

Residual risk:

  • I did not run the BE unit tests locally in this review environment, so the conclusion is based on code inspection and the added test logic rather than executed test evidence.

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blocking issues found.

Critical checkpoint conclusions:

  • Goal and correctness: The PR fixes the string-path pointer advancement bug in PredicateColumnType::insert_duplicate_fields(), which previously advanced by i * str.size() and corrupted the arena-backed buffer after the second duplicate. The new dst += str.size() matches the allocated layout and correctly supports callers such as DefaultValueColumnIterator::_insert_many_default() and schema-change default-value writes. The added unit test covers repeated string duplication and validates stored contents across two insertion batches.
  • Scope/minimality: The code change is minimal and focused: one-line logic fix plus targeted unit tests.
  • Concurrency: No new concurrency is introduced. The modified code runs on a single column instance with local arena pointer arithmetic and does not add shared-state or lock interactions.
  • Lifecycle/static init: No lifecycle or static initialization changes.
  • Configuration: No config changes.
  • Compatibility: No API, serialization, or storage-format compatibility impact.
  • Parallel code paths: The same helper is used by nullable/default-value and schema-change flows that convert to predicate columns, so the fix applies to the relevant parallel callers.
  • Special conditionals: No new conditional logic was introduced.
  • Test coverage: The new unit tests cover string, int, and largeint duplicate insertion. For the bug described in the PR, the string case is the key coverage and is sufficient to demonstrate the repaired pointer progression.
  • Test result updates: No golden/result files involved.
  • Observability: No new observability needed for this localized memory-corruption fix.
  • Transaction/persistence: Not involved.
  • Data writes/atomicity: Not a transactional write path change; the fix improves in-memory correctness of column materialization.
  • FE/BE variable passing: Not involved.
  • Performance: The fix removes incorrect pointer progression without adding overhead. No new performance concerns found.
  • Other issues: None identified in the modified scope.

Residual risk:

  • I did not run the BE unit tests locally in this review environment, so the conclusion is based on code inspection and the added test logic rather than executed test evidence.

Copy link
Copy Markdown

@utafrali utafrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one-line fix is clearly correct: replacing dst += i * str.size() with dst += str.size() gives sequential offsets 0, s, 2s, … into the n*s-byte arena allocation, matching what the pre-allocation expects. The is_string_type branch covers all string types (TYPE_VARCHAR, TYPE_CHAR, etc.) so no parallel path is left unpatched. Tests cover the reported case (5-duplicate string), integers, and large integers. Only cosmetic issues remain.

}
}

} // namespace doris
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline at end of file. The diff shows \ No newline at end of file on the closing brace. Most editors and CI linters flag this; add a trailing newline to stay consistent with the rest of the file.

}
}

TEST(PredicateColumnTest, InsertDuplicateFieldsString) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three tests use PredicateColumnTest as the suite name but live inside column_nullable_test.cpp. They are testing PredicateColumnType directly, not ColumnNullable. Consider moving them to a dedicated predicate_column_test.cpp to keep test organization consistent with the class under test.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

eldenmoon
eldenmoon previously approved these changes Apr 15, 2026
Copy link
Copy Markdown
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@Gabriel39
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.09% (20179/38008)
Line Coverage 36.66% (190016/518251)
Region Coverage 32.92% (147533/448172)
Branch Coverage 34.05% (64581/189674)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.57% (27383/37222)
Line Coverage 57.26% (295868/516668)
Region Coverage 54.32% (245670/452296)
Branch Coverage 56.04% (106619/190250)


namespace doris {

TEST(PredicateColumnTest, InsertDuplicateFieldsString) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add more test to predicate column to coverer all the path of the class.
And add more type to the test especially for datev1 and decimalv2, string.

@Gabriel39
Copy link
Copy Markdown
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 15, 2026
@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (1/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.68% (27410/37203)
Line Coverage 57.36% (296344/516606)
Region Coverage 54.75% (247621/452275)
Branch Coverage 56.32% (107155/190248)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants