Fix alloc-dealloc-mismatch in <str_vec> destructor; drop unused DELTA token #4

Open
Licht-T wants to merge 1 commit into envoyproxy:main from Licht-T:fix-str-vec-destructor-and-drop-delta


Licht-T commented Apr 27, 2026

Fixes the heap corruption observed as envoyproxy/envoy#36471 (postgres_proxy crash under pgbench).

Bugs

1. <str_vec> %destructor uses delete on malloc-ed pointers

The shared destructor at bison_parser.y#L147-L154 covers <str_vec> together with <table_vec>, <column_vec>, etc., and calls delete ptr on each element. <str_vec> elements, however, are char* produced by the lexer via strdup() (unquoted IDENTIFIER) or hsql::substr() (quoted IDENTIFIER, also malloc-backed). Releasing malloc-allocated memory with delete is undefined behavior. Under tcmalloc with -fsized-deallocation, the sized operator delete(void*, size_t) trusts the static type size (1 for char) and returns the chunk to the wrong size-class freelist; after enough mismatched frees, an unrelated allocation segfaults.

This is the same bug already fixed upstream in hyrise/sql-parser#221 (Oct 2022). This fork was branched before that PR landed and never resynced.

AddressSanitizer flags it cleanly:

ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete)
    #0 operator delete(void*)
    #1 yydestruct                bison_parser.y:156
    #2 hsql_parse
    ...
allocated by:
    #0 __interceptor_strdup
    #1 hsql_lex                  flex_lexer.l:250
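
To make the size-class argument in bug 1 concrete, here is a minimal sketch of the arithmetic. The size-class table is hypothetical and chosen only for illustration (real tcmalloc classes differ), and the helper names are not taken from the parser or from tcmalloc.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical size classes, for illustration only; real tcmalloc classes differ.
const std::array<std::size_t, 5> kSizeClasses = {8, 16, 32, 64, 128};

// Returns the smallest hypothetical size class that fits `bytes`,
// or 0 if the request is larger than every class in the table.
inline std::size_t SizeClassFor(std::size_t bytes) {
  for (std::size_t cls : kSizeClasses) {
    if (bytes <= cls) return cls;
  }
  return 0;
}

// Bytes that strdup() requests from malloc for `s`: the text plus the NUL.
inline std::size_t StrdupAllocationSize(const char* s) {
  assert(s != nullptr);
  return std::strlen(s) + 1;
}

// Size that a sized `delete` on a char* reports to the allocator:
// the static element size, not the size of the original allocation.
inline std::size_t SizedDeleteSizeForChar() { return sizeof(char); }
```

With this table, the identifier "pgbench_history" needs 16 bytes and is served from the 16-byte class, while the sized delete reports 1 byte, which maps to the 8-byte class. That disagreement is the freelist corruption described above, under the stated assumption about how the allocator picks a class.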

2. DELTA is a dead reserved keyword

%token DELTA is declared (and listed in sql_keywords.txt#L95 and flex_lexer.l#L128) but referenced by no grammar rule. The lexer therefore tokenizes delta as DELTA and the parser has nowhere to consume it, so any SQL using delta as an identifier (notably pgbench's pgbench_history.delta) fails to parse - which is precisely what triggers the destructor cleanup path that exhibits bug #1.
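
As a rough illustration of why a reserved word with no grammar rule blocks its use as an identifier, the toy classifier below stands in for the lexer's keyword handling. It is not the flex lexer's actual mechanism; TokenKind, Classify, and the keyword sets are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

enum class TokenKind { Keyword, Identifier };

// Classifies a lowercase word against a reserved-keyword table.
// Anything in the table can never reach the parser as an identifier.
inline TokenKind Classify(const std::string& word,
                          const std::unordered_set<std::string>& reserved) {
  return reserved.count(word) ? TokenKind::Keyword : TokenKind::Identifier;
}

// Walks through the before/after situation for the word "delta".
inline void CheckDeltaExample() {
  // Hypothetical excerpt of the keyword list before and after the change.
  const std::unordered_set<std::string> before = {"insert", "into", "values", "delta"};
  const std::unordered_set<std::string> after = {"insert", "into", "values"};

  // Before: "delta" is swallowed as a keyword, so a column named delta cannot parse.
  assert(Classify("delta", before) == TokenKind::Keyword);
  // After: "delta" falls through to the identifier path.
  assert(Classify("delta", after) == TokenKind::Identifier);
}
```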

Fix

  • Split the <str_vec> destructor out of the shared block and use free() for its elements. Keep delete for the other vector types (which hold new'd objects).
  • Drop DELTA from sql_keywords.txt, bison_parser.y (%token declaration), and flex_lexer.l (lexer rule).
  • Regenerate the committed bison_parser.{cpp,h} and flex_lexer.{cpp,h} accordingly. (This accounts for the bulk of the diff — the hand-written changes are small.)
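
The destructor split in the first bullet can be sketched in plain C++ as two cleanup helpers. This is not the code in bison_parser.y; the function names are hypothetical, the sketch assumes the vector itself was allocated with new, and the element count is returned only so the behavior is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Stand-in for strdup(): copies `s` into malloc-allocated memory.
inline char* DupWithMalloc(const char* s) {
  const std::size_t n = std::strlen(s) + 1;
  char* copy = static_cast<char*>(std::malloc(n));
  assert(copy != nullptr);
  std::memcpy(copy, s, n);
  return copy;
}

// For vectors of malloc-backed strings: release each element with free(),
// then delete the vector. Returns the number of elements released.
inline std::size_t DestroyMallocedStrings(std::vector<char*>* vec) {
  if (vec == nullptr) return 0;
  const std::size_t count = vec->size();
  for (char* s : *vec) std::free(s);
  delete vec;
  return count;
}

// For vectors of objects created with new: release each element with delete,
// then delete the vector. Returns the number of elements released.
template <typename T>
std::size_t DestroyNewedObjects(std::vector<T*>* vec) {
  if (vec == nullptr) return 0;
  const std::size_t count = vec->size();
  for (T* obj : *vec) delete obj;
  delete vec;
  return count;
}
```

Keeping two helpers rather than one makes the allocation family part of the function's contract: a vector of lexer strings goes through the free() path, and every other element type stays on delete.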

Tests

test/regression_tests.cpp adds:

  • DeltaIsAValidIdentifier - INSERT INTO pgbench_history (tid, bid, aid, delta) VALUES (1, 2, 3, 4) parses to a valid InsertStatement with delta as the fourth column.
  • RepeatedFailedInsertParseDoesNotCorruptHeap - runs an INSERT whose column list contains a still-reserved keyword 1000× to exercise the destructor cleanup path. Under ASAN this catches the regression on the very first iteration; in release builds it serves as a smoke test.

Verified:

  • make test: all 87 grammar tests + 12 unit tests + 2 new regression tests pass.
  • valgrind memory-leak check clean.
  • Grammar conflict check clean.
  • ASAN build clean against the new tests; reverting the destructor reproduces the exact diagnostic above.
  • Built envoyproxy/envoy main against this branch via --override_repository and reran the issue's pgbench scenario (50 clients × 30 s, plaintext, enable_sql_parsing: true) - no crash.

… token

The shared %destructor for <str_vec> <table_vec> ... called `delete` on
each element pointer, but <str_vec> elements are char* allocated by the
lexer via strdup() (unquoted IDENTIFIER) or hsql::substr() (quoted
IDENTIFIER) — both malloc-backed. Mixing free/delete is undefined
behavior. Under tcmalloc with -fsized-deallocation, sized operator
delete trusts the static type size (1 for char) and returns the chunk
to the wrong size-class freelist, eventually crashing on an unrelated
allocation (envoyproxy/envoy#36471).

This is the same bug fixed upstream in hyrise/sql-parser#221.

Split the destructor: <str_vec> uses free(); the rest hold pointers to
new-allocated objects and stay on delete.

Also drop the dead DELTA token. It is declared in sql_keywords.txt /
bison_parser.y / flex_lexer.l but referenced by no grammar rule, so any
SQL using `delta` as an identifier (e.g. pgbench's pgbench_history.delta)
fails to parse — which is what trips the destructor cleanup path that
triggers the UB above.

Regenerate bison_parser.{cpp,h} and flex_lexer.{cpp,h} accordingly.

Add regression tests:
  - DeltaIsAValidIdentifier: confirms `delta` parses as IDENTIFIER.
  - RepeatedFailedInsertParseDoesNotCorruptHeap: stresses the
    failed-parse cleanup; under ASAN catches the alloc-dealloc-mismatch
    immediately on regression.

Signed-off-by: Rito Takeuchi <licht-t@outlook.jp>
Licht-T force-pushed the fix-str-vec-destructor-and-drop-delta branch from 3c73072 to 80cc4ae on April 27, 2026 13:39