Fix alloc-dealloc-mismatch in <str_vec> destructor; drop unused DELTA token#4
Open
Licht-T wants to merge 1 commit intoenvoyproxy:mainfrom
Open
Fix alloc-dealloc-mismatch in <str_vec> destructor; drop unused DELTA token#4Licht-T wants to merge 1 commit intoenvoyproxy:mainfrom
Licht-T wants to merge 1 commit intoenvoyproxy:mainfrom
Conversation
… token The shared %destructor for <str_vec> <table_vec> ... called `delete` on each element pointer, but <str_vec> elements are char* allocated by the lexer via strdup() (unquoted IDENTIFIER) or hsql::substr() (quoted IDENTIFIER) — both malloc-backed. Mixing free/delete is undefined behavior. Under tcmalloc with -fsized-deallocation, sized operator delete trusts the static type size (1 for char) and returns the chunk to the wrong size-class freelist, eventually crashing on an unrelated allocation (envoyproxy/envoy#36471). This is the same bug fixed upstream in hyrise#221. Split the destructor: <str_vec> uses free(); the rest hold pointers to new-allocated objects and stay on delete. Also drop the dead DELTA token. It is declared in sql_keywords.txt / bison_parser.y / flex_lexer.l but referenced by no grammar rule, so any SQL using `delta` as an identifier (e.g. pgbench's pgbench_history.delta) fails to parse — which is what trips the destructor cleanup path that triggers the UB above. Regenerate bison_parser.{cpp,h} and flex_lexer.{cpp,h} accordingly. Add regression tests: - DeltaIsAValidIdentifier: confirms `delta` parses as IDENTIFIER. - RepeatedFailedInsertParseDoesNotCorruptHeap: stresses the failed-parse cleanup; under ASAN catches the alloc-dealloc-mismatch immediately on regression. Signed-off-by: Rito Takeuchi <licht-t@outlook.jp>
3c73072 to
80cc4ae
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the heap corruption observed as envoyproxy/envoy#36471 (postgres_proxy crash under pgbench).
Bugs
1.
<str_vec>%destructorusesdeleteonmalloc-ed pointersThe shared destructor at
bison_parser.y#L147-L154covers<str_vec>together with<table_vec>,<column_vec>, etc., and callsdelete ptron each element.<str_vec>elements arechar*produced by the lexer viastrdup()(unquotedIDENTIFIER) orhsql::substr()(quotedIDENTIFIER, alsomalloc-backed). Mixingfree/deleteis undefined behavior. Under tcmalloc with-fsized-deallocation, sizedoperator delete(void*, size_t)trusts the static type size (1forchar) and returns the chunk to the wrong size-class freelist; after enough mismatched frees an unrelated allocation segfaults.This is the same bug already fixed upstream in hyrise/sql-parser#221 (Oct 2022). This fork was branched before that PR landed and never resynced.
AddressSanitizer flags it cleanly:
2.
DELTAis a dead reserved keyword%token DELTAis declared (and listed insql_keywords.txt#L95andflex_lexer.l#L128) but referenced by no grammar rule. The lexer therefore tokenizesdeltaasDELTAand the parser has nowhere to consume it, so any SQL usingdeltaas an identifier (notably pgbench'spgbench_history.delta) fails to parse - which is precisely what triggers the destructor cleanup path that exhibits bug#1.Fix
<str_vec>destructor out of the shared block and usefree()for its elements. Keepdeletefor the other vector types (which holdnew'd objects).DELTAfromsql_keywords.txt,bison_parser.y(%tokendeclaration), andflex_lexer.l(lexer rule).bison_parser.{cpp,h}andflex_lexer.{cpp,h}accordingly. (This accounts for the bulk of the diff — the hand-written changes are small.)Tests
test/regression_tests.cppadds:DeltaIsAValidIdentifier-INSERT INTO pgbench_history (tid, bid, aid, delta) VALUES (1, 2, 3, 4)parses to a validInsertStatementwithdeltaas the fourth column.RepeatedFailedInsertParseDoesNotCorruptHeap- runs anINSERTwhose column list contains a still-reserved keyword 1000× to exercise the destructor cleanup path. Under ASAN this catches the regression on the very first iteration; in release builds it serves as a smoke test.Verified:
make test: all 87 grammar tests + 12 unit tests + 2 new regression tests pass.envoyproxy/envoymainagainst this branch via--override_repositoryand reran the issue'spgbenchscenario (50 clients × 30 s, plaintext,enable_sql_parsing: true) - no crash.