Frequent HLL bitstream_unpack crashes
We’ve been seeing nearly daily crashes from a PostgreSQL 9.6 application that is heavily dependent on the HLL extension (v 2.10.2). All these crashes are from inside the HLL bitstream_unpack function. Usually they’re from an INSERT VALUES statement, but occasionally they are from an hll_cardinality call in a query. The format of the HLL cells is always set using expressions like: hll_empty(17,5,-1,1) || hll_hash_bigint(879826435) || ...
I think I’ve identified the root cause, but I’d like someone who is familiar with the code in the HLL library to confirm my hypothesis:
bitstream_unpack pulls a full quadword (8 bytes) of data out of the bitstream using the brc_curp pointer. Usually this is not a problem. However, if brc_curp is less than 8 bytes from the end of the bitstream data, the quadword read goes past the end of the actual data. Because of the subsequent bit reordering, shifting, and masking, the extra bytes have no effect on the answers. But when the end of the bitstream is very close to the end of an OS page, the quadword read touches bytes on the next OS page, and if that page is not mapped in this process, the read will SEGV.
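To illustrate the kind of bounded read I have in mind, here is a minimal standalone sketch. It is not the attached patch and not code from hll.c: the helper names (safe_load_be64, unpack_bits) and the explicit end pointer are my own assumptions for illustration. The idea is to copy only the bytes that actually remain in the buffer and zero-fill the rest before shifting and masking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from hll.c): load up to 8 bytes starting at
 * curp without touching any byte at or beyond endp. Missing bytes are
 * zero-filled. The result is big-endian, so the first byte of the
 * stream ends up in the most significant bits. */
static uint64_t safe_load_be64(const uint8_t *curp, const uint8_t *endp)
{
    uint8_t buf[8] = {0};
    size_t avail = (curp < endp) ? (size_t) (endp - curp) : 0;
    uint64_t q = 0;
    size_t i;

    if (avail > sizeof buf)
        avail = sizeof buf;
    memcpy(buf, curp, avail);           /* never reads past endp */

    for (i = 0; i < sizeof buf; i++)
        q = (q << 8) | buf[i];
    return q;
}

/* Hypothetical unpacker: return nbits (1..32) starting at bit offset
 * bitpos within data[0..len). Bits past the end read as zero. */
static uint32_t unpack_bits(const uint8_t *data, size_t len,
                            size_t bitpos, unsigned nbits)
{
    const uint8_t *curp = data + bitpos / 8;
    unsigned used = (unsigned) (bitpos % 8);
    uint64_t q;

    assert(nbits >= 1 && nbits <= 32);

    q = safe_load_be64(curp, data + len);
    q <<= used;                         /* drop bits already consumed */
    q >>= 64 - nbits;                   /* keep only the top nbits */
    return (uint32_t) q;
}
```

For example, with the three bytes 0xAB 0xCD 0xEF, unpack_bits(data, 3, 19, 5) reads the last five bits of the final byte without loading anything beyond it. The byte-wise copy is slower than a single aligned load, so a real fix might take this path only when fewer than 8 bytes remain.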
Assuming I’ve correctly identified the problem, the attached patch should fix the issue.
BTW, I also posted this issue on GitHub 2 weeks ago, but got no response there.
Regards, Steve Kirk
P.S. The crash stacks all look roughly like:
#0 bitstream_unpack (brcp=brcp@entry=0x7ffc69814290) at hll.c:263
#1 0x0000147cbfcbe8ac in sparse_unpack (i_size=, i_bitp=0x7ffc69814327 "", i_nfilled=10403, i_log2nregs=, i_width=5, i_regp=0x7ffc69814320 "\002") at hll.c:359
#2 multiset_unpack (o_msp=o_msp@entry=0x7ffc698142f0, i_bitp=i_bitp@entry=0x147a16bf903c "\023\221\177", i_size=, o_encoded_type=o_encoded_type@entry=0x0) at hll.c:1188
#3 0x0000147cbfcc0823 in hll_union (fcinfo=) at hll.c:2042
#4 0x0000000000617132 in ExecMakeFunctionResultNoSets (fcache=0x147cbb3eb948, econtext=0x147a1d6cd530, isNull=0x147a1d6ce09d "", isDone=) at execQual.c:2041
#5 0x000000000061cace in ExecTargetList (tupdesc=, isDone=0x0, itemIsDone=0x147a1d6cffa0, isnull=0x147a1d6ce090 "", values=0x147a1d6ce000, econtext=0x147a1d6cd530, targetlist=0x147cbb3ecc28) at execQual.c:5423
#6 ExecProject (projInfo=, isDone=isDone@entry=0x0) at execQual.c:5647
#7 0x000000000062f10b in ExecOnConflictUpdate (returning=, canSetTag=1 '\001', estate=0x147a1d6c8038, excludedSlot=0x147a1d6c9bc0, planSlot=0x147a1d6c9bc0, conflictTid=0x7ffc69854520, resultRelInfo=0x147a1d6c81c8, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:1234
#8 ExecInsert (canSetTag=1 '\001', estate=0x147a1d6c8038, onconflict=ONCONFLICT_UPDATE, arbiterIndexes=0x147a1654fd28, planSlot=0x147a1d6c9bc0, slot=0x147a1d6c9bc0, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:410
#9 ExecModifyTable (node=node@entry=0x147a1d6c82d8) at nodeModifyTable.c:1512
#10 0x00000000006162a8 in ExecProcNode (node=node@entry=0x147a1d6c82d8) at execProcnode.c:396
#11 0x0000000000612727 in ExecutePlan (dest=0x147a1654d888, direction=, numberTuples=0, sendTuples=, operation=CMD_INSERT, use_parallel_mode=, planstate=0x147a1d6c82d8, estate=0x147a1d6c8038) at execMain.c:1567
#12 standard_ExecutorRun (queryDesc=0x147cbb3c6438, direction=, count=0) at execMain.c:339