Ruby k-nucleotide by Maarten Brouwers (using Ractor)

Source code

Parallelism is implemented with Ractor (the current fastest Ruby variant uses process forking). I'd say it is fairly idiomatic Ruby code, albeit with some tuning for performance.
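To illustrate the general approach (this is not the submitted program, just a minimal sketch), the snippet below starts one Ractor per k-mer length and collects a frequency hash from each. The helper names `ractor_result` and `kmer_counts_parallel` and the toy sequence are assumptions made for this example. Newer Ruby releases use `Ractor#value` where older 3.x releases use `Ractor#take`, so the sketch checks which one is available.

```ruby
# Illustrative sketch only, not the submitted program.
# `ractor_result` and `kmer_counts_parallel` are hypothetical helper names.

def ractor_result(ractor)
  # Use Ractor#value when it exists, otherwise fall back to Ractor#take.
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

def kmer_counts_parallel(seq, lengths)
  workers = lengths.map do |len|
    # The block is isolated: it may only use what is passed in as arguments.
    Ractor.new(seq, len) do |s, n|
      counts = Hash.new(0)
      (0..s.size - n).each { |i| counts[s[i, n]] += 1 }
      counts
    end
  end
  # Map each requested length to the hash returned by its Ractor.
  lengths.zip(workers.map { |w| ractor_result(w) }).to_h
end

freqs = kmer_counts_parallel("GGTATTTTAATTTATAGT".freeze, [1, 2])
p freqs[1]
```

Passing a frozen string avoids copying the sequence into every Ractor, since frozen strings are shareable between Ractors.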

knucleotide.ruby-9.ruby

Provide an example build command-line

Command (with time output):

ruby --yjit-call-threshold=1 --yjit -W0 ./knucleotide.ruby-9.ruby < input25000000.txt       # 216,73s user 20,34s system 389% cpu 1:00,94 total

Some comparisons run locally (not controlled):

ruby --yjit -W0 knucleotide.ruby-1.ruby 0 < input25000000.txt                              # 189,42s user 1,88s system 374% cpu 51,048 total
ruby --yjit-call-threshold=1 --yjit ./knucleotide.ruby-1.ruby < input25000000.txt          # 184,56s user 1,50s system 537% cpu 34,617 total
ruby --yjit-call-threshold=1 --yjit ./knucleotide.ruby-1-ractor.ruby < input25000000.txt   # 241,11s user 24,15s system 418% cpu 1:03,34 total
ruby --yjit-call-threshold=1 --yjit ./knucleotide.ruby-7.ruby < input25000000.txt          # 130,65s user 0,90s system 97% cpu 2:14,36 total

Notes:

I ran the ruby-1 version both with the originally provided command and with a more optimal one (--yjit-call-threshold=1) that compiles methods on their first call instead of after the 30th call or so.
I also included a knucleotide.ruby-1-ractor.ruby variant, just to compare the fastest parallel version using process forks against one using Ruby's native Ractors.

PS: While benchmarking the implementation I found that most of the time in Ruby is actually lost in GC, so for Ruby this is mostly a GC test rather than an IO one. That is because every substring selected is a new object. I couldn't think of a faster version (converting the input to an array of integers, for example, wouldn't help; it has the same issue).
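A rough way to see the allocation pressure described above is to compare the number of k-mer windows with the number of objects Ruby allocates while counting them. This is only a sketch on a toy sequence; the helper names `allocations_during` and `count_kmers` are assumptions made for this example, and the exact allocation count will vary between Ruby versions.

```ruby
# Illustrative sketch only: shows that slicing substrings allocates objects.
# `allocations_during` and `count_kmers` are hypothetical helper names.

def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

def count_kmers(seq, k)
  counts = Hash.new(0)
  # Each seq[i, k] call creates a new String object.
  (0..seq.size - k).each { |i| counts[seq[i, k]] += 1 }
  counts
end

seq = "GGTATTTTAATTTATAGT" * 1_000
windows = seq.size - 4 + 1
allocated = allocations_during { count_kmers(seq, 4) }
puts "windows: #{windows}, objects allocated: #{allocated}"
```

On the real 25,000,000-line input the same pattern repeats for every frame length, which is consistent with the GC dominating the run time.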

Edited Mar 13, 2025 by Maarten Brouwers