
revcomp: interpreting "line-by-line"

CHANGE

Description

Ben Harshbarger on 2017-12-08 00:23

The reverse-complement benchmark should "read line-by-line a redirected FASTA format file from stdin". I was looking through some of the top entries recently and what I saw made me question whether I understood what is meant by "line-by-line".

The top u64q entry Rust #2 uses a BufReader, which will read chunks of bytes from stdin without looking at the contents. Calling 'read_until' will:

  • read a chunk of bytes from stdin if the buffer has been fully scanned
  • scan the remainder of the buffer for the desired character
  • append to a Vec the bytes up to and including the desired character, refilling the buffer and repeating until that character is found or EOF is reached

This entry uses read_until to skip past the section header, then uses read_until(b'>', ...) to find the next section.
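A minimal Rust sketch of the read_until pattern described above, assuming well-formed FASTA input with '\n' line endings. This is not the Rust #2 source, and the function name sections_by_delimiter is hypothetical. It only illustrates the point at issue: a single read_until(b'>', ...) call returns a whole multi-line section, newlines included, so the program itself never handles individual sequence lines.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: returns (header, body) pairs for each FASTA section.
/// The header keeps its trailing '\n'; the body still contains every
/// embedded '\n', because it was grabbed with one read_until(b'>') call.
fn sections_by_delimiter<R: BufRead>(mut input: R) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
    let mut sections = Vec::new();
    // Discard everything up to and including the first '>'.
    let mut skip = Vec::new();
    input.read_until(b'>', &mut skip)?;
    loop {
        // Skip past the header line (read_until stops after the '\n').
        let mut header = Vec::new();
        if input.read_until(b'\n', &mut header)? == 0 {
            break; // EOF
        }
        // Grab the entire sequence body, across many lines, in one call.
        let mut body = Vec::new();
        input.read_until(b'>', &mut body)?;
        // Drop the '>' that begins the next section, if one was found.
        let more = body.last() == Some(&b'>');
        if more {
            body.pop();
        }
        sections.push((header, body));
        if !more {
            break;
        }
    }
    Ok(sections)
}

fn main() -> io::Result<()> {
    let sections = sections_by_delimiter(BufReader::new(io::stdin()))?;
    for (header, body) in &sections {
        // Bodies still include their newlines; a real revcomp would strip them.
        println!("{}: {} bytes", String::from_utf8_lossy(header).trim_end(), body.len());
    }
    Ok(())
}
```

For the input ">ONE\nACGT\nTT\n>TWO\nGG\n", the first body is "ACGT\nTT\n": two lines delivered by one call.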

The 2nd-place entry gcc #6 reads chunks of bytes until the '>' character, and only then starts looking for newlines.
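For comparison, a hypothetical Rust sketch of the chunk-first strategy attributed to gcc #6 (the original is C and is not reproduced here): bytes are read in fixed-size chunks with no regard to line boundaries, and newlines are located only after the data is in memory. The names read_in_chunks and split_sections and the 64 KiB CHUNK size are assumptions, and for simplicity the sketch reads the whole input before splitting rather than stopping at each '>'.

```rust
use std::io::{self, Read};

// Hypothetical chunk size; not taken from the gcc #6 entry.
const CHUNK: usize = 64 * 1024;

/// Reads all input in fixed-size chunks, ignoring line boundaries entirely.
fn read_in_chunks<R: Read>(mut input: R) -> io::Result<Vec<u8>> {
    let mut data = Vec::new();
    let mut chunk = vec![0u8; CHUNK];
    loop {
        match input.read(&mut chunk) {
            Ok(0) => break, // EOF
            Ok(n) => data.extend_from_slice(&chunk[..n]),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(data)
}

/// Splits the in-memory bytes at each '>' first, then looks for newlines:
/// the first '\n' ends the header, the remaining ones delimit sequence lines.
fn split_sections(data: &[u8]) -> Vec<(&[u8], Vec<&[u8]>)> {
    let mut out = Vec::new();
    // skip(1) drops whatever precedes the first '>'.
    for section in data.split(|&b| b == b'>').skip(1) {
        let header_end = section.iter().position(|&b| b == b'\n').unwrap_or(section.len());
        let header = &section[..header_end];
        let rest: &[u8] = section.get(header_end + 1..).unwrap_or(&[]);
        let lines: Vec<&[u8]> = rest.split(|&b| b == b'\n').filter(|l| !l.is_empty()).collect();
        out.push((header, lines));
    }
    out
}

fn main() -> io::Result<()> {
    let data = read_in_chunks(io::stdin())?;
    for (header, lines) in split_sections(&data) {
        println!("{}: {} lines", String::from_utf8_lossy(header), lines.len());
    }
    Ok(())
}
```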

The fourth-place entry Rust #3 uses BufReader as well, but only copies bytes into a Vec one line at a time.
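A minimal sketch of the line-at-a-time pattern attributed to Rust #3, assuming '\n' line endings. Record and records_line_by_line are hypothetical names, not taken from the entry. BufReader still fills its internal buffer in large blocks, but the program consumes input one line per read_until(b'\n', ...) call and copies only that line's bytes into the sequence Vec.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical record type: one header plus its sequence with line breaks removed.
struct Record {
    header: Vec<u8>,
    seq: Vec<u8>,
}

fn records_line_by_line<R: BufRead>(mut input: R) -> io::Result<Vec<Record>> {
    let mut records: Vec<Record> = Vec::new();
    let mut line = Vec::new();
    loop {
        line.clear();
        // Each call yields exactly one line (or the final unterminated fragment).
        if input.read_until(b'\n', &mut line)? == 0 {
            break; // EOF
        }
        // Strip this line's trailing newline, if present.
        if line.last() == Some(&b'\n') {
            line.pop();
        }
        if line.first() == Some(&b'>') {
            // Header line: start a new record, dropping the leading '>'.
            records.push(Record { header: line[1..].to_vec(), seq: Vec::new() });
        } else if let Some(current) = records.last_mut() {
            // Sequence line: copy one line's worth of bytes into the record.
            current.seq.extend_from_slice(&line);
        }
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let records = records_line_by_line(BufReader::new(io::stdin()))?;
    for r in &records {
        println!("{}: {} bases", String::from_utf8_lossy(&r.header), r.seq.len());
    }
    Ok(())
}
```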

Are approaches like these legal? If so, what is the "line-by-line" qualifier intended to suggest?

It looks like there was a similar discussion when Rust #2 was submitted.

Proposal

Since Rust #2 was accepted, should the benchmark's rules be updated to permit such approaches?

Or should only Rust #3 have been accepted? Or neither?

Edited by Isaac Gouy