Commit 6790469f authored by Brandon Maier, committed by Chris Lamb

Fix missing diff output on large diffs.

When there is a large diff chunk, match_lines() will skip running the
difflib.Differ.compare(). However this causes the following issues:

- It does not empty the `self.buf` buffer. This means that all future
  calls to match_lines() for that file will always be too large. So
  effectively no more diffs from the file get output.

- It outputs a debug message, but does not output anything to the
  side-by-side diff, so a user looking at the side-by-side diff may be
  misled into thinking the rest of the file has no differences.
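The first issue can be pictured with a toy model. This is only a sketch under stated assumptions, not the `SideBySideDiff` class itself: `ToyChunker`, `LIMIT`, `flush()` and `clear_on_skip` are hypothetical names, and the assumption is that lines accumulate in a buffer that is only emptied once a chunk has been processed.

```python
class ToyChunker:
    """Toy model only; not the SideBySideDiff class from the patch."""

    LIMIT = 4  # tiny stand-in for the real size threshold

    def __init__(self, clear_on_skip):
        self.buf = []
        # False models the behaviour described above (buffer kept on skip).
        self.clear_on_skip = clear_on_skip

    def flush(self, lines):
        """Buffer a chunk, then either process it or skip it as too large."""
        self.buf.extend(lines)
        if len(self.buf) > self.LIMIT:
            if self.clear_on_skip:
                self.buf = []
            return "skipped"
        self.buf = []
        return "processed"


buggy = ToyChunker(clear_on_skip=False)
first = buggy.flush(list("abcde"))   # over the limit, buffer is kept
second = buggy.flush(["x"])          # small chunk, but the buffer is still full
```

In this model, once one oversized chunk is skipped without emptying the buffer, every later chunk is measured together with the stale lines and is skipped too.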

We can fix these issues by falling back to a lazy line-by-line diff. This
produces suboptimal output, but it runs in linear O(n) time while
providing some form of output. We include a comment in the diff so the
user knows the following output is using a lazy diff algorithm.
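As a rough standalone illustration of the fallback described above (not the diffoscope code itself), the sketch below pairs lines positionally with `itertools.zip_longest` when the input is too large for `difflib.Differ.compare()`. The 750-line threshold comes from the patch; the names `MAX_DIFFLIB_LINES`, `lazy_pairs` and `pair_lines` are hypothetical.

```python
from difflib import Differ
from itertools import zip_longest

# Threshold value taken from the patch; the constant name is hypothetical.
MAX_DIFFLIB_LINES = 750


def lazy_pairs(removed, added):
    """Pair lines by position in O(n), padding the shorter side with ''."""
    return list(zip_longest(removed, added, fillvalue=""))


def pair_lines(removed, added):
    """Return which strategy was used together with its raw output."""
    if len(removed) + len(added) > MAX_DIFFLIB_LINES:
        return "lazy", lazy_pairs(removed, added)
    return "difflib", list(Differ().compare(removed, added))


pairs = lazy_pairs(["a", "b", "c"], ["x"])
# pairs == [("a", "x"), ("b", ""), ("c", "")]
```

The positional pairing does not try to align similar lines, which is why the output is suboptimal, but each line is visited exactly once.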
parent 3ab6acb8
@@ -28,6 +28,7 @@ import threading
 import subprocess
 from difflib import Differ
+from itertools import zip_longest
 from multiprocessing.dummy import Queue
 from .tools import get_tool_name, tool_required
@@ -551,11 +552,9 @@ class SideBySideDiff:
         if len(l0) + len(l1) > 750:
             # difflib.Differ.compare is at least O(n^2), so don't call it if
             # our inputs are too large.
-            logger.debug(
-                "Not calling difflib.Differ.compare(x, y) with len(x) == %d and len(y) == %d",
-                len(l0),
-                len(l1),
-            )
+            yield "C", "Diff chunk too large, falling back to line-by-line diff ({} lines added, {} lines removed)".format(self.add_cpt, self.del_cpt)
+            for line0, line1 in zip_longest(l0, l1, fillvalue=""):
+                yield from self.yield_line(line0, line1)
             return
         saved_line = None