Will LLMs Eclipse The Classic Code Diff Algorithms?

Code Diff Basics

Identifying the differences between two versions of a text file is a routine task for Git. The fundamental operations include:

  1. Looking at the difference between HEAD and the working directory.
  2. Comparing the changes between two commits.
  3. Creating a patch (sketched below).
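
Each of these boils down to comparing two versions of a file and serializing the result as a patch. Python's difflib gives a feel for the output format; this is a minimal sketch with made-up file contents, not how Git itself is implemented:

```python
import difflib

# Hypothetical before/after contents of a config file.
before = ["timeout = 30\n", "model = 'gpt-3.5-turbo'\n"]
after = ["timeout = 30\n", "model = 'gpt-4'\n"]

# unified_diff emits the same unified patch format that `git diff` prints.
patch = difflib.unified_diff(before, after, fromfile="a/config.py", tofile="b/config.py")
print("".join(patch))
```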

Finally, your preferred version control platform will elegantly display those code diffs.

A small but sweet diff in our GitHub repo, upgrading the GPT version we use

The Evolution of Code Diff Algorithms

The two most important variables to measure in a code diff algorithm are speed and quality. 

In the early 1970s, the first version of diff was developed for Unix, based on the Hunt–McIlroy algorithm (with an expected performance of O(n log n)). It later evolved thanks to Myers' work on An O(ND) Difference Algorithm and Its Variations.

Myers' Diff Algorithm compares two sequences by finding the minimum number of edits needed to transform one sequence into the other and then constructing an "edit script" of these changes. It does this by creating an edit graph to determine the shortest path of edits and then backtracking to detail the exact insertions, deletions, and matches required.
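
To make that concrete, here is a minimal sketch of the greedy forward pass from Myers' paper, applied to sequences of lines. It returns D, the length of the shortest edit script; recovering the script itself requires the backtracking step described above, which is omitted here:

```python
def myers_shortest_edit(a, b):
    """Return the minimum number of insertions + deletions turning a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    v = {1: 0}  # v[k]: furthest x reached so far on diagonal k = x - y
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Decide whether this path extends via an insertion (down) or a deletion (right).
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]          # down: insert b[y]
            else:
                x = v[k - 1] + 1      # right: delete a[x]
            y = x - k
            # Follow the "snake": free diagonal moves over matching elements.
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[k] = x
            if x >= n and y >= m:
                return d              # d edits suffice, and no fewer will do
    return max_d


old = ["def fetch_user_data(user_id):", "    return database.get(user_id)"]
new = ["def load_account_details(account_id):", "    return database.get(account_id)"]
print(myers_shortest_edit(old, new))  # -> 4: both lines deleted, both re-added
```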

With Myers' algorithm, quality is higher and the complexity went from linearithmic, O(n log n), to O(ND), which behaves close to linear when the two files are similar and the number of edits D is small. To this day, it's still the default way we process a code diff. That being said, quality is a subjective measure.

Now, let’s talk about what everyone is talking about today. LLMs pose an interesting tradeoff between speed and quality. LLMs are far more computationally expensive to run than a classic diff algorithm, but the quality they can deliver in the code diff use case is amazing. Let’s take a deeper look.

Comparing LLMs and Traditional Code Diff Tools

Consider the following example, where we change variable and function names but the logic stays the same.
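
Say we rename a function and its variables between Version A and Version B; this is a hypothetical snippet for illustration:

```python
database = {"42": {"name": "Ada"}}  # stand-in data store for the example

# Version A
def fetch_user_data(user_id):
    user_record = database.get(user_id)
    return user_record

# Version B: identical logic, only the function and variable names change
def load_account_details(account_id):
    account_record = database.get(account_id)
    return account_record
```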

This is a selective example where Myers' algorithm doesn't shine. The algorithm is line-based and seeks the shortest edit sequence, which doesn't always produce the most human-readable diff between Version A and Version B.

The git diff command treats this change as if the whole block changed, even though we're only renaming variables and functions. If only the changed variable and function names were colored red and green, the diff would be far more human-readable.
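
You can reproduce that line-level behaviour with Python's difflib, which, like git diff, compares line by line; here is a minimal sketch using the hypothetical rename above:

```python
import difflib

version_a = [
    "def fetch_user_data(user_id):\n",
    "    user_record = database.get(user_id)\n",
    "    return user_record\n",
]
version_b = [
    "def load_account_details(account_id):\n",
    "    account_record = database.get(account_id)\n",
    "    return account_record\n",
]

# Every line contains a renamed identifier, so a line-based diff marks the
# whole block as removed and re-added.
print("".join(difflib.unified_diff(version_a, version_b,
                                   fromfile="a/users.py", tofile="b/users.py")))
```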

When we instruct ChatGPT with:

“You are a code diff algorithm, that inserts "+" in line additions and "-" in deletions. Highlight the difference between Versions A and B at the character level instead of the line level as Myer's diffing algorithm would do. Render the code diffs per word. Be as granular as possible.”

we get the following:

We get the most granular code diff possible, at the word level. 
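
If you want to script this instead of pasting into ChatGPT, a minimal sketch against the OpenAI chat API might look like the following; the model name and the way the two versions are packed into the user message are assumptions of this sketch:

```python
# Assumptions: the `openai` v1 Python client, an OPENAI_API_KEY in the
# environment, and the 128k-context preview model from November 2023.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    'You are a code diff algorithm, that inserts "+" in line additions and "-" in '
    "deletions. Highlight the difference between Versions A and B at the character "
    "level instead of the line level as Myer's diffing algorithm would do. "
    "Render the code diffs per word. Be as granular as possible."
)

version_a = "def fetch_user_data(user_id):\n    return database.get(user_id)\n"
version_b = "def load_account_details(account_id):\n    return database.get(account_id)\n"

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Version A:\n{version_a}\nVersion B:\n{version_b}"},
    ],
)
print(response.choices[0].message.content)
```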

If you are very well versed in Git, you are thinking about git diff --word-diff-regex right now.

This flag lets you replace the default word delimiter (whitespace) with any delimiter you want, defined by a regular expression.

However, building a master regex that handles every possible case would be very hard. Version control systems could instead offer, for non-time-critical use cases, a code diff mode that is as accurate and granular as possible.
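
To see how far regex-based tokenization gets you, and where it stops, here is a sketch of a word-level diff built on a naive token pattern; the pattern itself is the assumption, since writing one regex that handles every language's syntax is exactly the hard part:

```python
import difflib
import re

# Assumption: a naive token pattern (identifiers, numbers, single punctuation marks).
TOKEN = re.compile(r"\w+|[^\w\s]")

def word_diff(a: str, b: str) -> str:
    """Diff two snippets token by token, using git's [-old-] / {+new+} word-diff markers."""
    tokens_a, tokens_b = TOKEN.findall(a), TOKEN.findall(b)
    out = []
    for op, a1, a2, b1, b2 in difflib.SequenceMatcher(None, tokens_a, tokens_b).get_opcodes():
        if op == "equal":
            out.extend(tokens_a[a1:a2])
        if op in ("delete", "replace"):
            out.extend(f"[-{tok}-]" for tok in tokens_a[a1:a2])
        if op in ("insert", "replace"):
            out.extend(f"{{+{tok}+}}" for tok in tokens_b[b1:b2])
    return " ".join(out)

print(word_diff("user_record = database.get(user_id)",
                "account_record = database.get(account_id)"))
# [-user_record-] {+account_record+} = database . get ( [-user_id-] {+account_id+} )
```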

Breakthroughs in LLMs for Code Analysis

This got even more exciting thanks to OpenAI’s recent announcement of a 128k-token context window.

To put it in perspective, OpenAI’s November 2023 update allows you to pass any of the following into the context window:
- 3200 tweets

- 120 blog posts

- 40 conferences

- 25 international trade agreements

- 20 days of an average person talking

- 14 court cases or presidential speeches

- 10 scientific papers

- 4 PhD theses

- 2 fiction novels

- 1 in-depth textbook about economics, law, engineering

Just imagine the possibilities in the software developer’s workflow. And context windows will keep growing!

We imagine that in the near future we will have a whole family of developer tools that index all of a company’s Git repositories to build different RAG-powered developer experiences: make an LLM reason about the codebase to generate better code, write and chat with docs, improve testing, and expedite code review, among other use cases.

We can also imagine a future in which version control systems use the codebase as context to make an LLM aware of the company’s coding style and therefore produce better code diffs: understanding which patterns repeat, what’s most relevant to show (GitHub, for example, collapses code diffs that are too large), and so on.

At Watermelon we’re building an open source copilot for code review and we’re paying very close attention to this. 

The Future of Code Diffing: Integration or Replacement?

Will GitHub add LLMs to improve its code diff algorithm for edge cases that aren’t time critical? Will it all be LLM-based once we have more compute and models run faster?

The space is moving very fast. Current code diffing methods are very efficient and have close to perfect accuracy, but as LLMs become cheaper to run, we may see more granular and human-readable code diff methods emerge on version control platforms.

As we ponder these developments, several key questions arise: How will the integration of LLMs influence the skillset required for software developers and reviewers? Will the increased granularity and accuracy of LLM-powered diffs necessitate new best practices in code review and collaboration? And crucially, how might these advancements in code diffing algorithms affect the overall software development lifecycle, from design to deployment? The potential is vast, but so is the need for careful consideration of how these tools are implemented and used.