First we represent the file at each end as a tree. This is done concurrently at each end without coordination. Then a coordinated recursive descent begins. When hashes of subtrees are the same we conclude that the subtree and file portion that produced the subtree are the same. If the root hashes (level 0) at the two ends are different we transmit the level 1 hashes from one end to the other. For each hash from one end that matches a hash from the other, we have identified equal portions of the remote files. For unmatched subtrees, we descend farther. We notice coincident hashes even when they do not belong immediately to the same subtree. A sort makes this task efficient. We transmit those file portions without matching hashes whenever we reach the bottom.
If there is one file insertion then probably one sub-tree of each level will differ and hashes must be transmitted at each level. The time to find just one difference is the product of the network latency and the log of the file size. Multiple differences can overlap network latencies.
Of course it is easy with this information to modify one of the files to make it like the file at the other end.