Tricks with Flash Memory

This note records some slightly unobvious uses of flash memory. More accurately, it concerns memories that limit the number of writes and that write in large blocks, which I assume describes flash memory.

It is common for programs to modify storage, which encodes some state, by modifying a few bits or a few words while leaving large blocks on either side unchanged. For a long-running program, large collections of data evolve by scattered modifications of storage. The application or file system logic, in or out of the kernel, is in a position to locate these modifications and build a forward delta chain that allows quick recreation of the modified data. Paging hardware can be used in a “copy-on-write” mode so that the search for modifications is limited to single pages. Hardware help could go further if this were strategic.
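
A minimal sketch of such a forward delta chain, in Python, may make the idea concrete. It assumes a fixed-size page and a delta format of my own choosing, (offset, new bytes); the names diff_page and apply_chain are hypothetical and not taken from any real file system.

```python
from typing import List, Tuple

# Assumed delta format: overwrite `data` starting at `offset`.
Delta = Tuple[int, bytes]


def diff_page(old: bytes, new: bytes) -> List[Delta]:
    """Return the runs of bytes where `new` differs from `old`.

    Both versions must have the same length (one page).
    """
    if len(old) != len(new):
        raise ValueError("pages must have equal length")
    deltas: List[Delta] = []
    i, n = 0, len(old)
    while i < n:
        if old[i] != new[i]:
            start = i
            while i < n and old[i] != new[i]:
                i += 1
            deltas.append((start, new[start:i]))
        else:
            i += 1
    return deltas


def apply_chain(base: bytes, chain: List[List[Delta]]) -> bytes:
    """Recreate a page by replaying each delta set, oldest first."""
    page = bytearray(base)
    for deltas in chain:
        for offset, data in deltas:
            page[offset:offset + len(data)] = data
    return bytes(page)
```

The base page is written to flash once; each later version costs only the few bytes that changed. Replaying a prefix of the chain yields any earlier version.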

Generational garbage collection already does much of the job of segregating infrequently modified data by address. When data deemed unlikely to change is indeed changed, the software doing so is sometimes in a position to recognize this and produce a change record instead. If the data is already only in flash, then paging can provide the illusion that the big page has been modified without actually modifying it.

IBM’s text editor for CMS was written to produce ‘change mods’ instead of producing an entirely new file after each editing session. This afforded the luxury of presenting all of the previous versions of the file at small cost. Libraries were commonly available to deliver the content of such “files” serially. Modifications of this sort change the length of the file virtually, but not actually; they just add more deltas. Such files can be mapped to virtual memory at a cost proportional to the number of pages viewed and the number of mods.
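
Serial delivery of such a “file” can be sketched as follows. This is not the CMS mod format, which I do not reproduce here; the tuple (position, bytes to delete, bytes to insert) is an assumed format for illustration, and apply_mods and read_version are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumed mod format: at `position` in the previous version,
# drop `ndel` bytes and insert `ins` in their place.
Mod = Tuple[int, int, bytes]


def apply_mods(prev: Iterable[int], mods: List[Mod]) -> Iterator[int]:
    """Stream the next version, reading the previous version serially.

    `mods` must be sorted by position and must not overlap.
    """
    it = iter(prev)
    pos = 0
    for at, ndel, ins in mods:
        yield from islice(it, at - pos)   # copy the unchanged bytes
        for _ in islice(it, ndel):        # skip the deleted bytes
            pass
        yield from ins                    # emit the inserted bytes
        pos = at + ndel
    yield from it                         # copy the unchanged tail


def read_version(base: bytes, sessions: List[List[Mod]], version: int) -> bytes:
    """Return version `version` (0 is the base) by stacking mod streams."""
    stream: Iterable[int] = iter(base)
    for mods in sessions[:version]:
        stream = apply_mods(stream, mods)
    return bytes(stream)
```

The length of each version differs from the base, yet the base is never rewritten; each editing session only appends one more list of mods.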

Many mods can be accumulated in non-flash memory and then share a single page of flash.
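
A minimal sketch of that accumulation, assuming a 4096-byte flash page and a record layout of my own invention (8-byte offset, 2-byte length, then the data). The class ModBuffer is hypothetical, and the list flash_pages merely stands in for the device.

```python
from typing import List

PAGE_SIZE = 4096  # assumed flash page size in bytes


class ModBuffer:
    """Collect small mod records in RAM and write them out a page at a time."""

    def __init__(self, page_size: int = PAGE_SIZE) -> None:
        self.page_size = page_size
        self.pending = bytearray()           # mods not yet written to flash
        self.flash_pages: List[bytes] = []   # stand-in for the flash device

    def add(self, offset: int, data: bytes) -> None:
        """Queue one mod; flush first if it would not fit in the current page."""
        record = offset.to_bytes(8, "little") + len(data).to_bytes(2, "little") + data
        if len(record) > self.page_size:
            raise ValueError("mod is larger than one flash page")
        if len(self.pending) + len(record) > self.page_size:
            self.flush()
        self.pending += record

    def flush(self) -> None:
        """Write the pending mods as one page, padded with 0xFF bytes."""
        if self.pending:
            page = bytes(self.pending).ljust(self.page_size, b"\xff")
            self.flash_pages.append(page)
            self.pending.clear()
```

Each flash page write then carries many mods rather than one. Until flush is called the pending mods live only in RAM, so a real design would need some persistent place to keep them.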

If the cost of consulting many delta files becomes burdensome, one may create a new real base file. If previous versions are still needed, one may easily produce backward deltas.
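
As a sketch of that rebasing step, assume each delta is an in-place overwrite (offset, new bytes) that stays within the file. The backward delta for an overwrite is just the bytes it replaced. The names apply_and_invert and rebase are hypothetical.

```python
from typing import List, Tuple

Delta = Tuple[int, bytes]  # assumed format: overwrite bytes at offset


def apply_and_invert(data: bytes, deltas: List[Delta]) -> Tuple[bytes, List[Delta]]:
    """Apply `deltas` in order; also return the deltas that undo them."""
    buf = bytearray(data)
    undo: List[Delta] = []
    for offset, new in deltas:
        # Remember what is about to be overwritten.
        undo.append((offset, bytes(buf[offset:offset + len(new)])))
        buf[offset:offset + len(new)] = new
    undo.reverse()  # undo must run in the opposite order
    return bytes(buf), undo


def rebase(base: bytes, chain: List[List[Delta]]) -> Tuple[bytes, List[List[Delta]]]:
    """Fold a forward chain into a new base and return a backward chain.

    Applying backward[0] to the new base gives the previous version,
    then backward[1] gives the one before that, and so on.
    """
    current = base
    backward: List[List[Delta]] = []
    for deltas in chain:
        current, undo = apply_and_invert(current, deltas)
        backward.append(undo)
    backward.reverse()
    return current, backward
```

After rebasing, the newest version is read with no delta processing at all, and only readers of old versions pay for the chain.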

Depending on the format of the mods, algorithms exist to consolidate and sort them, improving later processing, without access to the underlying data.
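
For one simple format this is easy to show. The sketch below assumes overwrite-style mods, (offset, new bytes), applied in list order so that later mods win. It works byte by byte for clarity rather than speed, and consolidate is a hypothetical name.

```python
from typing import Dict, List, Tuple

Mod = Tuple[int, bytes]  # assumed format: overwrite bytes at offset


def consolidate(mods: List[Mod]) -> List[Mod]:
    """Merge mods into sorted, non-overlapping runs without reading the base."""
    latest: Dict[int, int] = {}
    for offset, data in mods:
        for i, byte in enumerate(data):
            latest[offset + i] = byte  # a later mod replaces an earlier one

    runs: List[Tuple[int, bytearray]] = []
    for pos in sorted(latest):
        if runs and runs[-1][0] + len(runs[-1][1]) == pos:
            runs[-1][1].append(latest[pos])   # extend the adjacent run
        else:
            runs.append((pos, bytearray([latest[pos]])))
    return [(offset, bytes(buf)) for offset, buf in runs]
```

The result has the same effect on any base as the original list, but superseded bytes are gone and the mods can be applied in one ascending pass.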

Few if any of these tricks can be programmed when a conventional storage-device interface sits between the program and the flash. Few if any of these tricks can be done by firmware alone, connected to the flash hardware. Flash is not disk!

Errors discovered by techniques such as erasure coding may be fixed by a separate set of updates. This must be coordinated with health estimates of the underlying storage.

Many flash algorithms depend on a small, frequently written, persistent stable store. Being small, it is OK if it is much more expensive per bit than the flash memory.

Some earlier notes

I wonder how many tricks such as these are possible via the software that accompanies commodity flash memory today.

Closer look:
This document, the “Common Flash Interface” standard, seems to be missing a couple of early chapters. For instance, what is the “Query command code”? It seems to describe an interface between software and hardware where the hardware responds as memory to software. It allows the software to interrogate hardware parameters. It also describes how software can read information whose format is unspecified but which pertains to specific ‘algorithms’ supplied by the vendor that somehow execute on the far side of the interface. No clues are given about the difference between the ‘Primary’ and ‘Alternate’ algorithms.

I suspect that the aforementioned algorithms to apply ECC or erasure coding to the flash memory via mods must be carried out at an abstraction level distinct from the higher level algorithms. I suspect further that the former are more closely connected with the physical nature of the flash memory. Yet such a division precludes certain powerful techniques. Accumulating Bloom filters comes to mind.