Performance Hacks

This note may apply to several of the plans in this section. I am currently thinking in terms of easy slivers now. We choose the compile time parameter n and cause each of the 20 faces of the icosahedron to be dived into n2 facets. n = 20 for now. That gives us about 10n2 vertices and 2(10n2) facets. Each evaluation of a shape requires computing the potential between each vertex and each sliver, or roughly 2(10n2)2 = 200n4 steps. Performance is an issue. I consider here impact of cache size.

One trick is to collect k facets, compute their normals, and k vertices and compute the potentials for each of the k2 pairs of facet and vertex. If the k facets fit in some level of cache then these k2 calculations will go much faster. Each of the vertices needs to accumulate a potential and these k potentials might be kept in cache. The cache must hold the centroid and normal for each facet (6 floats), and the location and accumulating potential for each vertex (4 floats). (This is reminiscent of IBM 704 tape logic.)

The “cache savvy” alternative in this code uses this idea.