Performance

Performance Hacks

This note may apply to several of the plans in this section. I am currently thinking in terms of easy slivers now. We choose the compile time parameter n and cause each of the 20 faces of the icosahedron to be dived into n² facets. n = 20 for now. That gives us about 10n² vertices and 2(10n²) facets. Each evaluation of a shape requires computing the potential between each vertex and each sliver, or roughly 2(10n²)² = 200n⁴ steps. Performance is an issue. I consider here impact of cache size.

One trick is to collect k facets, compute their normals, and k vertices and compute the potentials for each of the k² pairs of facet and vertex. If the k facets fit in some level of cache then these k² calculations will go much faster. Each of the vertices needs to accumulate a potential and these k potentials might be kept in cache. The cache must hold the centroid and normal for each facet (6 floats), and the location and accumulating potential for each vertex (4 floats). (This is reminiscent of IBM 704 tape logic.)

The “cache savvy” alternative in this code uses this idea.