If a 2D array in memory is square, n by n, the following transposes it in place, quickly and rather efficiently:
type tmp A[n][n]; int i, j; ... for(i=0; i<n; ++i) for(j=0; j<i; ++j) {type tmp = A[i][j]; A[i][j] = A[j][i]; A[j][i] = tmp;}It is much harder for a m by n array that might not be square. Usually memory is available for a temporary copy of the array in which case it is easy. Otherwise we have an obscure combinatorial problem if we restrict ourselves to just two cells, such as tmp above to hold an array element.
Each array cell is moved to an easily computed different cell but in general this array permutation is composed of many cyclic permutations. The declaration A[m][n] is not suitable for the transposed array and the code below will assume that A is declared type A[m*n];. Permuting the cycle that passes thru element t can be done as follows:
void cycle(int T){int t = T; type tmp = A[t]; do { {int i = t/n, j = t%n; /* 0<=i<m and 0<=j<n */ t = j*m + i;} {type w = A[t]; A[t] = tmp; tmp = w;}} while(t!=T);}The code above is moderately obscure but the really difficult part is to know for which values to call cycle. Permuting a cycle more than once would be not only inefficient, but would yield the wrong result!
For a 4 by 7 array the cycles are:
(0),
(1 4 16 10 13 25 19 22 7),
(2 8 5 20 26 23 11 17 14),
(3 12 21),
(6 24 15),
(9),
(18),
(27)
Omitting unit cycles is good but unnecessary.
For this shape “cycle(1); cycle(2); cycle(3); cycle(6);” does the transpose.
We need an algorithmic axiom of choice to discover just one member of each cycle.
If we can afford a m by n bit array we can remember which elements we have permuted.
This solves the problem as originally posed. Its computational cost is about (m*n)3/2, worse than I had hoped. It makes minimal moves, however.