type tmp A[n][n]; int i, j; ... for(i=0; i<n; ++i) for(j=0; j<i; ++j) {type tmp = A[i][j]; A[i][j] = A[j][i]; A[j][i] = tmp;}It is much harder for a m by n array that might not be square. Usually memory is available for a temporary copy of the array in which case it is easy. Otherwise we have an obscure combinatorial problem if we restrict ourselves to just two cells, such as

Each array cell is moved to an easily computed different cell but in general this array permutation is composed of many cyclic permutations.
The declaration `A[m][n]` is not suitable for the transposed array and the code below will assume that A is declared `type A[m*n];`.
Permuting the cycle that passes thru element t can be done as follows:

void cycle(int T){int t = T; type tmp = A[t]; do { {int i = t/n, j = t%n; /* 0<=i<m and 0<=j<n */ t = j*m + i;} {type w = A[t]; A[t] = tmp; tmp = w;}} while(t!=T);}The code above is moderately obscure but the really difficult part is to know for which values to call

For a 4 by 7 array the cycles are:
(0),
(1 4 16 10 13 25 19 22 7),
(2 8 5 20 26 23 11 17 14),
(3 12 21),
(6 24 15),
(9),
(18),
(27)

Omitting unit cycles is good but unnecessary.
For this shape “`cycle(1); cycle(2); cycle(3); cycle(6);`” does the transpose.
We need an algorithmic axiom of choice to discover just one member of each cycle.

If we can afford a m by n bit array we can remember which elements we have permuted.

Lee Corbin proposes the following solution: Visit each cell just once in some convenient order. Compute the cycle thru that cell without moving any elements. If the address of the cell is the least of the cycle, then move that cycle of cells.

This solves the problem as originally posed.
Its computational cost is about (m*n)^{3/2}, worse than I had hoped.
It makes minimal moves, however.