This idea is a variation on similar ideas, though I have not seen it in quite this form. Instruction sets are often crowded, and choosing an encoding for a particular operation is fraught with unfortunate tradeoffs. Here is a scheme that ameliorates this.
The idea is that the decoding of an instruction depends not only on the op-code in the instruction but also on a CS (Code State) held in a few high-order bits of the instruction pointer. I presume that the CS changes infrequently, probably no more often than branch instructions are needed for other reasons, so a basic block would use a single CS. One CS might favor floating point ops and provide ample code space for the many variations that are easy for floating point hardware to provide but normally expensive to allocate opcodes for. Another CS might support fixed point arithmetic with special overflow definitions. Yet another CS might execute code written for legacy hardware. I have not seen the need for more than 2 or 4 CS values. It might be tempting to put floating point rounding modes in the CS, which would ameliorate the cost of tailoring those modes to the routine at hand. In my experience the subroutine needs its own FP rounding control values and can scarcely afford to save and restore those of the caller. In short, the floating point control values belong in an unprivileged part of the program status space: status that depends on what code you are executing.
I also assume that the memory map is arranged to ignore the CS. The 68K has a register indicating how many high-order bits of the virtual address to ignore. This mapping arrangement lets instructions for different CS values share one page. Subroutines can use whatever CS value is convenient: since the CS is part of the routine's address, whatever logic locates the routine also sets the appropriate CS at no extra cost. Returning to the caller restores the CS along with the old instruction pointer, of which it is a part. This scheme assumes that the various CS values agree on the register file and data formats as well as byte ordering. A significant common set of instructions must presumably be available in each CS space.
Some CS values might specify only two register operands per instruction and thus make room for more op-code bits. I think that the ARM architecture's Thumb mode uses this sort of option to support 16-bit instructions, allowing denser though less efficient programs. Note that the decode hardware has plenty of advance warning of the CS value, since it is part of the instruction pointer and is known before the instruction is fetched. Parallel decode logic is thus not plagued by late format information.