It doesn't make much sense to build hardware to run an intermediate language. Java bytecode isn't optimised; almost all optimisation is meant to happen in the JIT. If you build a processor that runs bytecode directly, when does the optimisation occur? Presumably never.
Low-horsepower platforms are probably the best place to give it a go, as they may struggle to run a respectable JIT compiler, but as you say, Jazelle didn't catch on.
The hardware would perform the optimisation using typical means. Instruction lookahead can be used for caching the stack slots to be used in registers, for instance.