The fitness landscape concept aids intuition about adaptive evolution through low-fitness genotypes. Evolutionary processes become complex when environments, and therefore fitnesses, fluctuate. Antibiotic resistance evolution in bacteria is an important example of such dynamics. Resistance bears a cost in the drug-free environment, but compensatory mutation can lower this cost, creating a fitness valley. With the drug present, the valley becomes a hill that is easily climbed. Once a population is dominated by resistant-compensated genotypes, reversion to sensitivity is difficult: this phenomenon has been described as an evolutionary lobster trap. With increasing frequencies of drug resistance among pathogenic bacteria, it is critical to understand how this trap can be escaped. Here, we develop stochastic models to investigate these dynamics. The residual fitness cost (the cost remaining after compensatory mutation has occurred) is a key parameter. Reversion to sensitivity is favored when the time spent in the absence of the drug relative to its presence is high compared to the residual fitness cost. Population sizes are also important: in large populations, resistant-compensated mutants arise from resistant-uncompensated or sensitive-compensated genotypes without these intermediates reaching fixation. This stochastic tunneling effect occurs when the rates of environmental fluctuation allow sufficient time.