This is the case for any sharp filter. It is not unique to the FFT approach. It doesn't matter if you use a linear phase FIR; any time you "remove" frequencies you can increase your peak levels. Try graphing sin(x) + 0.2sin(3x) and then try removing/filtering out the 3x component.
It's even true for reconstruction. A digital waveform can represent peak levels far above "digital peak", in between samples.
This is why if you're mastering songs, you'd better keep your peak levels at -0.5 dB or -1 dB (so the filtering from lossy compression won't make it clip), and why you'd better use an oversampling limiter. Especially if you're doing loudness-war-style brutal limiting, because that's the stuff that really creates inter-sample peaks. But you shouldn't be doing that, because Spotify and YouTube will just turn your song down to -14 LUFS anyway and all you'll have accomplished is making it sound shitty :-)
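For anyone who wants to check the inter-sample peak claim numerically, here is a minimal sketch. It assumes NumPy, a test tone at exactly one quarter of the sample rate with a 45 degree phase offset, and FFT zero-padding as a stand-in for ideal band-limited reconstruction (this is only exact for a periodic signal like this one; real true-peak meters use oversampling filters instead). The names `n_samples` and `oversample` are made up for the example.

```python
import numpy as np

n_samples = 64    # hypothetical block length, holds exactly 16 cycles of the tone
oversample = 8    # hypothetical oversampling factor for the reconstruction

# Tone at fs/4 with a 45 degree phase offset: every sample lands at
# +/-0.707 of the true amplitude, never on the crest of the wave.
n = np.arange(n_samples)
samples = np.sin(np.pi * n / 2.0 + np.pi / 4.0)

# Normalise so the largest *sample* sits exactly at digital full scale (1.0).
samples = samples / np.max(np.abs(samples))
sample_peak = float(np.max(np.abs(samples)))

# Band-limited interpolation by zero-padding the spectrum.
spectrum = np.fft.rfft(samples)
padded = np.zeros(n_samples * oversample // 2 + 1, dtype=complex)
padded[: spectrum.size] = spectrum
# irfft divides by the new length, so rescale by the oversampling factor.
reconstructed = np.fft.irfft(padded, n=n_samples * oversample) * oversample
true_peak = float(np.max(np.abs(reconstructed)))

print(f"sample peak: {sample_peak:.4f}")
print(f"true peak:   {true_peak:.4f} ({20 * np.log10(true_peak):+.2f} dB)")
```

The samples never exceed 1.0, but the waveform they describe peaks about 3 dB higher between samples. This is a deliberately bad case; real program material usually overshoots by less, but heavy limiting pushes it in this direction.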
Not quite. Designing filters with minimum passband distortion and no amplitude increase anywhere is slightly harder, but it's not impossible, even if you are strictly removing some specific frequencies, because you can design the filter so that the ringing in the spectrum amplitude hits zero at those frequencies. In practice, though, simply reducing these frequencies by a factor of 100 is good enough, and that's possible for bands without needing any amplitude above 1.
You didn't understand my example. This isn't about spectral ringing. It doesn't matter if you have zero spectral ringing and no amplitudes above 1. There is no way to have a sharp filter that removes (or almost removes) certain frequencies, even one with zero spectral ringing, while guaranteeing it doesn't increase peak levels in the time domain. The filter will decrease the total energy of the signal, but a decrease in signal energy can still cause an increase in peak levels. This is because a frequency component can line up against the existing peaks so that it partially cancels them, and removing that component then brings the full peaks back.
Just punch sin(x) + 0.2sin(3x) into a graphing calculator, then remove the 0.2sin(3x) component and watch the peak level increase, from about 0.87 to 1.0. No filter can fix that without also decreasing the sin(x) component significantly to compensate.
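If you don't have a graphing calculator handy, a quick sketch in Python does the same job. It assumes NumPy and simply evaluates both waveforms on a dense grid over one period; "removing" the third harmonic here means dropping the term outright, which is what an ideal filter would do, not a simulation of any particular filter design.

```python
import numpy as np

# Dense grid over one full period of the fundamental.
x = np.linspace(0.0, 2.0 * np.pi, 100_001)

with_third = np.sin(x) + 0.2 * np.sin(3.0 * x)   # original signal
without_third = np.sin(x)                        # 3x component removed


def peak(signal):
    """Largest absolute value of the waveform."""
    return float(np.max(np.abs(signal)))


def rms(signal):
    """Root-mean-square level, a proxy for signal energy."""
    return float(np.sqrt(np.mean(signal ** 2)))


peak_before, peak_after = peak(with_third), peak(without_third)
rms_before, rms_after = rms(with_third), rms(without_third)
peak_change_db = 20.0 * np.log10(peak_after / peak_before)

print(f"peak: {peak_before:.4f} -> {peak_after:.4f} ({peak_change_db:+.2f} dB)")
print(f"rms:  {rms_before:.4f} -> {rms_after:.4f}")
```

The RMS level goes down (less energy) while the peak goes up by a bit over 1 dB, which is the whole point: the third harmonic was flattening the crest of the fundamental, and taking it away restores the full crest.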