I'm fairly certain that the lower bits are masked away on memory reads by pretty much everything that has an on-board cache anyhow, so they're fair game. Some ISAs even mandate this masking for larger-than-byte loads.
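To make the low-bit idea concrete, here is a minimal sketch in C. It assumes the pointee is at least 8-byte aligned, so the bottom 3 bits are always zero and can hold a tag. The names (`TAG_BITS`, `tag_ptr`, `ptr_tag`, `untag_ptr`) are made up for illustration, and this version masks the tag off in software before dereferencing rather than relying on the hardware to ignore it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pointees are 8-byte aligned, so the low 3 bits are free. */
#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1u))

/* Pack a small tag into the low bits of an aligned pointer. */
static inline void *tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0); /* pointer must really be aligned */
    assert(tag <= TAG_MASK);        /* tag must fit in the free bits */
    return (void *)(bits | tag);
}

/* Read the tag back out. */
static inline unsigned ptr_tag(const void *p) {
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Strip the tag so the pointer is safe to dereference. */
static inline void *untag_ptr(const void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

Whether the explicit mask in `untag_ptr` can be skipped on a given CPU is exactly the open question here, so the sketch keeps it.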
My guess for x64 is that low tagging costs nothing as long as dereferencing the tagged value hits the same cache line as dereferencing the untagged value. Though I could also be persuaded that x64 completely ignores the top 16 bits until the last moment (when it checks that they match the 17th bit from the top, i.e. bit 47), in which case high tagging would be free. It seems fairly likely that this differs across x64 implementations. But so far I'm just running with "it's probably fine, should benchmark later".
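For reference, the high-bit variant could look something like the sketch below. It assumes 48-bit virtual addresses (4-level paging), where a canonical address has bits 63..48 all equal to bit 47; systems with 5-level paging use more address bits and would leave less room. The helper names (`high_tag`, `high_tag_of`, `high_untag`, `is_canonical`) are hypothetical, and addresses are handled as plain `uint64_t` values so nothing here dereferences a tagged pointer.

```c
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..48 carry no address info. */
#define VA_BITS 48u
#define LOW_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Overwrite the top 16 bits with a tag. */
static inline uint64_t high_tag(uint64_t addr, uint16_t tag) {
    return (addr & LOW_MASK) | ((uint64_t)tag << VA_BITS);
}

/* Read the tag back out of the top 16 bits. */
static inline uint16_t high_tag_of(uint64_t tagged) {
    return (uint16_t)(tagged >> VA_BITS);
}

/* Restore canonical form by copying bit 47 into bits 63..48. */
static inline uint64_t high_untag(uint64_t tagged) {
    uint64_t low = tagged & LOW_MASK;
    return ((low >> (VA_BITS - 1)) & 1u) ? (low | ~LOW_MASK) : low;
}

/* An address is canonical if untagging leaves it unchanged. */
static inline int is_canonical(uint64_t addr) {
    return high_untag(addr) == addr;
}
```

The cost being guessed at above is the `high_untag` step: if it has to run before every dereference, high tagging is not free.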