I wonder if a hacky solution might be to use some kind of intermediate model to serialize the text (whether from an image of it or from the raw PDF data) into LaTeX. I imagine the LM has seen enough formulas written in TeX to understand them, whereas in most PDFs the formulas come out as jumbles of letters.
And we'll push an update soon with better loading :)