I agree many of these things are a pain. This often reflects a workflow that is approaching things from entirely the wrong direction. ("If I wanted to go there, I wouldn't start from here.")
E.g. instead of trying to OCR a PDF, go back to the source document or database or whatever from which the PDF was generated. (Yes, I know that's not always an option. But it should be the first avenue to explore. We should push back against people who send around PDFs as though they were an all-purpose interchange format for textual or structured data.)
I'm a bit puzzled by (3), though:
> Office to PDF ... it's not easy ... when people see their PDF looks very different than what they saw on Word, they get upset
To get a PDF that looks the same as the Word document, just tell them to use the Print to PDF driver from right there within Word.
I think you recognize this already, but to add a bit of color: in highly regulated industries (e.g. financial services) and in B2B settings with lots of peers (e.g. supply chain), "going back to the source document or database or whatever" requires an insane amount of consensus among the parties involved, and nobody is currently incentivized to build it.
To add to that, a lot of PDFs (e.g. financial reports) are generated procedurally by ancient code that would have to be rewritten to produce a different format. The underlying database is often many layers of abstraction removed from the final output.