GBs of storage, potentially 10+ for very large datasets.
Minutes, not days. Very big datasets might take 30+ minutes (or even a couple of hours), but usefulness starts in the first few minutes (because of the priority algorithm).
Actually a few hundred documents is really no biggie; my current benchmark is in the range of <250 ms (feels instant) for hundreds of thousands of paragraphs.
Certainly not, presuming it doesn't escalate to the level of trademark/service mark infringement (to be fair, IANAL). Just a risk consideration...your product, your call.
But I think there's value in at least recognizing that the namespace is quite crowded given the collisions that two interweb randoms were able to identify in short order.
I'm using a fine-tuned t5-small model. I fine-tuned it for two tasks: question answering from a paragraph, and highlighting relevant text in search results.
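For anyone curious how one small T5 can serve two jobs: the usual trick is task prefixes, where each training example is serialized with a prefix that tells the model which task to perform. A minimal sketch of that serialization (the exact prefix strings here are my assumptions for illustration, not necessarily the author's training format):

```python
# Sketch of multiplexing two tasks through one T5 model via task
# prefixes (the standard T5 convention). Prefix strings are hypothetical.

def build_prompt(task: str, query: str, paragraph: str) -> str:
    """Serialize a (task, query, paragraph) triple into one T5 input string."""
    if task == "qa":
        # answer a question from the given paragraph
        return f"question: {query} context: {paragraph}"
    if task == "highlight":
        # mark the span of the paragraph most relevant to the query
        return f"highlight: {query} context: {paragraph}"
    raise ValueError(f"unknown task: {task}")

prompt = build_prompt("qa", "Who wrote it?", "The essay was written by Ada.")
print(prompt)  # question: Who wrote it? context: The essay was written by Ada.
```

At inference you'd feed the built string to the fine-tuned checkpoint, e.g. with Hugging Face `transformers`' `text2text-generation` pipeline, and the prefix alone routes the model to the right behavior.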
I'll keep it haha