Hacker News

Great concept, bravo!

Here is some feedback:

1. I am trying it right now, but it seems some of my documents are not getting embedded.

I've tried adding 3 documents from their URL. I can see them if I hit Datasets > my dataset > Documents > View. But there is nothing (zero rows) in the dataset. And when I try using the Playground, my context (result of the search) returns nothing.

Am I doing something wrong?

2. It seems to work if I manually upload the same documents in the UI. However, for some reason your upload system restricts the accepted file types, so I can only upload them as .txt files, not as .md files.

3. I don't see a way to insert a delimiter between the returned search results in the {{context}}. This means the context is one mashup of content from several documents, which sometimes confuses GPT. I would prefer to get "n" context variables that I can use as I want in my prompt: {{context-1}}, {{context-2}}, and so on.

4. Similarly, I don't see a way to include the metadata in the "context". That's too bad, because there is then no way to tell the LLM that "this content is in document ABC and the other is from document BDF". If it were possible, the LLM could point the user to the right resource in its answer. So I would prefer {{context}} to be a JSON array (which GPT models understand perfectly) of the chunks, with their text and metadata, rather than just a hodgepodge concatenation of all the chunks.
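Something along these lines, sketched with made-up field names (this is my own invention, not your actual schema):

```python
import json

def build_context(chunks):
    """Serialize retrieved chunks as a JSON array that keeps each chunk's
    text next to its source document and metadata, instead of concatenating
    the raw text. Field names are made up for the sketch."""
    return json.dumps(
        [
            {
                "text": c["text"],
                "document": c["document"],
                "metadata": c.get("metadata", {}),
            }
            for c in chunks
        ],
        indent=2,
    )

chunks = [
    {"text": "Refunds take 5 days.", "document": "ABC", "metadata": {"section": "billing"}},
    {"text": "Support is 24/7.", "document": "BDF", "metadata": {"section": "support"}},
]
print(build_context(chunks))  # valid JSON that the prompt can embed as {{context}}
```

With that, the prompt can just say "cite the `document` field of the chunk you used."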

5. I need to be able to provide "reference links" for a completion. Is there a way to identify the chunks used in the API response for completion? Right now it seems like I need to run a bunch of queries:

- a `search` query to get access to the relevant chunks (rows), which doesn't tell me which document they are from,

- a `/api/datasets/{id}/rows` query to list all the rows and find the associated documents for each (which could be a request with a gigantic payload or multiple paginated requests)

- and then the completion query.

Couldn't the completion query return the rows used in the completion, along with the associated document for each row?
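To make the pain concrete, the client-side stitching I'd have to write looks roughly like this (the payload shapes are my guess, not your real API):

```python
def attach_documents(search_hits, dataset_rows):
    """Join search hits (which only carry a row id) to their source
    documents via a separately fetched full row listing. Field names
    here are hypothetical, not Baseplate's actual payloads."""
    doc_by_row = {row["id"]: row["document"] for row in dataset_rows}
    return [
        dict(hit, document=doc_by_row.get(hit["row_id"]))
        for hit in search_hits
    ]

# One `search` call plus one (possibly huge, or paginated) rows listing
# per completion, just to recover the source document of each chunk:
hits = [{"row_id": "r1", "text": "chunk one"}, {"row_id": "r2", "text": "chunk two"}]
rows = [{"id": "r1", "document": "ABC"}, {"id": "r2", "document": "BDF"}]
print(attach_documents(hits, rows))
```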

I am loving the concept of Baseplate and we'll very likely integrate this right away as it solves a big need we had for an upcoming feature of our own. Nice work, this is impressive!

Sébastien



Thanks for the feedback Sébastien!

1. Are you adding documents through the API with URLs? URL parsing is only supported through the UI right now; through the API, the URL is just stored as metadata.

2. Will add .md support soon!

3/4. We are working on ways to give you more control over how the context is formatted, like you mentioned. Keep an eye out for updates!

5. The response from the completion API should return a "search_results" array which includes the chunks and the other columns/metadata/images. If that array is empty, there may be something wrong in the search config; happy to hop on a call some time or help in our Discord.
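For example, pulling reference links out of that array can be as simple as the following sketch (field names are illustrative here; check the API docs for the exact schema):

```python
def reference_documents(completion_response):
    """Collect the distinct source documents behind a completion from its
    search_results array. Field names are illustrative, not exact."""
    return sorted({
        hit["document"]
        for hit in completion_response.get("search_results", [])
        if "document" in hit
    })

resp = {
    "completion": "Refunds take 5 days.",
    "search_results": [
        {"text": "chunk one", "document": "ABC"},
        {"text": "chunk two", "document": "ABC"},
        {"text": "chunk three", "document": "BDF"},
    ],
}
print(reference_documents(resp))  # -> ['ABC', 'BDF']
```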

Hope this helps!


Thanks, I appreciate the thorough response!

I would love more customization of the context. I just tested a use case close to what our app needs for the back-end, and the chunks I'm getting are too short. It seems I can set how many chunks come up in the context, but not their length.

Other than that, you've got a subscriber. Congrats! I've just entered our credit card number.


Glad to hear that! Btw you should be able to configure the chunk size and overlap when you upload through the UI, or when using our /upload API (https://docs.baseplate.ai/api-reference/documents/upload-doc...).


Perfect! I played around with this and got exactly what I needed. Thanks :)



