Hacker News: saigal's comments

We intend to.


Yes, totally agree. You can easily sniff out products that are just a simple wrapper around GPT.


Wouldn't a full-featured OS GUI be a simple wrapper around the command line? Would that make it less valuable to have?


I think it would make it unusably slow.


We also encounter a lot of build vs buy conversations with businesses.


Enterprises are spending lots of time and money on this. The biggest issue slowing down the sales cycle at this stage has been data governance. Most folks think it's about accuracy or latency (which are of course issues), but data governance can make this whole thing a non-starter.


Can you explain more about why governance is the issue with a service like this? Companies not wanting their data to go off-prem?


Yes. Some want BYOC (bring-your-own-cloud) solutions. Others don't even want to be perceived as being used to train an LLM. Not to mention CCPA, GDPR, etc.

There are lots of questions around what data is being sent to the LLM, or whether it's just the schema.


Interesting. So by open sourcing, you think companies can self-host and that negates some of these issues? Or is your goal to increase future contributions to keep the project alive and developing?

What % of the NL -> SQL problem is solved in the current version? I.e., is this ready for some kind of production work now, or is it "in 2-3 years we'll be there"?


Not OP, but there was an EHR SaaS company on HN a day or two back with a similar proposition: it's open source, so it can be independently verified from a security perspective. It was interesting to me because the code was unusable to normal folks, and even to other companies; one of the founders described their moat as the difficulty of actually integrating with the ecosystem, and said they weren't worried about competitors using it. It really hammered home to me how open source is increasingly a marketing lever.


There are organizations using Dataherald in production right now.

The latency is ~20-30s and it takes some setup, so as long as those are not blockers it can be used in prod.


For companies that are willing to put in some effort, the self-hosting option is a great one. There are certain use cases where this works now and is already in production. These tend to be constrained use cases that don't deal with very sensitive data.


https://discord.com/invite/A59Uxyy2k9

Discord invite in case anything comes up.


There is a middle ground here. The most complicated queries will need the intel and business context of a smart data scientist. There are, however, many types of queries where automation would make the world much easier and allow more self-serve data inquiries. Too often the rhetoric around these topics is binary: "it works" or "it doesn't work." In reality, certain use cases work now and others don't yet.


fantastic. let us know how it goes :-)


is there an easier way?


Yes: write SQL.


The question was about row-level security.


Yes, please do. We'd love your feedback and/or to hear whether you see a material improvement over what you have now.


No

