Hacker News | jeffreyw128's comments

This was my favorite course in college!


This is cool!

If you want to add embeddings over the internet as a source, you should try out exa.ai. It includes Wikipedia, tens of thousands of news feeds, GitHub, 70M+ papers including all of arXiv, etc.

disclaimer: I am one of the founders (:


I will add it. It's very easy to integrate new search engines.


looks siiiick. congrats + good luck


Hey Paulista,

Cofounder Jeff here. Both!


Exa | San Francisco | In person | Full time | $130K-350K

Jeff, cofounder of Exa.ai here. LLMs represent a brand new opportunity to organize humanity's knowledge, in a way that hasn't been done before. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).

A little about us:

- Raised series A a few months ago. https://techcrunch.com/2024/07/16/exa-raises-17m-lightspeed-...

- 15 people, fully in person in SF. Our team - https://exa.ai/team

- Our mission: https://exa.ai/blog/superknowledge

We're hiring pretty broadly across engineering - AI research, high performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)

https://exa.ai/careers


Can I email you my resume? I am interested in the backend engineer role!


Exa | San Francisco | In person | Full time | $130K-350K

Jeff, cofounder of Exa.ai here. LLMs represent a brand new opportunity to organize humanity's knowledge. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).

A little about us:

- Raised series A a few months ago. https://techcrunch.com/2024/07/16/exa-raises-17m-lightspeed-...

- 15 people, fully in person in SF. Our team - https://exa.ai/team

- Our mission: https://exa.ai/blog/superknowledge

We're hiring pretty broadly across engineering - AI research, high performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)

https://exa.ai/careers


Exa | San Francisco | In person | Full time | $130K-180K

Jeff, cofounder of Exa here. LLMs represent a brand new opportunity to organize humanity's knowledge, in a way that hasn't been done before. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).

We're hiring pretty broadly across engineering - AI research, high performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)

https://exa.ai/careers


Missed exa.ai! Embeddings-based search engine with its own index


How does an embeddings based search work? Without hallucinating bad links?


Not sure what they are doing but embeddings and hallucination are completely separable imo (you can have hallucination even without embedding-based retrieval). Likely you have an embedding for the query which is close to the embedding of the doc for some measure of similarity. That could be semantic similarity or even user behavior.


Embeddings aren't generative AI.

They're just vectors of arbitrary dimension, and similarity is calculated by an n-dimensional function.
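
To make the point above concrete, here is a minimal sketch of embedding-based retrieval. It assumes cosine similarity as the similarity function and uses made-up 3-dimensional vectors and example.com URLs; none of this is taken from Exa's actual system. The ranking step can only return URLs that are already in the index, which is why retrieval by itself does not invent links.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (url, score) pairs sorted from most to least similar."""
    scored = [(url, cosine_similarity(query_vec, vec)) for url, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical index: the URLs and vectors are illustrative assumptions.
# Real embedding models produce hundreds or thousands of dimensions.
docs = {
    "https://example.com/mouse-reviews": [0.9, 0.1, 0.0],
    "https://example.com/concerts-sf": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of the user's query
ranking = rank_documents(query, docs)
```

Here `ranking` lists the indexed URLs in order of similarity to the query. Other similarity measures (dot product, Euclidean distance) would slot into the same structure.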


Exa | San Francisco | In person | Full time | $130K-180K

Jeff, cofounder of Exa here. LLMs represent a brand new opportunity to organize humanity's knowledge, in a way that hasn't been done before. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).

We're hiring pretty broadly across engineering - AI research, high performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)

https://exa.ai/careers


Hi Jeff,

How does Exa's search product compete with https://en.wikipedia.org/wiki/Perplexity.ai?


The issue with traditional search engines is that keyword-first algorithms are extremely gameable.

Try https://search.metaphor.systems - it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.

So in the mentioned example of searching for Youtube downloaders, with Metaphor you'll get only Youtube downloaders (https://search.metaphor.systems/search?q=This%20is%20the%20b...)

Full disclosure - I work there :p


How is that different from keywords? Embeddings aren't magic, they're just page content. Content is trivial to game since it's controlled by the website owner.

edit: From my quick QA, the results are also not that great. Searching for "what is the best mouse to buy" leads to links to buy random mice versus review summaries or online discussions on mice. One of the recommended queries of "Here is a great fun concert in San Francisco" leads to some really bizarre results in non-English languages that have nothing to do with either SF or concerts.

edit2: Also, Google has been using LLMs as part of their search since at least 2018 so definitely not just keyword matching there.


Yup, definitely still gameable but if the model learns what high quality content is like and what high quality webpages there are (which it does), then the only way to game would be to be great :)

For your search - I would recommend turning autoprompt off and searching something like "Here is a great summary of the best computer mice to use:".

Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to phrase the query the way someone would refer to a link before sharing it.


> Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

So it's not high quality web pages but web pages that people talk about a lot which is expected since no one has an oracle that says what high quality is. The embeddings are merely a proxy and generalization for "how links are talked about on the Internet." That can be gamed at scale just like every other signal any popular search engine has been based off of.


That's true, although it should be much harder.


The first result vtubego.com is a 144MB downloader app. The page contains "Pricing Plans Lorem ipsum dolor sit amet, placerat verterem luptatum phaedrum vis, impetus mandamus id vix fabulas vim." above its 3 paid plans (there is no free plan).

I haven't installed the downloader app, so I'm not sure if it lets me download youtube videos for free.

The second result "ytder.com" is a redirect to "https://poperblocker.com/edge/" which seems to be a browser extension for Microsoft Edge that protects the user from the Holy See. I'm not using Edge and I'm trying to download a Youtube video.

The third result download-video.net says that it can download videos from a list of sites. Youtube is not in the list, but let's try anyway. If you put "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and click "download" you get "500 SyntaxError: Unexpected token '<', ""

At this point I gave up, but please let me know if any of the results work.


This is excellent!

Definitely excited to see how it holds up to daily use.

So far it gave me exactly what I wanted at the top for all of my test queries that were well formed.

As for asking “ignorant” questions both your service and the goog failed where phind gave me an actionable starting point (after a prodding follow up question: https://www.phind.com/search?cache=hmul4znpn7y4ei6qa64fosmc )

“max-height like css property for top and left”

Unsure if this sort of thing is even a goal of your project, but you won over a new user.

Wish you and your team all the best.


> with Metaphor you'll get only Youtube downloaders

I clicked into the top 5 results, none of them were real youtube downloaders that worked, so I clicked the next 5 results, then I finally got one single (really slow) downloader that worked. 1 out of 10 top results


https://getthatvideo.com/ is the first result for downloading YouTube videos. Seems super sus (especially since the site doesn’t load).

Auto-prompted to: "Here's a helpful website for downloading YouTube videos:"

Also, this result is horrible:

“What does it mean if someone is not covered in nfl football?”


>it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.

What prevents websites from gaming their embedding? Switching to a similarity search doesn't prevent the results from being gamed.


So far so good. I'll try using this first from now on, and see how it does. Good luck!


How do you deal with dynamically/contextually generated content? And how about paywalls and login-required content?


We do our best to get the right content.

For paywalls/login - we play pretty straight, always obey robots.txt, etc.
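
For readers unfamiliar with what obeying robots.txt involves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, and the `may_crawl` helper are hypothetical and only for illustration; this is not a description of how Exa's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its members area off-limits.
robots_txt = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse in memory, no network request

def may_crawl(url, user_agent="example-crawler"):
    """Return True if the parsed robots.txt rules allow fetching this URL."""
    return parser.can_fetch(user_agent, url)

allowed = may_crawl("https://example.com/blog/post")
blocked = may_crawl("https://example.com/members/profile")
```

A real crawler would fetch each site's own robots.txt (for example with `set_url` and `read`) before requesting pages, and skip any URL for which the check returns False.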


Just wait until the content farms adapt


This is the dumbest thing I've read in a long time

