If you want to add embeddings over the internet as a source, you should try out exa.ai. Their index includes Wikipedia, tens of thousands of news feeds, GitHub, 70M+ papers including all of arXiv, etc.
Exa | San Francisco | In person | Full time | $130K-350K
Jeff, cofounder of Exa.ai here. LLMs represent a brand new opportunity to organize humanity's knowledge, in a way that hasn't been done before. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).
We're hiring pretty broadly across engineering - AI research, high-performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)
Exa | San Francisco | In person | Full time | $130K-180K
Jeff, cofounder of Exa here. LLMs represent a brand new opportunity to organize humanity's knowledge, in a way that hasn't been done before. We're an AI research lab focused on AI-powered search algorithms (using embeddings), currently applied to vast swaths of the web (we make our money as a search API).
We're hiring pretty broadly across engineering - AI research, high-performance Rust (e.g., we build an in-house vector DB), and full stack. If the mission of organizing the Internet motivates you, it's a good fit :)
Not sure what they are doing, but embeddings and hallucination are completely separable imo (you can have hallucination even without embedding-based retrieval). Likely they compute an embedding for the query that is close to the embedding of the doc under some measure of similarity. That could be semantic similarity or even user behavior.
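The retrieval step described above can be sketched in a few lines. This is a toy illustration only: the 3-d vectors and document names are made up, and a real system would get its embeddings from a learned model rather than hand-written numbers.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d embeddings; in practice these come from an embedding model.
docs = {
    "doc_mice_reviews": [0.9, 0.1, 0.0],
    "doc_sf_concerts":  [0.1, 0.8, 0.2],
}

query = [0.8, 0.2, 0.1]  # pretend embedding of "what is the best mouse to buy"

# Rank documents by similarity to the query embedding.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # -> doc_mice_reviews
```

The point is that "closeness" here is entirely determined by how the embedding model was trained - swap in a model trained on a different similarity signal (e.g., co-click behavior instead of semantics) and the same code returns different neighbors.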
The issue with traditional search engines is that keyword-first algorithms are extremely gameable.
Try https://search.metaphor.systems - it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.
How is that different from keywords? Embeddings aren't magic; they're still derived from page content. Content is trivial to game since it's controlled by the website owner.
edit: The results from my quick QA are also not that great. Searching for "what is the best mouse to buy" leads to links to buy random mice rather than review summaries or online discussions about mice. One of the recommended queries, "Here is a great fun concert in San Francisco", leads to some really bizarre results in non-English languages that have nothing to do with either SF or concerts.
edit2: Also, Google has been using LLMs as part of their search since at least 2018, so it's definitely not just keyword matching there.
Yup, definitely still gameable, but if the model learns what high-quality content looks like and which high-quality webpages exist (which it does), then the only way to game it would be to be great :)
For your search - I would recommend turning autoprompt off and searching for something like "Here is a great summary of the best computer mice to use:".
Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it
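The querying advice above amounts to a simple rewrite: turn a question into the statement someone might post right before sharing a link. A minimal sketch, where the function name and the phrasing template are my own assumptions, not anything Exa documents:

```python
def to_link_share_query(question: str) -> str:
    """Hypothetical helper: rewrite a question into link-sharing phrasing.

    The "Here is a great summary of ...:" template is an assumed example
    of the style suggested in the comment above, not an official API.
    """
    topic = question.strip().rstrip("?").strip()
    return f"Here is a great summary of {topic}:"

print(to_link_share_query("what is the best mouse to buy?"))
# -> Here is a great summary of what is the best mouse to buy:
```

In other words, you query with the surrounding text a link would appear next to, rather than with the question itself.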
> Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it
So it's not high-quality web pages but web pages that people talk about a lot, which is expected, since no one has an oracle that says what high quality is. The embeddings are merely a proxy for, and generalization of, "how links are talked about on the Internet." That can be gamed at scale, just like every other signal that any popular search engine has been based on.
The first result, vtubego.com, is a 144MB downloader app. The page contains "Pricing Plans Lorem ipsum dolor sit amet, placerat verterem luptatum phaedrum vis, impetus mandamus id vix fabulas vim." above its three paid plans (there is no free plan).
I haven't installed the downloader app, so I'm not sure if it lets me download youtube videos for free.
The second result, "ytder.com", is a redirect to "https://poperblocker.com/edge/", which seems to be a browser extension for Microsoft Edge that protects the user from the Holy See. I'm not using Edge, and I'm trying to download a YouTube video.
The third result, download-video.net, says it can download videos from a list of sites. YouTube is not on the list, but let's try anyway. If you put "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and click "download", you get "500 SyntaxError: Unexpected token '<', ""
At this point I gave up, but please let me know if any of the results work.
> with Metaphor you'll get only Youtube downloaders
I clicked into the top 5 results; none of them were real YouTube downloaders that worked. So I clicked the next 5 results, and only then did I finally get one single (really slow) downloader that worked. That's 1 out of the top 10 results.