
Now the question is how can I, someone without a PhD in history but currently a PhD candidate in another discipline, use these tools to reliably interrogate topics of interest and produce at least a graduate level understanding of them?

I know this is possible, but the further away I get from my core domains, the harder it is for me to use these tools in a way that doesn’t feel like too much blind faith (even if it works!)



I think the trick here is to treat everything these models tell you as part of a larger information diet.

Like if you have a friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice. You quickly learn to treat them as a source of probably-correct information, but only as part of the way you learn any given topic.

I do this with LLMs all the time: I'm constantly asking them clarifying questions about things, but I always assume that they might be making mistakes or feeding me convincing sounding half-truths or even full hallucinations.

Being good at mixing together information from a variety of sources - of different levels of accuracy - is key to learning anything well.


This strikes me as an odd claim. You don't hang around with a friend who makes things up because they somehow enhance your learning process. You hang around with them despite the fact they're annoyingly unreliable, presumably because you value their company for other reasons.

Let's say you're trying to get a university degree, but you have a professor who makes up 20% of what they say. Is that helping you "learn well"?


Once everything has made it through the jungle telephone you're lucky if it's 80% correct. 20% wrong is a downright reliable source by human standards, at least about topics which people care about.


But a human can tell you whether they're not too sure or completely sure.


This is cute, but ultimately neither true nor helpful.


You might want to read the academic criticisms of an influential pop history book written by an academic, such as Sapiens.

And 20% is way overstated, especially for a SOTA model when it comes to verifiable facts.


I don't think that's a useful comparison. Humans writing history books have agendas and biases, but they're usually fairly transparent. In contrast, LLM failure modes are more or less impenetrable and very non-human. You're just inexplicably served with some very convincing but incorrect stuff.


20% is a harsh figure, but it could be a good entry point for figuring out the unknown unknowns; once you have the relevant keywords, you can go in depth using more reliable sources.


Well, that sounds like oral history, which is how all people used to learn. Strictly fact-checking everything you say seems like a modern invention.


Confidence hijacks the human brain. Without direct, personal expertise or experience to the contrary, spending time around your hypothetical "friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice" is going to subconsciously influence your opinions, possibly without you even knowing.

It's easy to laugh and say, well I'm smart enough to defeat this. I know the trick. I'll just mentally discount this information so that I'm not unduly influenced by it. But I suspect you are over-indexing on fields where you are legitimately an expert—where your expertise gives you a good defense against this effect. Your expertise works as a filter because you can quickly discard bad information. In contrast, in any area where you're not an expert, you have to first hold the information in your head before you can evaluate it. The longer you do that, the higher the risk you integrate whatever information you're given before you can evaluate it for truthfulness.

But this all assumes a high degree of motivation and effort. Like the opening to this article says, all empirical evidence clearly points in the direction of people simply not trying when they don't need to.

Personally, I solve the problem in my friend circle by avoiding overconfident people and cultivating friendships among people who have a good understanding of their own certainty and degree of expertise, and the humility to admit when they don't know something. I think we need the same with these AIs, though as far as I understand getting the AI to correctly estimate its own certainty is still an open problem.


> Like if you have a friend who's very well-read and talkative but is also extremely confident and loves the sound of their own voice. You quickly learn to treat them as a source of probably-correct information, but only as part of the way you learn any given topic.

I can't speak to everyone's experience - but whenever I'm having a conversation around relatively complex topics with MY friends - the deeper they dive, the more they're constantly referring back to their dive computer. They'll also try to make arguments that are principally anchored to the pegs that they're convinced will hold. I'm aware I'm mixing metaphors here but the point stands.

As far as "mixing information" - yes, there are commonly known tricks for trying to get a more accurate answer (a rough sketch of the first two is below the list):

- Query several LLMs

- Query the same LLM multiple times with different context histories

- Socratically force it to re-assess itself

- Provide RAG / documents / access to search engines

- Force quantitative tests in the form of virtualized envs, though this is more for CompSci/Tech/Math

etc.
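
As a rough sketch of the first two tricks - this assumes OpenAI-compatible endpoints, and the model names, second base URL, and example question are placeholders rather than recommendations:

  # Ask the same question to several OpenAI-compatible endpoints and compare answers.
  # Model names and the second base URL are placeholders; API keys come from env vars.
  from openai import OpenAI

  endpoints = [
      ("gpt-4o", "https://api.openai.com/v1"),
      ("some-other-model", "https://example.com/v1"),  # hypothetical second provider
  ]

  question = "Who negotiated the Treaty of Tordesillas, and when was it signed?"

  answers = []
  for model, base_url in endpoints:
      client = OpenAI(base_url=base_url)
      resp = client.chat.completions.create(
          model=model,
          messages=[{"role": "user", "content": question}],
      )
      answers.append((model, resp.choices[0].message.content))

  # Agreement isn't proof (shared training data), but disagreement is a strong
  # signal that you need a primary source.
  for model, answer in answers:
      print(f"--- {model} ---\n{answer}\n")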

LLMs don't currently have a good sense of their boundaries - they can't provide realistic confidence scores and weight their outputs accordingly - the human equivalent of saying, "I only have a passing familiarity with the original Greek of the Septuagint, but I think..."

Using an LLM as a glorified fact checker is a poor use of it - it's far better as a tool for free-form exploration.

> Being good at mixing together information from a variety of sources - of different levels of accuracy - is key to learning anything well.

I have a pretty extensive background in teaching/education and I would heavily disagree with this assertion - at least when starting as a complete novice. The key to learning well is to establish a strong foundation by learning from the most accurate resources possible. When you pick up a musical instrument, you don't want a teacher who's just one page ahead of you in the lesson book.


You ask them for references and check yourself. They are good exploratory and hypothesis-generating tools, but not more. Getting a sensible-sounding answer is not an excuse to skip confirming it. Often, the devil is in the details.


> the harder it is for me to use these tools in a way that doesn’t feel like too much blind faith (even if it works!)

I tend to ask multiple models and if they all give me roughly the same answer, then it's probably right.


Also keeping context short. Virtually all my cases of bad hallucinations with o1 have been when I've provided too much context or the conversation has been going on for too long. Starting a new chat fixes it.

You can see this effect in the ARC-AGI evals too; too much context impacts even o3 (high).


> if they all give me roughly the same answer, then it's probably right.

... or they had a lot of overlapping training data in that area.


Or maybe they were just trained on the same (incorrect) dataset.


I wrote a chat app built around mistrust of LLM responses. You can see an example here:

https://beta.gitsense.com/?chat=ed907b02-4f03-477f-a5e4-ce9a...

If you click on the Evaluation links, you can see how you can use multiple LLMs to validate an LLM response. The evaluation of the accurate response is interesting, since Llama 3.3 was the most critical of it.

https://beta.gitsense.com/?chat=fdfb053d-f0e2-4346-bdfc-7305...

At this point, you would ask Llama to explain why the response was not rated 100%, which you can then use to cross-reference other LLMs or to do your own research.
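
As a rough sketch of that cross-checking pattern (the model names and prompts here are placeholders, not what the app above actually uses):

  # One model answers; a second model plays the skeptic and flags dubious claims.
  # Both model names are placeholders.
  from openai import OpenAI

  client = OpenAI()

  def ask(model, prompt):
      resp = client.chat.completions.create(
          model=model, messages=[{"role": "user", "content": prompt}]
      )
      return resp.choices[0].message.content

  question = "Summarize the causes of the War of 1812."
  answer = ask("model-a", question)
  critique = ask("model-b",
      f"Question: {question}\n\nAnswer: {answer}\n\n"
      "List any claims in this answer that look doubtful or need a citation.")

  print(critique)  # follow up on the flagged claims with your own research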


> Now the question is how can I, someone without a PhD in history but currently a PhD candidate in another discipline, use these tools to reliably interrogate topics of interest and produce at least a graduate level understanding of them?

You can't. Because LLMs are statistical generative text algorithms, dependent upon their training data set and subsequent reinforcement. Think Bayesian statistics.

What you are asking for is "to reliably interrogate topics of interest", which is not what LLMs do. Concepts such as reliability are orthogonal to their purpose.


I find them useful for summarizing the state of the art to get me going in a new topic, but then again so is Wikipedia. A useful side angle: if you're using LaTeX, you can cut and paste references into ChatGPT and have it turn them into BibTeX format with >80% success. For a PhD, though, starting from textbooks, papers, etc. will be better, but LLMs can augment that successfully. Like any tool, use it for what it's best at.
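
For what it's worth, a pasted plain-text reference typically comes back as something like this (a made-up entry - check every field against the actual paper before using it):

  @article{lastname2020hypothetical,
    author  = {Lastname, Firstname and Other, Author},
    title   = {A Hypothetical Paper Title},
    journal = {Journal of Examples},
    year    = {2020},
    volume  = {12},
    pages   = {34--56}
  }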



