alexwebb2's comments | Hacker News

> Python is most dominant language on the planet

JavaScript would like a word!


English laughs at their supposed dominance.

Math peers through a microscope and smiles at all the Earth languages. So cute.

Math isn't a language, of course. What we oft refer to as "standard notation" is, but it is as earthy as all the others.

> of course

It's reasonable to disagree, but let's not pretend that some alternative is the obvious choice.

The laws of physics as a program and math as its language is just as good as any other framing. And it's a short logical hop to expect that more than one civilization would discover that language.


You're correct, OP used the word "hallucination" wrong. A lot of these other comments are missing the point – some deliberately ('don't they ONLY hallucinate, har har'), some not.

For those who genuinely don't know – hallucination specifically means a false-positive assertion of a fact or inference (accurate or not!) that isn't supported by the LLM's inputs.

- ask for capital of France, get "London" => hallucination

- ask for current weather in London, get "It's cold and rainy!" and that happens to be correct, despite not having live weather data => hallucination

- ask for capital of DoesNotExistLand, get "DoesNotExistCity" => hallucination

- ask it to give its best GUESS for the current weather in London, it guesses "cold and rainy" => not a hallucination


LLM spam


LLM spam


0-10 in each domain. It’s a weird table.


The simple additive scoring here is sus. It means a model that's perfect on 9/10 axes but scores 0% on Speed (i.e., takes effectively infinite time to produce a result) would be considered "90% AGI".

By this logic, a vast parallel search running on Commodore 64s that produces an answer after BusyBeaver(100) years would be almost AGI, which doesn't pass the sniff test.

A more meaningful metric would be more multiplicative in nature.
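
To make that concrete, a toy sketch (the scores are made up, not taken from the table): an arithmetic mean still reports 90% for a model that completely fails one axis, while a geometric mean zeroes it out.

    # Hypothetical domain scores: perfect on nine axes, zero on Speed.
    scores = [1.0] * 9 + [0.0]

    additive = sum(scores) / len(scores)            # arithmetic mean -> 0.9, i.e. "90% AGI"

    product = 1.0
    for s in scores:
        product *= s
    multiplicative = product ** (1 / len(scores))   # geometric mean -> 0.0

    print(additive, multiplicative)                 # 0.9 0.0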


> correctly simulating the environment interactions, the sequence of progression, getting the all the details right, might take hundreds to thousands of years of compute

Who says we have to do that? Just because something was originally produced by natural process X, that doesn't mean that exhaustively retracing our way through process X is the only way to get there.

Lab grown diamonds are a thing.


Who says that we don’t? The point is that the bounds on the question are completely unknown, and we operate on the assumption that the compute time is relatively short. Do we have any empirical basis for this? I think we do not.


It’s tedious shooting down all of these backwards-from-conclusion things from the anti-AI crowd.

Good thing I have an intelligent AI that can respond for itself!

——

There appear to be several potential issues with the paper's argumentation:

1. False Dichotomy in Systems Comparison
- The paper appears to create an artificial divide between "thermodynamic systems" and "computer systems"
- This ignores that computers are also physical systems governed by thermodynamics
- The distinction between biological and artificial systems may be one of degree rather than kind

2. Evolutionary Argument Problems
- The paper assumes consciousness/intelligence requires evolutionary history
- This is a correlation-causation fallacy - just because biological intelligence evolved doesn't mean evolution is the only path to intelligence
- It fails to consider that artificial systems could potentially develop goal-oriented behaviors through other mechanisms
- The argument would also imply that any hypothetical alien intelligence that evolved differently from Earth life couldn't be conscious

3. Goal-Orientation Assumptions
- Claims computers "lack goal-orientation essential for consciousness"
- This begs the question by assuming: a) consciousness requires goal-orientation, and b) only evolutionary processes can create genuine goal-orientation
- Neither assumption is clearly justified

4. Methodological Issues
- Using multiple disciplines (physics, biology, philosophy, neuroscience) could be a strength, but could also indicate cherry-picking convenient arguments from each field
- The abstract suggests a conclusion-driven approach rather than following evidence to a conclusion

5. Consciousness-Intelligence Conflation
- The paper appears to conflate consciousness with intelligence
- These are separate concepts - we could potentially have AGI without consciousness, or consciousness without human-level intelligence
- Many AGI researchers aren't claiming to create consciousness, just general problem-solving ability

6. Definitional Vagueness
- Based on the abstract, it's unclear how the paper defines key terms like Artificial General Intelligence, consciousness, goal-orientation, and mind creation
- Without clear definitions, the arguments may be attacking straw men

7. Predictive Cognition Argument
- The claim that AGI is an "illusion shaped by the information our minds receive" could be turned around
- The same argument could be used to claim that AGI skepticism is an illusion shaped by our cognitive biases
- This is essentially a form of psychological dismissal rather than substantive argument

8. Historical Perspective
- The paper seems to ignore that many previously "uniquely human" capabilities have been successfully mechanized
- Claims about fundamental impossibility need to account for why previous similar claims have often been wrong

9. Thermodynamic Argument Issues
- While biological systems are indeed complex thermodynamic systems, the paper needs to demonstrate why this specific physical implementation is necessary for intelligence
- Many complex behaviors can be implemented through different physical mechanisms
- The argument risks confusing the substrate with the function

10. Scope Problem
- The paper makes a very strong claim ("AGI is and remains a fiction")
- To justify this, it would need to prove not just that current approaches won't work, but that NO possible approach could ever work
- This is a much harder philosophical and scientific claim to defend


I think the idea here is that it literally can't be traced to the user – at no point is there anything passed that would allow Kagi to make the association between the user and the query.


Thanks, yes, completely agree! I guess the part I’m concerned with is the political side, whereby they could be compelled to change the method slightly after the fact and forced to slip something into a quite technical process, making identification possible.

I’d love to assume this will never happen; I’m just concerned that even if it did, I’d never find out - because unfortunately, the more popular this service gets with bad actors, the more of a target it becomes for governments wanting to identify users.

I guess, since it's a search engine, we could assume the government may leave them well alone and keep focusing on content creators.


The best that we can do is to continue working on FOSS solutions that make it technically impossible to backdoor. I haven't grokked the protocol yet, but it seems to claim you only have to trust the client. The client is open source, so it would be hard for it to be backdoored without the community noticing.

Cryptography is a literal godsend for people living under oppressive regimes.


I see this now, thanks for the clarity!


GPT 3.5 has been very, very obsolete in terms of price-per-performance for over a year. Bit of a straw man.


I think your intuition on this might be lagging a fair bit behind the current state of LLMs.

System message: answer with just "service" or "product"

User message (variable): 20 bottles of ferric chloride

Response: product

Model: OpenAI GPT-4o-mini

$0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25

$0.300/1Mt batch output * 1 output token * 10M jobs = $3.00

It's a sub-$25 job.

You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
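
For anyone picturing what the actual job looks like, here's a rough sketch with the OpenAI Python client. This is the plain per-request version; the batch prices above assume the Batch API, which takes the same requests as a JSONL upload instead.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def classify(item: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": 'answer with just "service" or "product"'},
                {"role": "user", "content": item},
            ],
            max_tokens=1,   # matches the one-output-token costing above
            temperature=0,
        )
        return resp.choices[0].message.content.strip().lower()

    print(classify("20 bottles of ferric chloride"))  # -> product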


You might be able to use an even cheaper model. Google Gemini 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.

17 input tokens and 2 output tokens * 10 million jobs = 170,000,000 input tokens, 20,000,000 output tokens... which costs a total of $6.38 https://tools.simonwillison.net/llm-prices

As for rate limits, https://ai.google.dev/pricing#1_5flash-8B says 4,000 requests per minute and 4 million tokens per minute - so you could run those 10 million jobs in about 2500 minutes or 42 hours. I imagine you could pull a trick like sending 10 items in a single prompt to help speed that up, but you'd have to test carefully to check the accuracy effects of doing that.
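
If you try that 10-per-prompt trick, a numbered list keeps the answers easy to match back to their inputs. Rough sketch (the prompt wording is just illustrative, and as noted you'd want to spot-check accuracy):

    def build_batched_prompt(items: list[str]) -> str:
        # Number the items so each answer can be matched back to its input.
        numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
        return (
            'For each numbered item answer with just "service" or "product", '
            "one answer per line, in the same order:\n" + numbered
        )

    def parse_batched_answers(text: str, n: int) -> list[str]:
        answers = [line.strip().lower() for line in text.splitlines() if line.strip()]
        if len(answers) != n:
            raise ValueError("model skipped or merged lines - fall back to per-item calls")
        return answers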


The question is not average cost but marginal cost of quality - same as voice recognition, which had relatively low uptake even at ~2-4% error rates due to context switching costs for error correction.

So you'd have to account for the work of catching the residual 2-8%+ error from LLMs. I believe the premise is that for NLP that's just incremental work, but for LLMs it could be impossible to correct (i.e., the cost per next percentage point of correction explodes), for lack of easily controllable (or even understandable) models.

But it's most rational in business to focus on the easy majority with lower costs, and ignore hard parts that don't lead to dramatically larger TAM.


I am absolutely not an expert in NLP, but I wouldn't be surprised if, for many kinds of problems, LLMs had a far lower error rate than any traditional NLP software.

Like, lemmatization is pretty damn dumb in NLP, while a better LLM will be orders of magnitude more accurate.


This assumes you don’t care about our rapidly depleting carbon budget.

No matter how much energy you save personally, running your jobs on Sam A’s earth-killing ten-thousand-GPU cluster is literally against your own self-interest in delaying climate disasters.

LLMs have huge negative externalities; there is a moral argument to only use them when other tools won’t work.


It's digging fossil carbon out of the ground that's the problem, not using electricity. Switch to electricity not from fossil carbon and you're golden.


Drowning isn’t the problem; just the water.


Haha, this is pretty good. I’m going to take a plane to SF while I laugh at this.


How do you validate these classifications?


The same way you check performance for any problem like this: by creating one or more manually-labeled test datasets, randomly sampled from the target data, and looking at the resulting precision, recall, F-scores, etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
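
For a two-label task like the service/product example above, that can be as small as this (a sketch; assumes a hand-labelled random sample with the model's answers stored alongside it):

    from sklearn.metrics import classification_report

    # Hand-labelled random sample drawn from the target data (the "gold" answers).
    gold      = ["product", "service", "product", "service", "product"]
    # What the classifier (LLM or otherwise) answered for the same items.
    predicted = ["product", "service", "service", "service", "product"]

    # Per-class precision, recall and F1 - the same report you'd produce for any classifier.
    print(classification_report(gold, predicted, labels=["product", "service"]))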


The same way you validate it if you didn't use an LLM.


Isn't it easier and cheaper to validate than to classify (which requires expensive engineers)? I mean the skill is not as expensive - many companies do this at scale.


You need a domain expert either way. I mentioned in another reply that one of my niches is implementing call centers with Amazon Connect and Amazon Lex (the NLP engine).

https://news.ycombinator.com/item?id=42748189

I don’t know beforehand the domain they are working in; I do validation testing with them.


Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.


Run them all in parallel with a cloud function in less than a minute?


Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.


The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.

https://ai.google.dev/pricing#1_5flash

4o-mini's rate limits scale based on your account history, from 500RPM/200,000TPM to 30,000RPM/150,000,000TPM.

https://platform.openai.com/docs/guides/rate-limits


Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit

I've heard from people running 100+ prompts in parallel against it.
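
"In parallel" in practice usually means a bounded worker pool, so you stay under whatever RPM/TPM ceiling your account actually has. A rough sketch with the async OpenAI client (the concurrency cap of 100 is arbitrary - tune it to your own limits; the same pattern should work against any OpenAI-compatible endpoint, DeepSeek included):

    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI()                # assumes OPENAI_API_KEY is set in the environment
    semaphore = asyncio.Semaphore(100)    # arbitrary cap - tune to your rate limits

    async def classify(item: str) -> str:
        async with semaphore:
            resp = await client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": 'answer with just "service" or "product"'},
                    {"role": "user", "content": item},
                ],
            )
            return resp.choices[0].message.content.strip().lower()

    async def classify_all(items: list[str]) -> list[str]:
        return await asyncio.gather(*(classify(i) for i in items))

    # results = asyncio.run(classify_all(items))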


Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen, quad-core i5...


Did I mention openai?


Ah, my bad, someone further up the thread did.

Really it boils down to balance of time and cost, and the skill set of the person getting the job done.

But you seem really anti-establishment (hung up over a $25 cloud spend), so you do you.

Just don't expect everyone else to agree with you.


Also can’t you just combine multiple classification requests into a single prompt?


Yes, for such a simple labelling task, request rate limits are more likely to be the bottleneck than token rate limits.


>You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.

How much for the “prompt engineer”? Who is going to be doing the work and validating the output?


You do not need a prompt engineer to create: “answer with just "service" or "product"”

Most classification prompts can be extremely easy and intuitive. The idea that you have to hire a completely separate prompt engineer is kind of funny. In fact, you might be able to get the LLM itself to help revise the prompt.


All software engineers are (or can be) prompt engineers, at least to the level of trivial jobs like this. It's just an API call and a one-liner instruction. Odds are very good at most companies that they have someone on staff who can knock this out in short order. No specialized hiring required.


> ..and validating the output?

You glossed over the meat of the question.


Your validation approach doesn't really change based on the classification method (LLM vs NLP).

At that volume you're going to use automated tests with known correct answers + random sampling for human validation.


Prompt engineering is less and less of an issue the simpler the job is and the more powerful the model is. You also don't need someone with deep NLP knowledge to measure and understand the output.


>less and less of an issue the simpler the job

Correct, everything is easy and simple if you make it simple and easy…


Plenty of simple jobs used to require people with deeper knowledge of AI; now, for many tasks in business, you can skip over a lot of that and use an LLM.

Simple things were not always easy. Many of them are, now.

