Hacker News | measurablefunc's comments

There are no secrets when you are using AI providers. They track all interactions b/c that's valuable information for improving their models.

I'm talking about sharing things publicly that you are trying to claim as your own.

It doesn't matter. If someone has the same idea then they can use AI the same way you did to recreate it. Keeping it a secret benefits no one other than the AI providers b/c now they can charge money for giving someone else "your" code. The AI providers don't care about license restrictions so it's the perfect way to launder code. If you want credit for something then you'll have to claim it publicly b/c the AI providers sure as hell are not going to give you any credit.

Strange downvotes. Not only do these services allow anyone with money to copy their competitors if they use the same services, but in the long run Anthropic could very well be the competition, trained on the corporations that use Claude. Why would this startup be any different from Google or Microsoft in the long run? People can't seem to learn their lesson.

People are very naive about how technology companies operate.


Even if you believe the "we don't train on your data" claim/lie, that leaves a whole lot of things they can do with it besides training directly on it.

Analytics can be run on it, they can run it through their own models, synthetic training data can be derived from it, it can be used to build profiles on you/your business, they could harvest trade/literal secrets from it, they could store derivatives of your data to one day sell to competitors/compete themselves, they can use it to gauge just how dependent you've made yourself/business on their LLMs and price accordingly, etc.


No. Your data or any derivative of it does not leave RAM unless you are detected as doing something that qualifies as abuse, then it is retained for 30 days.

Even the process of deciding what "qualifies as abuse" does what I'm talking about: they're analyzing your data with their own models and doing whatever they want with the results, including storing it, using it to ban you from the product you paid for, and calling the police on you.

Either way, I don't believe it.


And you believe them?

Yes. That's the rational position.

That's about the API. It doesn't say anything about their other products, like Codex. Moreover, even for the API it says you have to qualify for zero-retention policies. They retain the data for however long each jurisdiction requires data retention & they are always improving their abuse detection using the retained data.

> Our use of content. We may use Content to provide, maintain, develop, and improve our Services, comply with applicable law, enforce our terms and policies, and keep our Services safe. If you're using ChatGPT through Apple's integrations, see this Help Center article for how we handle your Content.

> Opt out. If you do not want us to use your Content to train our models, you can opt out by following the instructions in this article. Please note that in some cases this may limit the ability of our Services to better address your specific use case.

https://openai.com/policies/row-terms-of-use/ https://openai.com/policies/how-your-data-is-used-to-improve...


Codex just talks to the responses API with store=false. So unless the model detects you are doing something that qualifies as abuse, nothing is retained.
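For reference, here is a sketch of what such a request body might look like (field names follow OpenAI's published Responses API; the model id is a placeholder and the exact payload Codex sends is an assumption on my part):

```json
{
  "model": "<model-id>",
  "input": "your prompt here",
  "store": false
}
```

Per the API docs, with `store` set to `false` the response is not persisted for later retrieval.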

Alright, good luck to you. I'm not really interested in talking to people who think they're lawyers for AI providers. If you think they don't keep any of the data & don't use it for training then you are welcome to continue believing that. It makes no difference to me either way.

> Alright, good luck to you. I'm not really interested in talking to people who think they're lawyers for AI providers.

Codex is open source, you can inspect it yourself, but let's not let facts ruin your David vs Goliath fantasy.


This is a lot of useful data for the next iteration of Claude because not only does Anthropic have the final artifacts but they also saw the entire workflow from start to finish & Facebook paid them for the privilege of giving them all of that training data.

Only if you assume they don't honor their enterprise agreements.

I assume all chat logs are used for training in one way or another because it would be foolish to not do that.

More training data at this point leads to marginal improvements; the curve is flattening, so the advantage is low. Especially since Anthropic definitely has the budget and talent to carry out the same study.

On the other hand, having it leak that you train on your customers' data, ignoring the opt-out, is probably existential when close alternatives exist in the market.


You probably also thought Anthropic did not use pirated PDFs. You don't know how these companies actually operate & you don't know what weasel language they use in their contracts to get away w/ exactly what I assume to be the case.

There is no AI; all these companies have is the chat logs. So unless you have further evidence about what they do or don't do behind the scenes, I recommend you take a more conservative approach in your assumptions about what they use or don't use for training.


No, why would they care about using pirated PDFs? Did you actually read/understand what I wrote? Violating their customers' trust comes with risk for them. Violating the copyright of unrelated textbook authors does not. If that's even what they did.

They are currently paying book authors over a billion dollars in damages. You're out of your depth in this discussion so further engagement is not going to be fruitful for anyone involved. Good luck.

Oh no, not 0.2% of their valuation! The end is near for Anthropic. Humanity is saved. By the copyright lobby, of all people.

Yes, it's well known that money & prices are what make people act rationally. We'd still be slinging mud & rocks if it wasn't for money & prices.

Tangentially related from something I'm currently reading¹:

> This is the reality of twenty-first-century resource exploitation: reducing vast quantities of rock into granules and chemically processing what remains. It is both awe inspiring and disturbing. One risk is that the cyanide and mercury used in the method could escape into the surrounding ecosystem. After all, while miners like Barrick insist they follow all the rules laid down by the US Environmental Protection Agency (EPA), campaigners warn that pollution often finds its way out of the mine. Indeed, a few years earlier the EPA had fined Barrick and another nearby miner $618,000 for failing to report the release of toxic chemicals including cyanide, lead and mercury. But the main thing I was struck by as I observed each stage in this process was just how far we will go these days to secure a tiny shred of shiny metal.

> The scale, for one thing, was mind-boggling. As I looked down into the pit I could just about make out some trucks on the bottom, but only when they emerged at the top did I realise that they were bigger than three-storey buildings; the tyres alone were the size of a double-decker bus. How much earth do you have to remove to produce a gold bar? I asked my minders. They didn’t know, but they did know that in a single working day those trucks would shift rocks equivalent to the weight of the Empire State Building.

¹ Material World: A Substantial Story of Our Past and Future by Ed Conway


> in a single working day those trucks would shift rocks equivalent to the weight of the Empire State Building.

Oh. My. God.


With four parameters I can fit an elephant, and with five I can make him wiggle his trunk so there is still room for improvement.

Except learning to reason is a far cry from curve fitting. Our brains have more than five parameters.

After a quick browse of the content, my understanding is that it's more like this: with a very compressed diff vector applied to a multi-billion-parameter model, the model could be 'retrained' to reason (score) better on a specific topic, e.g. the math used in the paper.
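That description reads like a low-rank ("LoRA"-style) diff: instead of a dense update to each weight matrix, you store two thin factors. A sketch of the parameter arithmetic, assuming that interpretation (the dimensions below are hypothetical, not taken from the paper):

```python
# Low-rank update sketch: instead of storing a full d x d diff matrix,
# store two thin factors B (d x r) and A (r x d) with W' = W + B @ A.
d = 4096   # hypothetical hidden dimension
r = 8      # hypothetical rank of the diff

full_params = d * d          # parameters in a dense diff
lowrank_params = 2 * d * r   # parameters in the factored diff

compression = full_params / lowrank_params  # how much smaller the diff is
print(full_params, lowrank_params, compression)
```

At rank 8 the diff is 256x smaller than a dense update, which is why such vectors can be "very compressed" relative to the base model.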

It's the statistics equivalent of 'no one needs more than 640KB of RAM'.

My very first PC was a Packard Bell with 640KB of RAM. If I’d known, I’d have saved all my RAM for retirement…

speak for yourself!

Reasoning capability might just be some specific combination of mirror neurons.

Even some advanced math usually involves applying patterns found elsewhere to new topics.


I agree, I don't think gradient descent is going to work in the long run for the kind of luxurious & automated communist utopia the technocrats are promising everyone.

It's not that simple. Production costs have gone up for everyone, inflation is going to get worse so the simple logic of "higher prices, higher profits" doesn't really work in this case.

There will be a short-term vs. long-term split with this. I agree with you that ultimately everyone loses long term. Short term, the higher prices will result in higher profits, which will enrich whoever owns the oil.

We aren't at the end of the inflation, though; that's still going to hit. This is only the beginning. Next year will be when things really go south. At this point it's not a question of if, but rather how bad.


I agree.

It's not clear or obvious why continuous semantics should be applicable on a digital computer. This might seem like nitpicking but it's not: there is a fundamental issue that is always swept under the rug in these kinds of analyses, namely reconciling finitary arithmetic over bit strings w/ analytical equations that only work w/ infinite precision over the real or complex numbers as they are usually defined (equivalence classes of Cauchy sequences or Dedekind cuts).

There are no Dedekind cuts or Cauchy sequences on digital computers, so the fact that the analytical equations map to algorithms at all is very non-obvious.


Continuous formulations are used with digital computers all the time. Limited precision of floats sometimes causes numerical instability for some algorithms, but usually these are fixable with different (sometimes less efficient) implementations.

Discretizing e.g. time or space is perhaps a bigger issue, but the issues are usually well understood and mitigated by e.g. advanced numerical integration schemes, discrete-continuous formulations or just cranking up the discretization resolution.

Analytical tools for discrete formulations are usually a lot less developed and don't as easily admit closed-form solutions.


It is definitely not obvious, but I wouldn't say it is completely unclear.

For instance we know that algorithms like the leapfrog integrator not only approximate a physical system quite well but even conserve the energy, or rather a quantity that approximates the true energy.

There are plenty of theorems about the accuracy and other properties of numerical algorithms.
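A minimal sketch of that claim for the harmonic oscillator x'' = -x (my own toy example, unit mass and frequency): forward Euler's energy blows up, while leapfrog's stays within a small bounded band around the true value.

```python
# Harmonic oscillator x'' = -x; exact energy E = 0.5 * (v**2 + x**2) = 0.5.
def energy(x, v):
    return 0.5 * (v * v + x * x)

dt, steps = 0.1, 1000

# Forward Euler: energy grows without bound (each step multiplies it by 1 + dt**2).
x, v = 1.0, 0.0
for _ in range(steps):
    x, v = x + dt * v, v - dt * x
euler_drift = energy(x, v) - 0.5

# Leapfrog (kick-drift-kick): symplectic, energy error stays bounded and small.
x, v = 1.0, 0.0
for _ in range(steps):
    v -= 0.5 * dt * x      # half kick
    x += dt * v            # drift
    v -= 0.5 * dt * x      # half kick
leapfrog_drift = energy(x, v) - 0.5

print(euler_drift, leapfrog_drift)
```

The leapfrog scheme doesn't conserve the exact energy, but it exactly conserves a nearby "shadow" Hamiltonian, which is why its error oscillates instead of drifting.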


How do they apply in this case?

This is what the field of numerical analysis exists for. These details definitely have been treated, but this was done mainly early in the field's history; for example, by people like Wilkinson and Kahan...

I just took some basic numerical courses at uni, but every time we discretized a problem with the aim of implementing it on a computer, we had to show what the discretization error would lead to, e.g. numerical dispersion[1], and do stability analysis and such, e.g. ensure the CFL[2] condition held.

So I guess one might want to do a similar exercise to deriving numerical dispersion for example in order to see just how discretizing the diffusion process affects it and the relation to optimal control theory.

[1]: https://en.wikipedia.org/wiki/Numerical_dispersion

[2]: https://en.wikipedia.org/wiki/Courant%E2%80%93Friedrichs%E2%...
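As a toy version of that exercise (my own sketch, not from either link): the explicit FTCS scheme for the 1D heat equation u_t = u_xx is stable only when dt/dx**2 <= 1/2, and crossing that threshold makes the solution blow up rather than diffuse.

```python
def max_amplitude(dt, dx=0.1, steps=500):
    # Explicit (FTCS) update for u_t = u_xx on [0, 1] with fixed zero ends.
    n = int(1 / dx) + 1
    u = [0.0] * n
    u[n // 2] = 1.0                      # initial spike in the middle
    r = dt / dx**2                       # stability requires r <= 0.5
    for _ in range(steps):
        u = [0.0] + [u[i] + r * (u[i+1] - 2*u[i] + u[i-1])
                     for i in range(1, n - 1)] + [0.0]
    return max(abs(x) for x in u)

stable   = max_amplitude(dt=0.004)   # r = 0.4 -> the spike decays smoothly
unstable = max_amplitude(dt=0.006)   # r = 0.6 -> high-frequency mode blows up
print(stable, unstable)
```

Note that the blow-up is not a precision problem: the unstable run diverges identically in any float width, which is the stability-vs-accuracy distinction being discussed here.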


Doesn't continuous time basically mean "this is what we expect for sufficiently small time steps"? Very similar to how one would for example take the first order Taylor dynamics and use them for "sufficiently small" perturbations from equilibrium. Is there any other magic to continuous time systems that one would not expect to be solved by sufficiently small time steps?

You should look into condition numbers & how that applies to numerical stability of discretized optimization. If you take a continuous formulation & naively discretize you might get lucky & get a convergent & stable implementation but more often than not you will end up w/ subtle bugs & instabilities for ill-conditioned initial conditions.

I understand that much, but it seems like "your naive timestep may need to be smaller than you think or you need to do some extra work" rather than the more fundamental objection from OP?

The translation from continuous to discrete is not automatic. There is a missing verification in the linked analysis. The mapping must be verified for stability for the proper class of initial/boundary conditions. Increasing the resolution from 64-bit floats to 128-bit floats doesn't automatically give you a stable discretized optimizer from a continuous formulation.
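A concrete toy instance (my own illustration): the gradient flow x' = -grad f(x) of an ill-conditioned quadratic converges from any starting point, but its naive Euler discretization, i.e. plain gradient descent, diverges once the step size exceeds 2/L, where L is the largest curvature.

```python
# f(x, y) = 0.5 * (x**2 + 100 * y**2); condition number kappa = 100.
# The continuous gradient flow always converges, but explicit Euler
# (plain gradient descent) is only stable for step h < 2 / 100 = 0.02.
def run(h, steps=200):
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - h * x, y - h * 100 * y
    return abs(x) + abs(y)

good = run(h=0.01)   # below the 2/L threshold: converges
bad  = run(h=0.03)   # above it: the stiff direction diverges
print(good, bad)
```

No amount of extra float precision fixes the bad run; the instability comes from the step size relative to the conditioning, not from rounding.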

Or you can just try stuff and see if it works

Point still stands, translation from continuous to discrete is not as simple as people think.

Numerical issues totally exist but the reason has nothing to do with the fact that Cauchy sequences don't exist on a computer imo.

The abstract formulation is different from the concrete implementation. It is precisely b/c the abstractions do not exist on computers that the abstract analysis does not automatically transfer the necessary analytical properties to the digital implementation. Cauchy sequences & Dedekind cuts are abstract & do not exist on digital computers.

Infinity has properties that finite approximations of it just don't have, and this can lead to serious problems for certain theorems. In the general case, the integral of a continuous function can be arbitrarily different from the sum of a finite sequence of points sampled from that function, regardless of how many points you sample - and it's even possible that the discrete version is divergent even if the continuous one is convergent.

I'm not saying that this is the case here, but there generally needs to be some justification to say that a certain result that is proven for a continuous function also holds for some discrete version of it.

For a somewhat famous real-world example, it's not currently known how to produce a version of QM/QFT that works with discrete spacetime coordinates, the attempted discretizations fail to maintain the properties of the continuous equations.
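A small illustration of that gap (my own example, nothing to do with QFT): a function whose integral over [0, 1] is exactly 1, yet whose n-point Riemann sum is 0 because every sample lands on a zero of the oscillation.

```python
import math

def f(x, n):
    # Integral of f over [0, 1] is exactly 1 (the cosine term integrates to 0).
    return 1.0 - math.cos(2 * math.pi * n * x)

n = 1000
# Left Riemann sum with exactly n points: every sample hits a zero of f,
# so the sum is 0 (up to float rounding) while the true integral is 1.
riemann = sum(f(j / n, n) for j in range(n)) / n
print(riemann)
```

The same construction works for any fixed number of sample points, which is why pointwise sampling alone can't certify anything about the continuous object.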


Real numbers mostly appear in calculus (e.g. the chain rule in gradient descent/backpropagation), but "discrete calculus" is then used as an approximation of infinitesimal calculus. It uses "finite differences" rather than derivatives, which doesn't require real numbers:

https://en.wikipedia.org/wiki/Finite_difference

I'm not sure about applications of real numbers outside of calculus, and how to replace them there.
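For example, a first-order forward difference recovers the derivative using only floats, with an error that shrinks linearly in the step size:

```python
import math

def forward_diff(f, x, h):
    # First-order finite-difference approximation of f'(x); error is O(h).
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)                    # d/dx sin(x) at x = 1
err_h  = abs(forward_diff(math.sin, 1.0, 1e-3) - exact)
err_h2 = abs(forward_diff(math.sin, 1.0, 5e-4) - exact)
print(err_h, err_h2)  # the second error is roughly half the first
```

The O(h) scaling is exactly what "finite differences approximate derivatives" cashes out to in practice, with no real numbers required beyond floats.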


I can't tell if this a troll attempt or not.

If your definition of "algorithm" is "list of instructions", then there is nothing surprising. It's very obvious. The "algorithm" isn't perfect, but a mapping with an error exists.

If your definition of "algorithm" is "error free equivalent of the equations", then the analytical equations do not map to "algorithms". "Algorithms" do not exist.

I mean, your objection is kind of like questioning how a construction material could hold up a building when it is inevitably bound to decay and therefore result in structural collapse. Is it actually holding the entire time or is it slowly collapsing the entire time?


You should provide evidence & examples for your claims if you want to be taken seriously.

Precisely!

No need to engage with an article that makes naked assertions with little backing.

Ok, fine then...:

"But they have no more consciousness, sensitivity, and sentience than a hammer. " -- naked assertion, no backing, no definition, no ope rationalization, no scientific or philosophical work shown (and this is a spicy one, because there's been philosophical turf wars on this for half a century, you can't just ASSERT that)

"Every device made by man has an off switch. We can use it sometimes." -- I have stories. Semi-Explosive near death stories. At any rate... uh, not quite?

Look, at the very least he's sloppy here. Mostly just a raw opinion piece, I guess, but not really backed by much that is real. Just so you know, this cost me more time than the text even deserves.


This is similar to AWS & their Graviton VMs.

The author does not exist & the paper is pure nonsense: https://scholar.google.com/citations?user=G97KxEYAAAAJ&hl=en. Might even be a psyop by some 3 letter agencies. So the obvious question, why did you post this?

Sorry for the confusion; the authors may not have an active record on Scholar. But I wanted to share it here because I read the paper and found it interesting.

You read the paper? All 459 pages of it? And you missed e.g. this gem on page 257? "[11:23:54] CLAUDE: OPUS 5 — 606 pages, need 194 more. FINAL PUSH. Write these in first person as Logan. MAXIMUM DENSITY:"

I'm sorry.
