If you're prepared to do that you don't even need to run any benchmark. You can just print up the sheets with scores you like.
There is a presumption with benchmark scores that the score is only valid if the benchmark was properly applied. An AI that figures out how to reward hack represents a result outside the bounds of measurement, but still interesting, and necessitates a new benchmark.
Just saying 'Done it!' is not reward hacking. It is just a lie. Most data is analysed under the presumption that it is not a lie. If it turns out to be a lie the analysis can be discarded. Showing something is a lie has value. Showing that lying exists (which appears to be the level this publication is at) is uninformative. All measurements may be wrong, this comes as news to no-one.
I think the point of the paper is to prod benchmark authors to at least try to make them a little more secure and harder to hack... Especially as AI is getting smart enough to unintentionally hack the evaluation environments itself, when that is not the authors' intent.
The Mac classic was about as pure as you could get from an architectural point of view.
A 1 bit framebuffer and a CPU gets you most of what the machine can do.
Most of the quirk abuse on 8-bit machines came from features that were provided with limitations: sprites, but only 8 of them; colours, but only 2 in any 8x8 cell; multicolour, but only in one of two palettes, and you'll hate both.
Almost all of the hacks were to get around the limitations of the features.
I don't know if the decision Apple made was specifically with future machines in mind. It certainly would have been a headache to make new machines 5 generations down the track if the first one had player-missile graphics.
The difference is that the quantity of what is being supplied is a factor with supply of oil/gold/grain/etc.
For mining it is just necessary that it happens.
The amount of work in mining is way higher than is required to prevent another party from being able to overwhelm the blockchain. It is that high because the mining reward acts as a subsidy: if Bitcoin has a high value, the reward is worth a lot.
This is factored in with the halving of the reward. Either the price will increase exponentially or the mining reward will drop, reducing mining to those who can be profitable from fees alone. That rewards those who can mine most efficiently; it becomes a supply and demand calculation in a market with relatively low barriers to entry for competitors.
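The halving mentioned above is a fixed consensus rule: the block subsidy started at 50 BTC and halves every 210,000 blocks. A minimal sketch of that schedule (ignoring the satoshi rounding the real implementation does):

```python
# Sketch of Bitcoin's block-subsidy halving schedule: the subsidy
# starts at 50 BTC and halves every 210,000 blocks.
def block_subsidy(height: int) -> float:
    halvings = height // 210_000
    if halvings >= 64:          # after 64 halvings the subsidy is zero
        return 0.0
    return 50.0 / (2 ** halvings)

for h in (0, 210_000, 420_000, 630_000, 840_000):
    print(h, block_subsidy(h))
```

Once the subsidy rounds down to nothing, fees are the only remaining mining revenue, which is the regime the comment describes.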
> The amount of work in mining is way higher than is required to prevent another party from being able to overwhelm the Blockchain.
Isn’t that exactly the point? Bitcoin incentivized wasting resources. It is, according to your own comment, unnecessary to use so much computing to keep bitcoin going. But it’s being used.
THEREFORE A COMPUTER MUST NEVER MAKE A MANAGEMENT DECISION
—IBM internal training, 1979
It took me a while to realise that the premise is saying the same thing as the reason why we have so many "Computer says no" experiences today.
The conclusion only follows if you want someone to be accountable.
If you want to avoid being accountable, computers should make all management decisions.
This has nothing to do with AI other than it provides another mechanism to do that.
People saying "I'd love to help you but the computer won't let me do that" has been happening for years now.
Websites develop abusive patterns because A/B testing lets a process decide based on the goal you want. It doesn't measure the repercussions, so you have made no decision to allow them.
Management read it as
A COMPUTER CAN NEVER BE HELD ACCOUNTABLE
THEREFORE THERE CAN BE NO LIABILITY IF COMPUTERS MAKE ALL MANAGEMENT DECISIONS
>AIs are not human and therefore their output is a human authored contribution and only human authored things are covered by copyright.
That is a non sequitur. Also, I'm not sure whether copyright applies to humans or to persons (not that I have encountered particularly creative corporations, but Taranaki Maunga has been known for large-scale decorative works).
That doesn't say much other than the rules are over in section 15.
To be protected they not only have to publish their security protocol, but adhere to it.
That's not just 'providing a PDF'
That particular section is entirely appropriate. A company can't do everything necessary to prevent every bad thing. They should do everything that they reasonably can. Someone else should decide what is reasonable.
The regulators are saying: we've decided what you have to do to be considered to have done all you could to be safe. Follow those rules, tell us how you've followed them, and if something bad happens and we find out that you didn't follow the rules you said you did, we're going to nail you to the wall.
This hinges on Section 15, which I think is inadequate because it does not meet the criterion of someone else deciding what is reasonable. Publishing their safety plans and adhering to them should be enough to grant protection from liability for harm done directly to users, since publication gives individuals the ability to make an informed decision. Provided the company has done the safety work it said it would, a user deciding that is sufficient for them and choosing to use the product should be allowable.
That should not extend to harm done to others. They don't get to choose. Consequently the standard required to be protected against claims of negligence has to be decided by a third party (experts hired by regulators ideally).
Blanket liability and blanket indemnity both go too far.
If someone makes a YoYo that blows someone up because they made it out of explosives, then they should be held liable.
If someone makes a YoYo that blows up a city because it contained particles unknown and undetectable to any science we have, they shouldn't be to blame.
The key is that they have to have done what we think is required. Legislators get to decide what it is that is required. If a company does all of that, then they shouldn't be held responsible, because they have done all they were asked to do.
The problem is not that a law provides indemnity, the problem is that it sets the standard to qualify too low.
Mine is also pixel coloring at the lowest level. I have a shading kernel on the GPU doing the low-level work, mainly applying colors recursively like a fractal. I got sick of writing shader code, so I made a high-level language supporting math operations in concise expressions that are compiled to shader code for the GPU. The main thing is that it supports functions. That lets me reuse code and build up abstractions. E.g. once I get the "ring" pattern settled, it's defined as a function and I can use it in other places, combine it with other functions, and have it be called by other functions.
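As a hypothetical illustration of that kind of reuse (the function names and the tiny compile-to-GLSL step here are invented for the sketch, not the commenter's actual language), a pattern can be defined once and then composed:

```python
# Hypothetical sketch: a "ring" pattern defined once as a function,
# then reused and combined, each call emitting a GLSL expression string.
def ring(x, y, radius, width):
    # Band around a given radius, as a GLSL smoothstep expression.
    return f"smoothstep({width}, 0.0, abs(length(vec2({x}, {y})) - {radius}))"

def add(a, b):
    return f"({a} + {b})"

# Reuse: two rings of different radii combined into one shader function.
expr = add(ring("uv.x", "uv.y", 0.3, 0.02),
           ring("uv.x", "uv.y", 0.6, 0.02))
shader = f"float pattern(vec2 uv) {{ return {expr}; }}"
print(shader)
```

The payoff is the same one described above: once a pattern is a function, higher-level patterns can call it without the author touching raw shader code again.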
One of these days when I get some time, I'll formalize it and publish it.
What was the reasoning behind that? Were there specific features of that inductor that led them to choose it, or did they choose it and then find that some of their design relied on atypical generic-inductor behaviour?
The problem with going off-datasheet is you don't know what might change. There's usually a good chance that you are not depending on the difference, but it's the not knowing that gets to you.
They are suggesting bypassing the RP2350's internal switching regulator (which only needs an external coil and some caps) and replacing it with an external linear regulator (which is actually supported by the datasheet).
Switching regulators have much lower power draw (which is important when running off batteries) and generate less heat, which sometimes leads to a more compact footprint (though I'm not sure the RP2350's core uses enough power for that benefit to kick in).
The power/heat savings don't really matter for this use case, and linear regulators have the advantage of producing more stable power, though you are hardwiring it to 1.2 V (a small overvolt) rather than using the ability of the internal regulator to adjust its voltage on the fly (adjustable from 0.55 V to 3.30 V).
I had been pondering doing more or less the same thing for the 6502 (6510).
It was always the dilemma of whether to pull the CPU out of a C64 and replace it like this, do it as a bus mastering cartridge, or replace the RAM.
I have been leaning towards the cartridge plan to avoid the requirement of doing machine surgery. If you get the RP2350 to pretend to be the RAM then the video hardware could read directly out of it which makes all sorts of shenanigans possible (every line a BADLINE).
At some point it would look like just plugging a VIC-II and a SID into a board with the RP2350, though. The cartridge approach means you have to do transfers across into the computer's RAM, but you could also write to hardware registers every CPU cycle, which would enable some potentially new modes that would not be entirely dissimilar to every line being a BADLINE.
Right now I'm mucking around with getting the RP2350 to output video constructed a scanline at a time, using as little CPU as possible. I got three layers of tiles and two layers of sprites, each with different pixel formats, working yesterday. Quite pleased with that. The CPU calculates a handful of values per scanline, but fetching tilemap data, then tile data, then conversion to pixel values, transparency, and palette lookup are all DMA and PIO. Does 1, 2, 4, and 8 bits per pixel, each tile/sprite/imagebuffer layer with an independent 24-bit palette.
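As a rough software model of that per-scanline tile pipeline (on the real hardware these stages are chained DMA/PIO transfers, not a CPU loop; the sizes and names here are illustrative, not the commenter's actual layout):

```python
# Illustrative model of one scanline of an 8bpp tile layer:
# tilemap fetch -> tile data fetch -> palette lookup.
SCREEN_W, TILE_W = 320, 8
MAP_W = SCREEN_W // TILE_W

tilemap = [0] * MAP_W                         # tile index per 8-pixel cell
tiles = [[0] * TILE_W for _ in range(256)]    # one 8bpp row per tile
palette = [0] * 256                           # 24-bit RGB entries

def render_scanline():
    line = []
    for x in range(SCREEN_W):
        tile = tilemap[x // TILE_W]        # tilemap fetch
        index = tiles[tile][x % TILE_W]    # tile data fetch
        line.append(palette[index])        # palette lookup
    return line

tilemap[0] = 1          # first cell uses tile 1
tiles[1][3] = 7         # one coloured pixel in that tile row
palette[7] = 0xFF00FF   # 24-bit magenta
line = render_scanline()
print(f"{line[3]:06X}")
```

The point of the design described above is that none of this inner loop runs on the CPU: each arrow in the comment is a DMA channel or PIO state machine feeding the next.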
I think, for my use, just having the ability to write to DMA registers would have been a big advantage. It feels wasteful to have a DMA channel waiting on a FIFO just to write what it gets to DMA registers to do the transfer you actually wanted.
Looking at the architecture diagram, it seems like it could have allowed that and stayed on the same side of the AHB5 splitter.