Just wanted to add something I never knew: if you google "Times New Roman", Google displays the entire search results page in Times New Roman.
I have been using Claude Code for 4 months now, and I honestly don't know whether to trust all the people dooming on it for the past few weeks. The time required to properly test a new tool is also not insignificant, so I would genuinely like some input from this community.
Most likely yes. I don't think companies can be blamed for not wanting to subject themselves to EU regulations or uncertainty.
Edit: Also, if you don't want to follow or deal with EU law, you don't do business in the EU. People here regularly say if you do business in a country, you have to follow its laws. The opposite also applies.
1. No one is training on users' bank details, but if you're training on the whole Internet, it's hard to be sure you've filtered out all PII, or even to know who is in there (see the sketch below).
2. This isn't happening because no one has time for more time-wasting lawsuits.
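To make point 1 concrete, here's a minimal sketch of the kind of naive, pattern-based PII scrubbing people often reach for. This is purely illustrative and assumed by me, not anyone's actual pipeline; it shows why even a filter that catches well-formatted items can't guarantee all PII is gone.

```python
import re

# Hypothetical, naive PII scrubber: catches obviously formatted items only.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US-style SSN format
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),     # card-number-like digit runs
]

def scrub(text: str) -> str:
    """Replace any pattern match with a placeholder token."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(scrub("Reach me at jane.doe@example.com"))          # caught
print(scrub("My card is four five nine one ..."))         # spelled-out digits slip through
print(scrub("I'm the only dentist in Smalltown, Ohio"))   # indirectly identifying, untouched
```

Anything the patterns don't anticipate (spelled-out numbers, names in context, indirectly identifying facts) passes straight through, which is the point: at web scale you can't be certain who is in the data.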
> No one is training on users' bank details, but if you're training on the whole Internet
Tencent has access to more than just bank accounts.
In the West there's Meta, which this year opted everyone on its platforms into training its AI.
> This isn't happening because no one has time for more time-wasting lawsuits.
No, this isn't happening because a) their models are, without fail, trained on material they shouldn't have willy-nilly access to, and b) they want to pretend to be open source without actually being open source.
Doesn't that mean that if they used data created by (or even data about) anyone in the EU, they would want to not release that model in the EU?
This sounds like "if an EU citizen created, or is referenced in, any piece of the data you trained on, then..."
Which, I mean, I can kind of see why US and Chinese companies prefer to just not release their models in the EU. How could a company ever make a guarantee satisfying those requirements? It would take a massive filtering effort.
This seems to mirror the situation where US financial regulations (FATCA) are seen as such a hassle to deal with for foreign financial institutions that they'd prefer to just not accept US citizens as customers.
> > This sounds like "if an EU citizen created, or is referenced in, any piece of the data you trained on, then..."
> Yes, and that should be the default for any citizen of any country in the world.
This is a completely untenable policy. Each and every piece of data in the world can be traced to one or more citizens of some country. Actively getting permission for every item is not feasible for any company, no matter the scale of the company.
I think that’s kinda the point that is being made.
Technology-wise, it is clearly feasible to aggregate the data, train an LLM, and release a product on top of that.
It seems that some would argue it was never legally feasible in the first place, because the training data could never be used legally. So it is the existence of many of these LLMs that is (legally) untenable.
Whether valid or not, the point may be moot because, as with Uber, if the laws actually do forbid this use, they will change as necessary to accommodate the new technology. Too many "average voters" like using things such as ChatGPT, and it's not a hill politicians will be willing to die on.
> Actively getting permission for every item is not feasible for any company, no matter the scale of the company.
There's a huge amount of data that:
- isn't personal data
- isn't copyrighted
- isn't otherwise protected
You could argue about whether that is enough data, but neither you nor the corporations make that argument. You just go straight to "every single scrap of data on the planet must be made accessible to supranational trillion-dollar corporations, without limits, now and forever".
In Meta's case, the problem is that they had been given the go-ahead by the EU to train on certain data, and then after starting training, the EU changed its mind and told them to stop.