I agree with Antirez on the fear. I call this the "Google Maps effect": the more you use Google Maps, the less you learn about city streets, etc.
By the way, I am Italian too; I lived in the US (SF) for ~10 years, and I speak (I think) pretty good English.
More than 20 years ago I wrote a novel, Nonovvio, in Italian. A couple of months ago I decided to translate it into English using GPT-4.
Here you can find both the original Italian version and the one created with GPT-4 [0].
I don't think the novel is amazing or a work of art; however, what I find interesting is how prescient I was about a number of things that happened after I finished writing it in the early 2000s.
> I call this the "Google Maps effect": the more you use Google Maps, the less you learn about city streets, etc.
I call it the "pocket lighter effect". The more you rely on pocket lighters, the less you learn about kindling and fires. But for some reason, people don't fear this development.
My parents bought me a GPS unit when I first moved away. While using it was convenient, I discovered that I did not actually learn to orient myself with respect to that city. It wasn't until my GPS unit was stolen and I had to go at it on my own that my mental map "snapped" into place.
That is one reason why I'm hesitant to use LLMs to the extent that so many have. If we don't use certain skills, we lose them. I learned this the hard way with French. In my teens, I was conversant in the language thanks to years of high school French and immersion. But I didn't touch French again until my late thirties, and I forgot most of it. Only recently has my French surpassed the level I was at in my teens.
I always have my GPS map set to "North up", and I honestly think this has caused me to learn the layout of where I live faster. Knowing which direction you are traveling really helps.
I'm using a combination of Duolingo, Babbel, watching and reading French content, and live drilling with local French speakers. Since I live in the US, it's difficult to replicate immersion here, but I do what I can.
I'm also studying Spanish, Italian, Portuguese, German, and Esperanto. These aren't (yet) as intensive as the French, but a bucket list item of mine is to become a polyglot. Fluency isn't likely, but I should at least be able to chat with folks about food, the weather, and other pleasantries. Yeah, the language selections center on Europe and the Americas, but these languages are at least somewhat interrelated, which means that I get a boost when learning them, since I already know English and some French.
For me the freakish thing was having GPT-4 complete the prefix of a script, which was mostly a doc-comment and the inputs, written in my style. The completed body of the script was also in my style, even though I had not given GPT any actual "code" in the prompt. It had somehow learned that people who write the preamble "just so" also write script logic in a certain specific way, and it could emulate that.
One of the bigger difficulties with coming to terms with LLMs is that we are either far simpler and easier to model than we once thought, or LLMs are vastly more complex than we comprehend. I am not sure I like either.
The ease with which LLMs mimic mundane human behavior has, in some small ways, helped me prioritize my own thoughts and actions. I'll be doing something and realize an LLM could easily perform this action, complete this thought, or have this conversation as well as I can. So why am I doing this when my time would probably be better spent in pursuit of something else? I don't want to go overboard with this, since many mundane things have merit, but it does help pull me out of a rut occasionally.
LLMs offer a big advantage over other types of machine translation: they can translate the source text in multiple ways. You can then pick and choose from among those versions to produce the final wording that you like best.
Here are five translations of the same passage from a speech by the Japanese prime minister. They were all prepared by GPT-4 from a single prompt:
1. Since becoming Prime Minister, I have visited many ASEAN countries. I recently returned from visits to Malaysia and the Philippines. In every country, I have felt the depth of business ties with Japan and the strength of personal connections beyond business.
2. As Prime Minister, I have had the opportunity to visit numerous ASEAN countries. Just recently, I traveled to Malaysia and the Philippines. Everywhere I went, I witnessed the profound business connections with Japan, as well as the enduring personal relationships that transcend business.
3. Since my appointment as Prime Minister, I've visited several ASEAN countries. My recent trips included Malaysia and the Philippines. In each country, the depth of business connections with Japan and the enduring strength of interpersonal relationships beyond business were evident.
4. In my role as Prime Minister, I've visited many ASEAN countries. Recently, I was in Malaysia and the Philippines. In each nation, I felt the depth of Japan's business ties and the robustness of personal connections that go beyond mere business.
5. As Prime Minister, I've had the honor of visiting several ASEAN countries. My recent travels took me to Malaysia and the Philippines. In every country, the strength of Japan's business ties and the resilience of personal relationships beyond business were apparent.
The prompt I used and the full source text and translations are here:
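(The link itself isn't reproduced here. As a purely illustrative stand-in for the single-prompt approach -- the wording below is mine, not the commenter's -- something like this works with the OpenAI Python client:)

    # Hypothetical illustration: one prompt asking for several numbered
    # alternative translations (the commenter's actual prompt differs).
    from openai import OpenAI

    client = OpenAI()
    source_text = "..."  # the Japanese passage from the speech goes here
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Translate the following Japanese passage into "
                       "English in five different ways, numbered 1-5, "
                       "varying the wording while preserving the meaning:"
                       "\n\n" + source_text,
        }],
    )
    print(resp.choices[0].message.content)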
I think it's rather crude and inefficient to generate several translations with a prompt.
Can't we just sample different "branches" of successive tokens with some mechanism for ensuring diversity, like [1] did? This would allow the user to edit a handful of words/phrases/sentences and guide the LLM towards a more idiomatic translation.
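For open models this mechanism already exists: diverse (group) beam search decodes several beams at once and penalizes groups for repeating each other's tokens. A minimal sketch with Hugging Face transformers, assuming [1] refers to something along those lines (the model name is just an example, not from the thread):

    # One decoding pass that returns several deliberately different
    # translations via diverse beam search.
    from transformers import MarianMTModel, MarianTokenizer

    name = "Helsinki-NLP/opus-mt-en-fr"  # example model
    tok = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    inputs = tok("The depth of business ties was apparent.",
                 return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=8,
        num_beam_groups=4,      # beams are split into groups,
        diversity_penalty=1.0,  # penalized for reusing each other's tokens
        num_return_sequences=4,
    )
    for o in outputs:
        print(tok.decode(o, skip_special_tokens=True))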
> I hope that the fact I can write well enough in my mother language in some way it is still visibile in my English posts
I don't know if it's GPT-4 or an actual editing typo but using "visibile" instead of "visible" in that sentence is actually quite ironically funny.
On the post itself, I totally understand the point, being bilingual myself with English as my second language. However, having now lived in the UK for 11 years, I'm more comfortable writing in English.
It's insane to me how much better GPT-4 is than Papago at Korean, and Papago was the gold standard for Korean translation for quite some time. It also performs much better than DeepL in Korean.
For what it's worth, I prefer your non-LLM writing style.
Sure, it has some grammar mistakes and is awkward in a few places. I actually read it in my head with an Italian accent. But it reads to me like you in a way that the LLM post doesn't.
If more people went ahead with literal translations of their L1 idioms and style into English, we'd all have a richer lingua franca. In the mouth of the wolf, Antirez!
Personally I find your ordinary English writing more fluent-seeming, and the LLM version reads more like a foreign speaker doing somewhat awkward, choppy, mildly ungrammatical line-by-line translation.
That might be confirmation bias though.
Neither of them is a dazzling gem of English prose style, but both of them seem more than fine as effective communication from a non-native speaker/writer. I certainly couldn't do any better writing in a second language. So I say write with whichever process brings you more joy or seems more convenient.
I was recently in Fiji - of note, Google Translate does not support Fijian; however, GPT-4 did a lovely job of it.
GenAI is going to be transformative for low-resource languages - both making their content available to a global audience (outbound) and helping those who only know a rare language access content in all the others (inbound).
Hah! I'm not sure about this. It could, but it could also help with language preservation.
I was quite surprised when GPT-4 was able to talk in Lojban to me. Somehow, I doubt it was intentionally trained to speak Lojban - that's a pretty niche conlang. So, I guess, it possibly picked some up on its own, just from the books and dictionaries that were fed to it?
And to my amusement, it said that it doesn't speak Lojban, but can talk in the languages from my custom instructions... to my very limited understanding, all in mostly correct Lojban (except that it used "glico" ["English" in the sense of culture] instead of "glibau" ["English" in the sense of language]). Maybe it doesn't speak it well, but certainly does much better than I.
I have recently tried using GPT-4 and other LLMs to help with writing documentation in another language (I speak, read, and write it well enough for daily tasks, but technical writing is a challenge). I tried it mostly because many people at my workplace enjoyed my documentation and code comments, which reflected a comedic, frank take on what was going on in the code/program. Still, while my non-native, non-technical writing brought enjoyment, I always felt the need to produce quality technical documents. So, instead of having a co-worker take time to explain which parts of my sentences were non-technical or could be clearer, an LLM is more than happy to tell me, so I can write better. I feel that when it comes to linguistics, LLMs can help those who need to communicate at a higher level do so with less proofreading required before publishing. It isn't the machine translation of yesterday, and it won't replace human interaction, but in my case it certainly helps make my sentences more nuanced for the native audience.
It seems to me that one of the strengths of LLMs is their ability to _not_ differentiate between (human) languages, because to a statistical model, (for lack of a better term) people gonna people, no matter their culture or background.
I actually did a bit of research and found this prompt: "Please translate the user message from {src} to {tgt}. Make the translation sound as natural as possible."
This gives much better results, and I think I've changed my mind: I like GPT-4 better now.
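(Wired up, that prompt is just a system message. A minimal sketch with the OpenAI Python client; the model name and helper are illustrative:)

    # Minimal sketch of the prompt above as a chat call.
    from openai import OpenAI

    client = OpenAI()

    def translate(text: str, src: str, tgt: str) -> str:
        prompt = (f"Please translate the user message from {src} to {tgt}. "
                  "Make the translation sound as natural as possible.")
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": prompt},
                      {"role": "user", "content": text}],
        )
        return resp.choices[0].message.content

    print(translate("Che bella giornata!", "Italian", "English"))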
I find pleasantries like "please" are wasted on bots like GPT-4. Remember, it's a machine -- not a human: it doesn't get upset if it doesn't hear "please," "thank you," and "atta-boy".
It just needs some commands to follow.
Something like: "Fluently translate this text, from {here} to {there}, while preserving details" seems to be fine for me.
If the style then needs tweaking, I edit the original command and have it start anew: "Fluently translate this text as a peasant baker from Rhode Island in 1810, from {here} to {there}, while preserving details"
Actually, from a prompt-engineering perspective it makes sense to use words like "please", because the LLM is more likely to give better results (a pattern it learnt from the texts it was trained on).
I did a very quick trial of this with a post I wrote in English. I don't speak Spanish, so I'm not the best judge, but when I translated it to Spanish and then back to English, it was nearly flawless. Much better, in this same manner, than Google Translate.
(nitpick, but note that machine round-tripping technically only verifies that enough is preserved to restore the original form, not that the intermediate form is edible frozen)
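(For what it's worth, reusing a hypothetical translate() helper like the one sketched earlier in the thread, the round-trip test is just:)

    # Round-trip check: this only verifies the original survives the loop,
    # not that the Spanish intermediate itself reads well.
    original = "I did a very quick trial of this with a post of mine."
    spanish = translate(original, "English", "Spanish")
    back = translate(spanish, "Spanish", "English")
    print(spanish)
    print(back)  # compare with `original` by eye; similarity here says
                 # nothing about the quality of the intermediate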
There are a lot of grammar errors and other awkward phrases in this translated blog post. Still pretty readable but worth noting it isn't a perfect translation (especially in the context of Duolingo or other learning services leveraging AI).
That’s because this post wasn’t written with the help of LLMs:
> the post you are reading is not just written by myself, but as the tradition in this blog demands, not even re-read or corrected if not for a quick second pass. This way you can see what my written English really is, and if you are curious, compare it with the post about LLMs. The difference is not less than huge.
Ah, my mistake. I read that but thought "not just written by myself" meant it was written by him with the LLM. I see now that this post was not LLM-augmented; it was the other post that was.
I do still see some grammar errors in the opening paragraphs of the LLM-translated post at first glance, though...
For example: "Countless hours spent searching for documentation on peculiar, intellectually uninteresting aspects; the efforts to learn an overly complicated API, often without good reason; writing immediately usable programs that I would discard after a few hours."
That's maybe a bit informal in that it's not a sentence. But I wouldn't call it ungrammatical except from a prescriptive-grammarian viewpoint -- I read a lot and I wouldn't notice it in context.
The unusual part of the writing of this blog post is how long the sentences are and how many commas there are. I guess Italian prefers longer sentences than English does?
> And fear about the potential AI has to make everybody lazy, no longer willing to do things as hard as learning a new language.
GPT-4 is also an invaluable help in learning a new language. You can use it as a tutor, explore topics you don't understand well enough, or ask it to write texts tailored exactly to your level and vocabulary.
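(As a hypothetical example of the "texts tailored to your level" idea, again with the OpenAI Python client; the level and word budget are made up:)

    # Illustrative tutor setup.
    from openai import OpenAI

    client = OpenAI()
    tutor = ("You are a patient French tutor. I am at roughly A2 level, "
             "with a vocabulary of about 800 words. Write a short story "
             "within that vocabulary, then ask me three comprehension "
             "questions in simple French.")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": tutor},
                  {"role": "user", "content": "Let's practice reading."}],
    )
    print(resp.choices[0].message.content)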
If you live in an Anglophone country, you don't really need to, although there will probably be cultural value in doing so indefinitely. I love learning languages, but ultimately it's for fun and to make friends, and it's such a time commitment...
It's not the "cultural value", but the learning value.
Natively understanding another language gives you access to a whole different information sphere, with different taboos and focuses. Topics that are completely undiscussable or impolite in one language are perfectly analyzable in another. I've noticed the level of censorship significantly increasing on dominant English platforms (say, YouTube), which is harder to see if you can only speak English.
This makes it very easy to identify cultural shibboleths, disinfo, or even just low-relevance/low-importance information.
However, this only works if you truly have native-level ability in two languages. If you find it "painful"/"difficult" to read the second language in any way, you'll avoid it outside of work and therefore forfeit the advantage. This is also why even the inconvenience of translation will destroy this advantage.
Also, this requires the second language to be sufficiently distant from the first, while still being a large-population language.
So knowing Danish/Irish perfectly doesn't give you access to a whole different worldview.
This criterion is harsh, and for mere entertainment/cultural value, LLM translation is basically good enough even for novels; there's no point in learning a foreign language at a low level anymore.
[0]: https://github.com/simonebrunozzi/Nonovvio