
I'm naive to the translation tech space, but is this sort of thing unique to languages like Chinese? I figured all this stuff was mostly solved. For example, I wouldn't expect Google Translate to turn dflhglsdhfgalskjdf into grammatically valid Spanish.


There is one difference between gibberish Chinese and gibberish Latin-character sequences: in Chinese, each character carries some meaning on its own (like a word). So I guess the model may hallucinate output inspired by those meanings. In the case of "慢正潤牯" -> "Slow and positive", it actually translated the first two characters literally (慢 -> slow, 正 -> correct/positive/upright).

So the equivalent English gibberish would be something like "hast prank bibble done anut me me ions." Google translates this one into Chinese as "对我而言,恶作剧已经完成了。" (To me, the prank has been done.), a perfectly valid sentence, and into Spanish as "¿Me has hecho una broma a mí, Bibble?", which also seems valid.

I guess the model is (over-)optimized to generate valid output. This can be a feature: it lets the model still translate text that is grammatically invalid but understandable to some degree (text with typos, say, or non-standard Internet language).


I think the Latin script might be somewhat protected because random jumbles of letters do appear as serial numbers and whatnot, but for other scripts, anything goes.

I enter ҏӲҨЏ ҜъКѠ ЇЩіН гӞэѷ as "Russian", and Google Translate says "Let's talk about it".
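
For what it's worth, a quick way to see why that string is only nominally "Russian" is to check each letter against the modern Russian alphabet. A minimal sketch using only the Python standard library; the helper name `non_russian_letters` is made up for illustration, and "Russian" here is assumed to mean the 33 modern letters (А..я plus Ё/ё):

```python
import unicodedata

# Assumption: "Russian" means the modern alphabet, i.e. U+0410..U+044F plus Ё/ё.
RUSSIAN_LETTERS = {chr(cp) for cp in range(0x0410, 0x0450)} | {"\u0401", "\u0451"}

def non_russian_letters(text):
    """Return the alphabetic characters in text that are not modern Russian letters."""
    return [ch for ch in text if ch.isalpha() and ch not in RUSSIAN_LETTERS]

sample = "ҏӲҨЏ ҜъКѠ ЇЩіН гӞэѷ"
for ch in non_russian_letters(sample):
    # Print the code point and its Unicode name for each letter outside the alphabet.
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, 'UNKNOWN')}")
```

Anything it prints is a letter a Russian speaker would not expect to see in ordinary text, which is roughly the sense in which the input is gibberish.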



