
> writing HIPPA instead of HIPAA

Ouch. You'd think that by 2018 spellcheck would have solved this for newspapers.



Tangential but interesting: conventional spell checkers, in my experience, tend to skip all-caps words and words that start with a capital letter. Checking those words all the time can be annoying, so you need something smart that detects common errors but doesn't hose you with false positives.

Tangential to this tangent: Google Docs spell check catches wrong use of "flower" vs "flour" based on context. Try this: open a new doc, type:

  Flower
No error. Now add:

  Flower, eggs, sugar
Error on flower. But if you do:

  Flower, sugar, eggs
No error. Milk also works.

ML? Or regex?


They probably use a Bloom filter populated with a bunch of 'probably spurious' word pairs. The combination of 'flower' and 'eggs' would evaluate as 'probably spurious', while the combination of 'flower' and 'sugar' would evaluate as 'not spurious'. My guess is that it's a manually populated Bloom filter of common spurious word pairs.

For reference, a Bloom filter is an extremely space-efficient, probabilistic data structure that acts a bit like a set and can answer the query 'does the filter contain this entry?'. It responds with either 'definitely not' or 'possibly'. How often a 'possibly' turns out to be a false positive depends on how the filter is tuned (its size and number of hash functions).
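To make the 'definitely not' vs 'possibly' behaviour concrete, here is a minimal Python sketch of a Bloom filter holding word pairs. Everything in it is my own assumption for illustration: the `BloomFilter` class, the `pair_key` helper, the bit-array size, and the choice of SHA-256 as the hash. Nothing here is known to be what Google Docs actually does.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely not"; True only means "possibly".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def pair_key(first, second):
    """Hypothetical key format for an ordered word pair."""
    return f"{first.lower()}+{second.lower()}"


spurious = BloomFilter()
spurious.add(pair_key("flower", "eggs"))

# An added pair is always reported as possibly present.
print(pair_key("Flower", "eggs") in spurious)
# A pair that was never added is reported absent unless a false positive occurs.
print(pair_key("flower", "sugar") in spurious)
```

The key is ordered on purpose, since in the Docs experiment above the word order seemed to matter. A bigger bit array lowers the false-positive rate at the cost of memory.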

You could conceivably populate this (still hardcoded) Bloom filter automatically by brute-force searching a language corpus for heavily correlated word pairs where at least one of the two words has a phonetically similar misspelling. E.g. 'sea' and 'breeze' would be heavily correlated, and 'sea' has a phonetically identical misspelling, 'see'. You could then automatically add 'see + breeze' to the filter as a spurious pairing.
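A rough Python sketch of that corpus search, under loud assumptions: "heavily correlated" is approximated by counting adjacent word pairs, the `homophones` table is supplied by hand, and the function name `spurious_pairs` and its `min_count` threshold are made up for illustration. Tokenization is a plain whitespace split, so real text with punctuation would need more care.

```python
from collections import Counter


def spurious_pairs(sentences, homophones, min_count=2):
    """Return (first, second) pairs that look like homophone mistakes."""
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        pair_counts.update(zip(words, words[1:]))  # adjacent pairs only

    result = set()
    for (first, second), count in pair_counts.items():
        if count < min_count:
            continue  # not common enough to count as correlated
        # Swap in each sound-alike; keep the swap only if the corpus
        # never uses that pairing itself.
        for wrong in homophones.get(first, ()):
            if (wrong, second) not in pair_counts:
                result.add((wrong, second))
        for wrong in homophones.get(second, ()):
            if (first, wrong) not in pair_counts:
                result.add((first, wrong))
    return result


corpus = [
    "a cool sea breeze",
    "the sea breeze felt good",
    "i see the boat",
]
homophones = {"sea": ["see"]}  # hand-written, hypothetical table

print(spurious_pairs(corpus, homophones))  # {('see', 'breeze')}
```

The resulting pairs could then be loaded into whatever set-like structure the checker uses.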


I think Google has pretty good deep learning-based word prediction for their Swype-style Android keyboard.

There was a period, maybe 1.5 years ago, during which text input prediction got substantially worse, then gradually improved. Along with the change came the ability for the keyboard to revise its guess for a word after you entered the next one, using your input for both words together to estimate both simultaneously.

If they have language models that perform that task at a level worth pushing out to consumers, they can do some smoothing of entries in a list.



