I've heard horror stories and have held off so far on 'upgrading'. In the end I don't really want the fully flexible responses people are leaning into with these LLM tools. All I want is to be able to give a precise instruction with my voice and have the machine reliably perform the same action it performed the last time I gave that instruction.
Since that seems to be an increasingly niche desire (at least as far as the product managers are concerned), I've been looking more and more seriously at setting up my own local voice assistant. My main barrier has been hardware—the mic arrays in the Home devices are surprisingly good and hard to beat with cheap off-the-shelf components, and you need a good mic for good STT.
Yeah, I would like Siri to actually play the album I asked for, not something phonetically unrelated to what I asked. Or to set an alarm without being told "I can't connect to the internet right now" while I'm sitting there using my laptop, connected to the internet. Or, if my internet is down, to actually use the speakers that I bought as speakers.
The hardware is really well done, but the software is either over- or under-engineered to a stupid degree.
It also sometimes asks me to unlock my phone for commands that plain old Assistant was happy to do while locked. I haven't really found it useful at all yet: free ChatGPT is just better than free Gemini for "LLM stuff", and Google Assistant is better for "smart home stuff".
I've been thinking about trying the OpenAI integration for Home Assistant[1], because controlling things in my home is primarily what I use my assistant shortcut for. The normal assistant works well enough, but it can be frustrating if you don't remember the exact phrasing it wants to activate a certain command.
I have it set up with Ollama. It’s… interesting. HA commands are provided to the model as tools, so it works only as well as the model can determine when and how to use them. From experimenting with that and my own tool-use code, I’ve found that models vary greatly in their ability to wield tools, and none that I’ve tried are exceptional.
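To illustrate what the model is working with: the conversation agent hands it tool definitions for Home Assistant's built-in intents (HassTurnOn is a real intent name). This is a simplified, hypothetical sketch of one such definition; the exact schema Home Assistant generates differs in detail:

```yaml
# Hypothetical sketch of a tool definition as the model might see it;
# not the exact schema Home Assistant emits.
- name: HassTurnOn
  description: Turns on a device or entity
  parameters:
    type: object
    properties:
      name:
        type: string
        description: Friendly name of the device or entity to turn on
    required: [name]
```

Whether a given model reliably picks the right tool and fills in the right entity name is exactly where they vary.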
It’s neat that you can intermix general chatting with HA commands, but you’re probably going to find that the old Assist is more reliable for commands. What I do like is that you can use a template as your system prompt, so you can provide the state of a number of entities and then ask about them in natural language. That works well.
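For example, the prompt template can inject live entity states using Home Assistant's standard Jinja2 helpers (`states()` is the real template function; the entity IDs below are made-up examples):

```jinja
You are a voice assistant for my home. Be concise.
Current entity states (entity IDs are examples; substitute your own):
{% for entity in ['cover.garage_door', 'climate.living_room', 'light.kitchen'] %}
- {{ entity }}: {{ states(entity) }}
{% endfor %}
```

The template is re-rendered per conversation, so the model always sees current states without needing a tool call to fetch them.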
I have an Alexa/Echo voice announcement system set up and have recently tied it into Assist, so I can do automations like: when the garage opens, prompt with “what is the state of the garage?” and announce the result. It makes things feel more human than the same canned announcements every time.
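A sketch of that kind of automation, under some assumptions: `conversation.process` and its `response_variable` are real Home Assistant features, but the entity ID, the `agent_id`, and the notify service (which here assumes the Alexa Media Player custom integration) are placeholders specific to one setup:

```yaml
# Hypothetical automation; entity IDs, agent_id, and the notify
# service name are assumptions — adjust to your own setup.
alias: Announce garage state
trigger:
  - platform: state
    entity_id: cover.garage_door
    to: "open"
action:
  # Ask the LLM-backed conversation agent and capture its reply
  - service: conversation.process
    data:
      agent_id: conversation.my_llm_agent
      text: "What is the state of the garage?"
    response_variable: reply
  # Announce the spoken response on an Echo device
  - service: notify.alexa_media_kitchen_echo
    data:
      message: "{{ reply.response.speech.plain.speech }}"
      data:
        type: announce
```

The interesting part is the `response_variable`: the agent's answer comes back as structured data, so the same phrasing trick works for any sensor you care to ask about.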
Do try it. I've been running it ever since it got integrated into core, mostly to control the A/C units around our flat, and it's the best voice assistant experience I've had to date.
I mean honestly, how is it possible that Amazon, Apple, Google and Microsoft[0] have all kept screwing this up for over a decade now? I literally spent 15 minutes hooking GPT-4 up to the Home Assistant integration, and I was able to semi-reliably[1] control actual devices[2] like air conditioners and smart lights, in a completely natural and ad-hoc way, by talking to my smartwatch on the go, or to a phone, whichever was more convenient at the moment.
It's a really magical experience, a step closer to Star Trek reality. And what makes it possible is not just LLMs being able to deal with natural language, but more importantly, the "bring your own API key" model, which lets you cut away all the bullshit the FAANG assistants are stuck in.
--
[0] - Ever since they dropped the MS Speech API in Windows and did the Cortana thing. Some 15 years have passed at this point, and I'd still rather work with the Speech API than touch any of the FAANGs' voice assistants - it worked, and it worked off-line!
[1] - Works ~90% of the time; some 5% of the time the voice model (from Home Assistant Cloud) misunderstands me, and another 5% of the time the LLM gets confused. It's still worth it, because I can actually talk to it like to a person, without thinking about style or grammar or magic keywords.
[2] - Which, given how deeply the Home Assistant companion app integrates with the phone, can easily be turned into the equivalent of an on-phone voice assistant that does more than the one I got from Google. Critically, there are ways to couple the Home Assistant app with Tasker, so it's not hard to make it do arbitrary things on your phone. And if you don't like the low-ish-code Tasker experience, you can trivially shell out from Tasker to Termux, at which point the sky is the limit. Point being, an enthusiastic non-developer with minimal tech aptitude can beat Google and Apple at the voice assistant game today.