I also use niche questions a lot, but mostly to check how much the models tend to hallucinate. E.g. I start by asking about rank badges in Star Trek, which they usually get right, and then I ask about specific (non-existent) rank badges shaped like strawberries or something like that. Or I ask about smaller German cities and what's famous about them.
I know that without the ability to search it's very unlikely the model actually has accurate "memories" about these things; I just hope one day they will actually know that their "memory" is bad or non-existent and tell me so instead of hallucinating something.
I'm waiting for properly tuned, specialized LLMs: an LLM trained on enough trustworthy generic data that it can understand me and different languages, but that always talks to a fact database in the background.
I don't need an LLM to have a trillion parameters if I just need it to be a great user interface.
Someone is probably working on this somewhere, or will be, but let's see.
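Roughly something like the following, a minimal sketch of the "small LLM as a user interface over a fact database" idea. The call_llm() function and the FACTS table are hypothetical placeholders (here call_llm just extracts a lookup key so the example runs without a model), not any real API:

    # Sketch: the model only translates the user's wording into a lookup key;
    # the facts themselves come from the database, never from the model.
    FACTS = {
        "starfleet captain rank insignia": "four gold pips on the collar",
    }

    def call_llm(prompt: str) -> str:
        # Placeholder for any instruction-following model.
        return prompt.split("QUESTION:")[-1].strip().lower().rstrip("?")

    def answer(question: str) -> str:
        key = call_llm(f"Rewrite as a lookup key. QUESTION: {question}")
        fact = FACTS.get(key)
        if fact is None:
            # The behavior I'm hoping for: admit the gap instead of hallucinating.
            return "I don't have that in my fact database."
        return fact

    print(answer("Starfleet captain rank insignia?"))
    print(answer("Strawberry-shaped rank insignia?"))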
For me it's mostly about indentation / scope depth. I prefer to have some early exits with precondition checks at the beginning; those are things I don't have to worry about afterwards, and I can write the rest at indentation level "0". The "real" result is at the end, as in the sketch below.
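A small illustration of that style (the function names and data shape are made-up examples):

    def average_order_value(orders):
        # Precondition checks exit immediately...
        if orders is None:
            return 0.0
        if len(orders) == 0:
            return 0.0

        # ...so the main logic stays at the top indentation level of the body,
        # and the "real" result is the last line.
        total = sum(order["amount"] for order in orders)
        return total / len(orders)

    # The nested alternative buries the actual computation one level deeper:
    def average_order_value_nested(orders):
        if orders is not None:
            if len(orders) != 0:
                return sum(o["amount"] for o in orders) / len(orders)
        return 0.0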
I was wondering the whole time why people in the comments are so hyped about this; then I finally noticed (after I stumbled upon a comment about running this on a mobile phone) that it's a "270M" model, not "270B" :)
I also don't get it. I mean, if the training data is publicly available, why isn't that marked as dangerous? If the training data contains enough information to roleplay a killer or a hooker, or to build a bomb, why is the model censored?
That disk was indeed impressive. What impressed me even more than all of it fitting on a disk and being pretty fast was that it just worked. This was at a time when I was having my first experiences with Linux (Red Hat and SUSE), and often I could not even get the X server to start. This disk, however, just worked...
I've been experimenting with using various LLMs as a game master for a Lovecraft-inspired role-playing game (not baked into an application, just text-based prompting). While the LLMs can generate scenarios that fit the theme, they tend to be very generic. I've also noticed that the models are extremely susceptible to suggestion. For example, in one scenario my investigator was in a bar, and when I commented to another patron, 'Hey, doesn't the barkeeper look a little strange?', the LLM immediately seized on that and turned the barkeeper into an evil, otherworldly creature. This behavior was consistent across all the models I tested.

Maybe prompting the LLM to fully plan the scenario in advance and then adhere to that plan would mitigate the behavior, but I haven't tried it. It was just an experiment, and I actually had a lot of fun with the behavior. Also, the reactions of the LLM when the player does something really unexpected (e.g. "the investigator pulls a sausage out of his pocket and forcefully sticks it into the angry sailor's mouth") are sometimes hilarious.
> when I commented to another patron, 'Hey, doesn't the barkeeper look a little strange?', the LLM immediately seized on that and turned the barkeeper into an evil, otherworldly creature.
Though making him an evil, otherworldly creature is a bit extreme, it's at least similar to what a flexible GM can do. In my DMing days, I would often develop new paths that integrated into the whole, inspired by things my players noticed or suspected.
In my GM days, I had a lot of trouble with players who tried their best to completely leave the path I had prepared for them.
You are right, though, and it's not that I completely dislike the LLM's "flexibility" and openness to suggestions. However, it's also super easy to use it for "cheating". E.g. it generated a scenario with an evil entity about to attack me and some friendly NPC, and I could "solve" that problem by telling the NPC, "Remember the device I gave you last week and told you to always keep on hand? Pull the trigger now!" (which never happened, at least to the LLM's knowledge), and the LLM made up some device that shot a beam of magic light at the creature and stopped it.
Have you used any thinking models? I remember being surprised by QwQ-32B when I tried it. It would think about what I said and how it should respond, reiterate the behaviors I had assigned to it, and respond accordingly. That constant self-reinforcement in the thinking phase seemed to keep it on track.