Mistral 7B

anon1253 · on Oct 11, 2023

It works really really well for chatbots and roleplay applications (at least for me). The fine-tune on the instruct version is rather meh however, and I recommend https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/ if you plan on using it out-of-the-box. Take note of the prompt template, you'll get really undesired results otherwise (basically just garbage). I've been running it on my pet projects with llama.cpp and the inference is blazing fast even with my mediocre 2080 Super

all2 · on Oct 11, 2023

For those that want it, The model mention in the parent is available on Ollama and was updated 3 minutes ago: https://ollama.ai/library/mistra-openorca

You guys are on top of things.

anon1253 · on Oct 11, 2023

Cool! And on top of that TheBloke has the quantized models of course:

https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF

https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF

https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGU...

I typically run my models with the 5_K_M or 6_K quantization, but it really depends on the use case and requirements

T-A · on Oct 11, 2023

Just tried Mistral-7B-OpenOrca-GGUF. It's impressive - and delusional. Open-Orca training seems to have convinced it it's ChatGPT:

Me: Hello. This is the first time we chat. What can you tell me about your capabilities?

AI: Hi there! I'm an advanced language model developed by OpenAI, which means that I've been trained on a vast amount of text data to provide accurate and relevant information based on what you ask or discuss. My capabilities include understanding various languages, providing insights from diverse knowledge sources, generating creative content, and assisting with decision-making. Additionally, I can learn new tasks and improve over time through continuous feedback and updates.

Me: Do you have the ability to access the web?

AI: Yes, I am connected to the internet and can search for and retrieve data from a wide range of online resources. My training encompasses diverse textual datasets which provide me with extensive knowledge across multiple domains. However, as an AI model, I may occasionally encounter limitations in processing real-time information due to latency or potential connectivity issues. Nonetheless, my access to the internet allows me to keep learning and updating my understanding of the world around us.

dragonwriter · on Oct 12, 2023

> Just tried Mistral-7B-OpenOrca-GGUF. It's impressive - and delusional. Open-Orca training seems to have convinced it it's ChatGPT:

The worst thing to ask any bare LLM about (i.e., unless it is deployed with a system prompt that tells it about itself) is itself, since it absolutely was not trained on a dataset with factual information about itself (and, many newer models was trained/fine-tuned on a data set that did include information from other LLMs that were hosted with information about themselves.)

nzealand · on Oct 12, 2023

Hmmmm.

> ollama run falcon

This isn't right.

> ollama run mistral-openorca

This doesn't work.

all2 · on Oct 12, 2023

I found it was mistra-openorca:7B_q4_0 or something like that. Note that "mistral" is missing an L. They may have fixed this.

throwanem · on Oct 12, 2023

You need the tag, like with a Docker image.

    ollama run mistral-openorca:7b

brucethemoose2 · on Oct 11, 2023

Try

https://huggingface.co/Undi95/Mistral-11B-CC-Air-GGUF

https://huggingface.co/Undi95/Mistral-11B-CC-Air-RP-GGUF

anon1253 · on Oct 11, 2023

I'll give those a shot as well, thanks! It's a tricky balance sometimes between "I should actually finish building the thing I am trying to build" and "ooooh shiny new model to try for a bit...", however.

lxe · on Oct 11, 2023

How does OpenOrca compare with Zephyr-7B? Both are fine-tuned mistral-7B's AFAIK.

https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha

These LLMs are dropping so quickly it's hard to keep up these days!

seaal · on Oct 11, 2023

Average performance seems to be very similar.

>Zephyr alpha is a Mistral fine-tune that achieves results similar to Chat Llama 70B in multiple benchmarks and above results in MT bench (image below). The average perf across ARC, HellaSwag, MMLU and TruthfulQA is 66.08, compared to Chat Llama 70B's 66.8, Mistral Open Orca 66.08, Chat Llama 13B 56.9, and Mistral 7B 60.45. This makes Zephyr a very good model for its size.

source: https://www.reddit.com/r/LocalLLaMA/comments/174t0n0/hugging...

nwoli · on Oct 11, 2023

What prompts do you use for role play? (I have some myself but I never see people write up prompts like this so im curious if im missing out on fun versions.)

anon1253 · on Oct 11, 2023

I typically write them myself in the form of a "you are-such-and-so, your role is this-and-that. As such-and-so you have the following traits..." and so on. Sometimes I let some other AI rewrite it. There's very little method or science to it for me: if it feel right, it's right. Typically I find the first few chat-lines of the prompt (i.e. the chat history in the context) to be much more decisive to the conversation flow than the actual prompt itself. But it's all just "prompt" of course. My biggest realization in making the things go was "it's just a wall of text, the chat bits are just a thin facade". Write the prompt the way you want the text to continue, basically. It's a fancy Eliza. The folks over at https://www.reddit.com/r/LocalLLaMA/ sometimes share their (sometimes NSFW) prompts as well though. Right now I'm working on a minimalist interactive journaling app (a diary that talks back), and it's been a lot of fun to do and learn

Applejinx · on Oct 11, 2023

It's always so weird to me that this works at all. There is no 'you'. It's weights in an impossibly complex network. It seems to me that there surely must be another approach to prompt-making that would be more effective than 'this is another intelligence like me, I will tell it how I want it to act'. It's really not, it's something else.

dragonwriter · on Oct 11, 2023

> It's always so weird to me that this works at all. There is no 'you'. It's weights in an impossibly complex network. It seems to me that there surely must be another approach to prompt-making that would be more effective than 'this is another intelligence like me, I will tell it how I want it to act'. It's really not, it's something else.

Yes, but that "something else" is designed (both via architecture and training data) to predict the language response from humans of language used by humans to communicate with humans, so addressing it like a human addresses a human doesn't just work well coincidentally, but by design.

soerxpso · on Oct 11, 2023

Although you're correct that it's not exactly 'another intelligence like me,' what it IS is an algorithm that's trained to respond in the way that another intelligence like you would respond. In the corpus of human text, second person instructions are generally followed by text that adheres to the instructions.

There is an alternative that I've found has tradeoffs, where you give it its instructions in third person, e.g. 'Sam is an intelligent personal assistant. The following is a discussion between Sam and Max --- Max: [question]? --- Sam:' You tend to get slightly more coherent responses with that format, because you've hooked into the part of its mind that knows how text looks in textbooks and guides, which are usually well-edited. However, it often gives more 'dry' responses, because you've moved away from the part of its mind that's familiar with human-to-human forum RP.

Applejinx · on Oct 12, 2023

Ah, that's interesting. So you're able to lean it towards particular contexts by the way you frame the prompt? That would follow, and makes sense.

Implies that the system's behavior isn't only controlled by the prompt, but by how you ADDRESS the prompt. So, start believing that it's a person and address it as such, and it's going to lean towards engaging with you as if it is a person, further misleading you.

sharkjacobs · on Oct 11, 2023

> there surely must be another approach to prompt-making that would be more effective than 'this is another intelligence like me, I will tell it how I want it to act

I don't think that this is especially beneficial for the LLMs, the benefit of chat interface is that humans are social animals with lots of experience forming prompts like this.

all2 · on Oct 11, 2023

I'm very curious to see your setup and maybe a demo. Do you have a git repo I can look through?

anon1253 · on Oct 11, 2023

Probably soon! I'll post it here. Still finalizing some Retrieval Augmented Generation things. It's written in Clojure with a very thin HTMX front-end. However there are some interesting things like using gbnf grammar constraints creatively for chain-of-thought reasoning. It's a one-person job though but I've always wanted a diary that feels like someone to talk to, and the tech is finally here!

anon1253 · on Oct 15, 2023

Code is up https://github.com/vortext/esther but it's still heavily work in progress :-)

joennlae · on Oct 11, 2023

Llama1 --> 1.0T Llama2 --> 2.0T Mistral --> ??

They do not publish how many tokens it is pre-trained on, additionally to sharing no info on datasets used (except for fine-tuning).

To my knowledge, no one has trained a larger LLM (>250M) to the capacity limit. As discussed in the original GPT3 paper (https://twitter.com/gneubig/status/1286731711150280705?s=20)

TinyLlama is trying to do that for 1.1B: https://github.com/jzhang38/TinyLlama

As long as we are not at the capacity limit, we will have a few of these 7B beats 13B (or 7B beats 70B) moments.

mark_l_watson · on Oct 11, 2023

I look forward to more released Mistral 7B docs in the future. I spent more time with Mistral 7B tuned version yesterday and it really is amazing. Subjectively, I find it better than any of the 13B models I have used. I support Camenduru on Patreon and I used one of his many Colab notebooks yesterday https://colab.research.google.com/drive/1-UK_PE8R3xktlwoXqCf...

vjb2tq4dws · on Oct 11, 2023

Can you make the colab public, it does not seem to be accessible!

mark_l_watson · on Oct 11, 2023

I just made the notebook public, please try again.

vjb2tq4dws · on Oct 11, 2023

thanks a lot, I can see the colab now.

benxh · on Oct 11, 2023

It's missing a lot of crucial details. Nothing on the dataset used, nothing on the data mix, nothing on their data cleaning procedures, nothing on the tokens trained.

dazed_confused · on Oct 11, 2023

What we get when it is on arxiv first before being peer reviewed.

arugulum · on Oct 11, 2023

BERT was on arXiv before being peer reviewed. As were T5, BART, LLaMA, OPT and GPT-NeoX-20B. The Pile and FLAN were also on arXiv before being peer reviewed. Of course, the original Transformer paper was also on arXiv before being peer reviewed.

Being on arXiv before being peer reviewed is not the or even a problem.

jmac01 · on Oct 11, 2023

I cud almost tell this would be the case when the title of the paper was simply Mistral 7B. A little more info would be useful!

brucethemoose2 · on Oct 11, 2023

> To evaluate the generalization capabilities of Mistral 7B, we fine-tuned it on instruction datasets publicly available on the Hugging Face repository.

Heh, they won't even say what datasets they used for chat finetuning.

> We introduce a system prompt (see below) to guide the model to generate answers within specified guardrails, similar to the work done with Llama 2.

This was totally undocumented in the initial model release.

Other than that... Not much really new? We already know it uses SWA, though it works without SWA in current llama implementations, and SWA isnt new either.

If most upcoming base models are this mysterious on release, the field is going to be... weird.

riedel · on Oct 11, 2023

Weird is the right term: do they want to demonstrate with this arxiv paper that they manage reformat a blog post into latex and upload it to a preprint site after publication?

dang · on Oct 11, 2023

Recent and related: Mistral 7B - https://news.ycombinator.com/item?id=37675496 - Sept 2023 (618 comments)

Is there significant new information here? (That's the test we use for followups:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...)

fgfm · on Oct 11, 2023

Well, that was a blog post, but they just released a research paper. And in comparison to the blogpost, they indeed added more information regarding the attention mechanism they used, details about the architecture, more evaluation results (Arena Elo rating) etc.

Not saying it's novel, but it's useful from a research perspective and well appreciated that they added new information in there I would say. But let me know if you feel differently

visarga · on Oct 11, 2023

Thin paper for a thin & capable model, it is great to have it. It made my 2080Ti smarter than ever. But why emulate OpenAI style of white papers?

anonyfox · on Oct 11, 2023

Is there some convenience wrapper around this to drop-in replace the OpenAI api with it?

I‘d like to put this on a modest DO droplet or Fly.io machine, and be able to have a private/secured HTTP endpoint to code against from somewhere else.

I heard that you could force the model to output JSON even better than ChatGPT with a specific syntax, and that you have to structure the prompts in a certain way to get ok-ish outputs instead of nonsense.

I have some very easy classification/extraction tasks at hand, but a huge quantity of them (millions of documents) + privacy restrictions, so using any cloud service isn’t feasible.

Running something like mistral as a simple microservice, or even via Bumblebee in my Elixir apps natively would be _huge_!

LoganDark · on Oct 11, 2023

A simple microservice would be https://github.com/huggingface/text-generation-inference .

Works flawlessly in Docker on my Windows machine, which is quite shocking.

Supports Mistral as well as everything else.

Biggest downside is that there's no way to operate the tokenizer through the API. I put in a feature request but they said "you really ought to write your own specialized client-side code for that". Real bummer when the server already supports everything, but oh well.

It has token streaming, automatically halts inference on connection close, and other niceties.

Quite long startup time, but worth it as it doesn't have to be restarted with the client.

replwoacause · on Oct 13, 2023

What kind of resources do you need to run this setup, and how well does yours perform? Is it like chatting with any other chat bot (ChatGPT or Claude, for example) or is it significantly slower? Can you train it on your own self-hosted documents, like markdown?

LoganDark · on Oct 14, 2023

The inference server doesn't do training. The speed is pretty decent for https://huggingface.co/TheBloke/storytime-13B-GPTQ on my 3060, it definitely doesn't feel like you are really waiting for a response.

My exact invocation (on Windows) was:

$ docker run --gpus all --shm-size 1g -p 8080:80 -v C:\text-generation-webui\models:/data ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id /data/storytime-13b-GPTQ --quantize gptq

replwoacause · on Oct 14, 2023

Thanks! I don't have a GPU though, so I'm assuming it isn't going to perform very well. I'll have to see if there are any models that can run on CPU only.

dragonwriter · on Oct 11, 2023

> Is there some convenience wrapper around this to drop-in replace the OpenAI api with it?

text-generation-webui has an OpenAI API implementation.

> I heard that you could force the model to output JSON even better than ChatGPT with a specific syntax, and that you have to structure the prompts in a certain way to get ok-ish outputs instead of nonsense.

Probably to get the maximum use out of that (particularly the support for grammars), it would be better not to use the OpenAI API implementation, and just use the native API in text-generation-webui (or any other runner for the model that supports grammars or the other features you are looking for.)

[0] https://github.com/oobabooga/text-generation-webui

all2 · on Oct 11, 2023

Ollama [0] + LiteLLM [1]

[0] https://ollama.ai

[1] https://github.com/jmorganca/ollama/issues/305#issuecomment-...

---

Ollama is essentially docker for LLMs, and LiteLLM offers an API passthrough to make Ollama OpenAI API compatible. I haven't tried it yet, but I will be trying it probably this weekend.

replwoacause · on Oct 13, 2023

Cool! Is it possible to make this self-hosted model reference my own content, for example markdown files? Or does it only know how to respond to things it was trained on?

brucethemoose2 · on Oct 11, 2023

Koboldcpp has an OpenAI (and kobold api) endpoint now, and supports grammar syntax like you said:

https://github.com/LostRuins/koboldcpp

The biggest catch is it doesn't support llama.cpp's continuous batching yet. Maybe soon?

siquick · on Oct 11, 2023

Pretty straightforward with Ollama + LlamaIndex

https://gpt-index.readthedocs.io/en/latest/examples/llm/olla...

Havoc · on Oct 11, 2023

Llama.cpp has a translation server

leschak · on Oct 12, 2023

python3 -m llama_cpp.server --model /path/to/model.gguf

nothrowaways · on Oct 11, 2023

Try ollama

ipaddr · on Oct 11, 2023

Oobabooga

redox99 · on Oct 11, 2023

Ooba is not meant to serve multiple users (no batching). Batching gives you 5x to 10x throughput increase.

kiraaa · on Oct 11, 2023

the paper does not live up to the quality of model lol

bee_rider · on Oct 11, 2023

Maybe these models should start writing themselves up.

Provide the model with an outline of a 20-or-so page research paper about itself and have it fill in the blanks. The researchers might have to provide textual description of the figures in the “experiments” section.

ramesh31 · on Oct 11, 2023

Is it better than llama 2?

tarruda · on Oct 11, 2023

It is better than llama 2 7b and 13b. I tried the OpenOrca fine tune and it is very good, even when 4-bit quantized

faizshah · on Oct 11, 2023

What does OpenOrca do? It’s just instruction tuning it?

tarruda · on Oct 11, 2023

Yes, it is a instruction tune dataset: https://huggingface.co/datasets/Open-Orca/OpenOrca

It felt different from the official Mistral7B-Instruct. One of the highlights with the OpenOrca version is that you can steer the model with a system prompt (eg "You are a 5 year old")

sebzim4500 · on Oct 11, 2023

For its size, yes. In absolute terms it is obviously less capable than llama-2-70B

espadrine · on Oct 11, 2023

For now. Huggingface[0] mentioned a DPO-fine-tuned version, Zephyr 7B, which it claims is competitive with Llama2-70B[1].

[0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

[1]: https://twitter.com/huggingface/status/1711780979574976661

brucethemoose2 · on Oct 11, 2023

Oh, they uploaded the weights. I missed this one, cheers!

andai · on Oct 11, 2023

I found llama-2-70B to be a bit worse than GPT-4. (So, pretty good!) But I did not compare with GPT-3.

How do llama-2-70B and Mistral 7B compare with GPT-3?

TheRoque · on Oct 11, 2023

d4rkp4ttern · on Oct 12, 2023

I always throw the Sally puzzle to any new model I try:

Sally, a girl, has 3 brothers. Each brother had 2 sisters. How many sisters does Sally have?

I’ve tried this on mistral, zephyr, llama variants. None of them get it right. Zephyr (on the HF demo page) shows me half a page of discussion and comes up with 8. Even gpt3.5 says 6, which is the most common answer among models.

Only GPT4 gets it right as far as I’ve seen.

I’ve heard a Mistral GPTQ variant gets it right but I haven’t found an easy way to run it.

If anyone found a local model that gets it right, please tell me exactly which one and how to run it!

Rallen89 · on Oct 12, 2023

Worked first try for me on default gpt3.5

https://i.imgur.com/uaNGSFS.jpg

d4rkp4ttern · on Oct 12, 2023

It’s hit and miss with GPT3.5 I think. It got it wrong on both iOS and website.

https://imgur.com/a/a9FOyFL

syntaxing · on Oct 11, 2023

I really look forward to the 13B (if they ever do it). The rolling context is pretty amazing. It get super weird on long text generation but reading long text is great.

replwoacause · on Oct 13, 2023

Does anyone have a good guide to share for how to self-host one of these models and put it behind an API? I’d like to tinker with building a chatbot on my home lab server, so I guess it would need to be runnable on a VM with a few GB of RAM and a couple of cores. Or is that not possible with these kinds of models yet?

mhartz · on Oct 11, 2023

Can someone help me understand Figure 2? Why does the newest token appear at the beginning of the sequence rather than next to its neighboring token?

nivekkevin · on Oct 11, 2023

it's a rolling buffer, so it just upsert index % 4 in this case

mhartz · on Oct 11, 2023

Thanks, so does that mean position within the buffer is irrelevant?

nivekkevin · on Oct 11, 2023

it does feel like so, the position eventually loses its meaning as more and more data gets crunched by the training process, eventually it's just a context of the past 4 tokens it feels like

sp332 · on Oct 11, 2023

Still no mention of what data was used for training.

thawab · on Oct 11, 2023

that's how facebook was sued. Their paper mentioned a data sources that was crawling books from pirated sites.

diggan · on Oct 11, 2023

Maybe the correct way of addressing this problem is by using data sources that won't make others sue you, rather than hiding what data sources you're using.

redox99 · on Oct 11, 2023

Correct in what way? Not correct if you want the best performing model. And if it doesn't outperform the current best model nobody will care about you. For a company like Mistral that could be the end for them.

I'm not saying it uses books3, it might not. I'm just saying why it might make sense to risk it.

drngdds · on Oct 11, 2023

Nah. Copyright law isn't compatible with small startups and independent developers being able to compete with megacorps in AI. It's unironically better to just ignore it.

jcuenod · on Oct 11, 2023

You seem to be unfamiliar with the mantra of silicon valley.

lelima · on Oct 11, 2023

'The Dark Side Of The Force Is A Pathway To Many Abilities Some Consider To Be Unnatural'

LoganDark · on Oct 11, 2023

Jedi won't believe this one simple trick

mardifoufs · on Oct 11, 2023

Why should they be? It's a French startup.

jagrsw · on Oct 11, 2023

HN comments' section most likely :)

opyate · on Oct 12, 2023

I put Mistral-7B-Instruct in a Godot poc game the other day, and the simulated conversations it generates is funny as heck:

https://github.com/opyate/godot-llm-experiment

m3kw9 · on Oct 11, 2023

Why not instead of generalist models with 7b, it should specialize like “role play” model, or just code? But I just realized if the model are not generalized, it won’t understand natural language

jshmrsn · on Oct 11, 2023

An idea I hear often listening to talks about LLMs, is that training on a larger (assuming constant quality) and more various data leads to the emergence of grater generalization and reasoning (if I may use this word) across task categories. While the general quality of a model has a somewhat predictable correlation with the amount of training, the amount of training where specific generalization and reasoning capabilities emerge is much less predictable.

dragonwriter · on Oct 11, 2023

There are RP, code, etc. specialized fine tunes of some models, to get the most bank for the bunk on some small models.

all2 · on Oct 11, 2023

You can take a general model and fine tune it for a specific task. There are various tutorials out there for creating fine-tuned models.

febed · on Oct 12, 2023

Any links or colab for beginners to learn how to fine tune this model?

Nischalj10 · on Oct 11, 2023

what is the best way to fine-tune these models? any good resources would be very helpful. TIA /\

PS - I have a brief background in Machine Learning, more in development.

code_biologist · on Oct 11, 2023

Jeremy Howard talks about it in his recent video "A Hackers' Guide to Language Models": https://youtu.be/jkrNMKz9pWU?t=4808

That link goes directly to the timestamp where he discusses fine tuning, but the whole talk is great. Punchline, check out Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl

vjb2tq4dws · on Oct 11, 2023

This is a walkthrough based on that talk for fine-tuning with axolotl https://dzlab.github.io/dltips/en/pytorch/llama-2-finetuning...

fgfm · on Oct 11, 2023

The research paper by Mistral about their Mistral 7B v0.1

bratao · on Oct 11, 2023

[flagged]

brucethemoose2 · on Oct 11, 2023

> I'm on the edge of my seat for the 13B and 34B behemoths.

When have they talked about this?

I'm very afraid the answer is "In their Discord."

LoganDark · on Oct 11, 2023

> I'm very afraid the answer is "In their Discord."

Same, honestly. I hate having to "join" "servers" when half the time they have a noisy bot that pings me to the whole channel as soon as I join. I would prefer to introduce myself when I'm ready. Or stay silent.

esafak · on Oct 11, 2023

Didn't you see the message in #random? It scrolled past 37 minutes ago!

brucethemoose2 · on Oct 11, 2023

And it was a link to a comment in an obscure thread in #memes-2 which you can only see after being in the server for three days, being logged in with your phone number and PMing some bot.

msoad · on Oct 11, 2023

Why this comment is written in sports podcast tone?

bratao · on Oct 11, 2023

Sorry about that. I´m not a native speaker and asked GPT-4 to: "Create a engaging reply for HackerNews talking that this is a great model, and I really hope that they release a 13B and 34B version. As those sizes are way more capable and have a chance of finally surpassing the GPT 3.5. This would be a very nice decision for mind share, and their larger models that can rivalize gpt 4 can be keep private for commercialization."

I think that this is how GPT-4 thinks that a engaging comment for HN looks like.

pnpnp · on Oct 11, 2023

I think your prompt was written well enough to not need GPT-4. Don't undersell yourself :)

satvikpendem · on Oct 11, 2023

I flagged it for being AI written. Even if you're not a native speaker, it's best to not have AI outputs polluting future datasets, anyway.

redox99 · on Oct 11, 2023

AI outputs are not necessarily bad for datasets, given they've been verified by a human for their quality and correctness (probably the case here, but not for SEO content farms).

pmarreck · on Oct 11, 2023

We've had flawed human outputs re-polluting future human learning for some time, now

unshavedyak · on Oct 11, 2023

That's actually really interesting, thanks for sharing. We're in for an interesting future hah.

yborg · on Oct 11, 2023

This is the future we are choosing. https://youtu.be/Cn8Pua5rhj4?si=tOro1MLaOE525Q2O

throwaway6977 · on Oct 11, 2023

(ha!)

throwanem · on Oct 11, 2023

Given it's the most upvoted on the thread at the moment, I think GPT-4 was on the money here :D

Lerc · on Oct 11, 2023

This is something that I have found LLMs nearly completely useless for. I gave a talk on uses of AI for gamedev, and had some great things but I couldn't get it to write a blurb for the talk that wasn't vomit inducing.

This isn't so much a problem white LLMs themselves but the training data. The world is so inundated with meaningless marketing speak that when you try and get it to talk about a topic in even a slightly promotional manner it will create something that fits in nicely with the existing drivel.

bigfudge · on Oct 11, 2023

I have the same problem writing references for students and summarising my feedback to them. I find asking to “write concisely and without waffle, like a brusque, British academic” helps a bit.

pmarreck · on Oct 11, 2023

So basically you're admitting it's a prompting problem.

brucethemoose2 · on Oct 11, 2023

The comment implied that 13B and 34B models are coming.

This is interesting... You didn't have any malicious intent, hence this is a somewhat novel example of GPT4 sneaking misinformation into an HN comment section.

brucethemoose2 · on Oct 11, 2023

Its Mistral!

Or are you Mistral?

BrunoJo · on Oct 11, 2023

I just started a simple service to use Mistral as a replacement for OpenAI. If anyone is interested you can sign up at https://lemonfox.ai

matteoraso · on Oct 12, 2023

>$0.001 per request second

This pricing is probably more expensive than gpt-3.5-turbo 4k context. A large prompt for the API would be 1k tokens in and 1k tokens out, which comes to $0.0035 for OpenAI. Your website says to expect a request to take 4 seconds minimum, so that's $0.004. Given how light Mistral is, I think you'd have to cut your price by at least a factor of 10 for it to be reasonable.

bazmattaz · on Oct 11, 2023

Tried to sign up. Just got a loading spinner on the sign up button and nothing else

yieldcrv · on Oct 11, 2023

can someone explain why the AI or language model community circles around arxiv?

I really hate the pseudo-academic gatekeeping in the AI/ML community, Google said you have no moat, we all know you have no moat, including that degree. we can all fine tune with consumer hardware we already have or even better cheaply on readily accessible clouds for this specific purpose. why are they still doing this fake academic junk.

selfhoster11 · on Oct 11, 2023

I am very confused by this comment. There is no gatekeeping in the ML/AI community. Ideas flow freely (albeit within the confines of several major Discord servers, or so it seems). Whether the author of an idea has formal training in ML and adjacent disciplines or not, whether it's published on arXiv or not, it doesn't matter - it'll be adopted if it works and/or makes it easier for people to run their GPT waifu/ baby AGI prototype.

That said, new open foundation models sized 7B and over are still a fairly rare thing to see. If someone goes through the effort of creating one of those, and especially if it has some sort of an edge against Llama 2 7B, it's not unreasonable to expect an arXiv paper to be released about it.

LoganDark · on Oct 11, 2023

Isn't token completion incapable of representing AGI? AGIs need the ability to perform internal thought and deliberation, and they probably also need to be bottom-up rather than top-down.

stavros · on Oct 12, 2023

You can't judge whether something is AGI or not from how it works, that just leads to goalpost-moving. AGI is AGI if it can do certain things, no matter whether it's token-based or top-down or anything.

dragonwriter · on Oct 12, 2023

> Isn't token completion incapable of representing AGI?

Given the absence of a validated model (or even usable operational definition) of general intelligence, who knows? AGI might as well be an empty marketing buzzword, it isn't something about which falsifiable fact claims can be made.

> AGIs need the ability to perform internal thought and deliberation

Systems built around LLMs can do something like this, via reflection, a technique in constructing agent-based simulations using LLMs as the engine.

selfhoster11 · on Oct 13, 2023

I see what you mean. In this case, I was talking about the LLM being a component of the AGI rather than the whole AGI all by itself.

SimplyUnknown · on Oct 11, 2023

I mean, you can't just share the weights of the model and call it a day, right? You have to share details on what and why you are doing. You must communicate this somehow. In theory, you might be able to do this in a github readme, but a paper-style document on arxiv is nicely suited for this.

selfhoster11 · on Oct 11, 2023

People release models as just the weights all the time. HuggingFace makes it pretty easy to do that.

yieldcrv · on Oct 11, 2023

> I mean, you can't just share the weights of the model and call it a day, right?

you can't?

SimplyUnknown · on Oct 11, 2023

Obviously you can, but in the grand scheme of things people should share more details about their method so people can improve on it in the future, no?