kiraaa's comments | Hacker News

When there are two commands in a prompt, for example:

“do A and then do B”

the model completely ignores the second task, B.


On GPUs, that is still huge.


Even if it is “only” the 40% lower end, that is a gargantuan saving. So many groups are compute-constrained; every bit helps.


Sure, but a 40% improvement is much less than a 26x improvement. If 40% is the realistic figure, cite that. Changing the title to include an outlier of 26x is clickbaity.


It sure is huge, but it's still far from 26x.


Super easy to install and use.


Mistral 7B v0.2 supports 32k context.


This is a good point actually, and an underappreciated fact.

I think so many people (including me) effectively ignored Mistral 0.1's sliding window that few realized 0.2 Instruct has a native 32K context.


Mixtral 8x7B has 32k context.

Mistral 7B Instruct 0.2 is just an instruct fine-tune of Mistral 7B and stays with an 8k context.


Maybe they are using ring attention on top of their 128k model.


More likely it’s some clever take on RAG. There’s no way that the full 1M context is available at all times; more plausibly, parts of it are retrieved on demand. Hence the retrieval-like use cases you see in the demos: the goal is to find a thing, not to find patterns at a distance.


Could be true; we can only speculate.


Really like the font, and great article btw.


Thanks so much! It’s GNU Unifont. I kept switching between fonts every few months until I discovered it; I’ve been using it for the past few years now with no intention of switching off :D


The paper does not live up to the quality of the model lol.


Maybe these models should start writing themselves up.

Provide the model with an outline of a 20-or-so page research paper about itself and have it fill in the blanks. The researchers might have to provide textual descriptions of the figures in the “experiments” section.
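
A minimal sketch of that idea, assuming a model served through the Hugging Face transformers pipeline; the model id, outline, and <fill in> markers are all placeholders, not anything from an actual paper:

    # Hypothetical sketch: have the model fill in a paper outline about itself.
    from transformers import pipeline

    generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

    # Placeholder outline; researchers would supply the real section headings
    # and textual descriptions of the figures.
    outline = (
        "# OurModel: A Technical Report\n"
        "## 1. Architecture\n<fill in>\n"
        "## 2. Training Data\n<fill in>\n"
        "## 3. Experiments\n"
        "Figure 1 (researcher-provided description): <fill in>\n"
    )

    prompt = f"Fill in each <fill in> section of this outline about yourself:\n\n{outline}"
    print(generator(prompt, max_new_tokens=512)[0]["generated_text"])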


Is it better than Llama 2?


It is better than Llama 2 7B and 13B. I tried the OpenOrca fine-tune and it is very good, even when 4-bit quantized.
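
One way to get a 4-bit setup like that (a sketch, assuming the checkpoint in question is Open-Orca/Mistral-7B-OpenOrca on Hugging Face): load it with bitsandbytes NF4 quantization via transformers.

    # Sketch: load the OpenOrca fine-tune 4-bit quantized (NF4) with bitsandbytes.
    # Model id is an assumption; needs `pip install transformers bitsandbytes accelerate`.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Open-Orca/Mistral-7B-OpenOrca",
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")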


What does OpenOrca do? Is it just instruction tuning?


Yes, it is an instruction-tuning dataset: https://huggingface.co/datasets/Open-Orca/OpenOrca

It felt different from the official Mistral-7B-Instruct. One of the highlights of the OpenOrca version is that you can steer the model with a system prompt (e.g. “You are a 5 year old”).
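
As an illustration of the system-prompt steering (model id assumed as above; that the checkpoint ships a chat template is also an assumption):

    # Sketch: render a system-steered prompt via the tokenizer's chat template.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")

    messages = [
        {"role": "system", "content": "You are a 5 year old"},
        {"role": "user", "content": "Explain how rainbows happen."},
    ]

    # Render the messages into the model's expected prompt format.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)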


For its size, yes. In absolute terms it is obviously less capable than llama-2-70B.


For now. Hugging Face[0] mentioned a DPO-fine-tuned version, Zephyr 7B, which it claims is competitive with Llama2-70B[1].

[0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

[1]: https://twitter.com/huggingface/status/1711780979574976661


Oh, they uploaded the weights. I missed this one, cheers!


I found llama-2-70B to be a bit worse than GPT-4. (So, pretty good!) But I did not compare with GPT-3.

How do llama-2-70B and Mistral 7B compare with GPT-3?


Yes


You need 94 GB; it does not matter which kind of RAM.


https://github.com/sysid/sse-starlette makes token streaming so much easier in Python.
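
A minimal sketch of the pattern (the endpoint and fake token source here are made up for illustration):

    # Sketch: stream tokens over SSE with sse-starlette + FastAPI.
    import asyncio

    from fastapi import FastAPI
    from sse_starlette.sse import EventSourceResponse

    app = FastAPI()

    async def fake_token_stream():
        # Stand-in for a real LLM token generator.
        for token in ["Hello", ",", " world", "!"]:
            yield {"data": token}  # each yield becomes one SSE event
            await asyncio.sleep(0.05)

    @app.get("/stream")
    async def stream():
        return EventSourceResponse(fake_token_stream())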


I’m using this for an internal ChatGPT UI clone and it’s working great.

The actual biggest pain for me has been the front-end handling of it with the Fetch API. But that’s likely just due to my inexperience with it.


And also easy to deploy.

