kiraaa's comments | Hacker News

When there are two commands in a prompt, for example:

“do A and then do B”

the model completely ignores the second task, B.


On GPUs, that is still huge.


Even if it is “only” the 40% lower end, that is a gargantuan saving. So many groups are compute-constrained; every bit helps.


Sure, but a 40% improvement is much less than a 26x improvement. If 40% is the realistic figure, cite that. Changing the title to include an outlier of 26x is clickbaity.


It sure is huge, but it's still far from 26x.


Super easy to install and use.


Mistral 7B v0.2 supports 32k context.


This is a good point actually, and an underappreciated fact.

I think so many people (including me) effectively ignored Mistral 0.1's sliding window that few realized 0.2 Instruct has a native 32K context.


Mixtral 8x7B has 32k context.

Mistral 7B Instruct 0.2 is just an instruct fine-tune of Mistral 7B and stays with an 8k context.


Maybe they are using ring attention on top of their 128k model.


More likely it’s some clever take on RAG. There’s no way that the full 1M context is available at all times; more plausibly, parts of it are retrieved on demand. Hence the retrieval-like use cases you see in the demos: the goal is to find a thing, not to find patterns at a distance.


Could be true; we can only speculate.


Really like the font, and great article btw.


Thanks so much! It’s GNU Unifont. I kept switching between fonts every few months until I discovered it; I’ve been using it for the past few years now with no intention of switching off :D


The paper does not live up to the quality of the model lol.


Maybe these models should start writing themselves up.

Provide the model with an outline of a 20-or-so page research paper about itself and have it fill in the blanks. The researchers might have to provide textual descriptions of the figures in the “experiments” section.
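
A minimal sketch of that idea, assuming a model served through the Hugging Face transformers pipeline; the model id, outline, and <fill in> markers are all placeholders, not anything from an actual paper:

    # Hypothetical sketch: have the model fill in a paper outline about itself.
    from transformers import pipeline

    generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

    # Placeholder outline; researchers would supply the real section headings
    # and textual descriptions of the figures.
    outline = (
        "# OurModel: A Technical Report\n"
        "## 1. Architecture\n<fill in>\n"
        "## 2. Training Data\n<fill in>\n"
        "## 3. Experiments\n"
        "Figure 1 (researcher-provided description): <fill in>\n"
    )

    prompt = f"Fill in each <fill in> section of this outline about yourself:\n\n{outline}"
    print(generator(prompt, max_new_tokens=512)[0]["generated_text"])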


Is it better than Llama 2?


It is better than Llama 2 7B and 13B. I tried the OpenOrca fine-tune and it is very good, even when 4-bit quantized.
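
One way to get a 4-bit setup like that (a sketch, assuming the checkpoint in question is Open-Orca/Mistral-7B-OpenOrca on Hugging Face): load it with bitsandbytes NF4 quantization via transformers.

    # Sketch: load the OpenOrca fine-tune 4-bit quantized (NF4) with bitsandbytes.
    # Model id is an assumption; needs `pip install transformers bitsandbytes accelerate`.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        "Open-Orca/Mistral-7B-OpenOrca",
        quantization_config=bnb_config,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")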


What does OpenOrca do? Is it just instruction tuning?


Yes, it is an instruction-tuning dataset: https://huggingface.co/datasets/Open-Orca/OpenOrca

It felt different from the official Mistral-7B-Instruct. One of the highlights of the OpenOrca version is that you can steer the model with a system prompt (e.g. “You are a 5 year old”).
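
As an illustration of the system-prompt steering (model id assumed as above; that the checkpoint ships a chat template is also an assumption):

    # Sketch: render a system-steered prompt via the tokenizer's chat template.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")

    messages = [
        {"role": "system", "content": "You are a 5 year old"},
        {"role": "user", "content": "Explain how rainbows happen."},
    ]

    # Render the messages into the model's expected prompt format.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)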


For its size, yes. In absolute terms it is obviously less capable than llama-2-70B.


For now. Hugging Face[0] mentioned a DPO-fine-tuned version, Zephyr 7B, which it claims is competitive with Llama2-70B[1].

[0]: https://huggingface.co/spaces/HuggingFaceH4/zephyr-chat

[1]: https://twitter.com/huggingface/status/1711780979574976661


Oh, they uploaded the weights. I missed this one, cheers!


I found llama-2-70B to be a bit worse than GPT-4. (So, pretty good!) But I did not compare with GPT-3.

How do llama-2-70B and Mistral 7B compare with GPT-3?


Yes


You need 94 GB; it does not matter which kind of RAM.


https://github.com/sysid/sse-starlette makes token streaming so much easier in Python.
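
A minimal sketch of the pattern (the endpoint and fake token source here are made up for illustration):

    # Sketch: stream tokens over SSE with sse-starlette + FastAPI.
    import asyncio

    from fastapi import FastAPI
    from sse_starlette.sse import EventSourceResponse

    app = FastAPI()

    async def fake_token_stream():
        # Stand-in for a real LLM token generator.
        for token in ["Hello", ",", " world", "!"]:
            yield {"data": token}  # each yield becomes one SSE event
            await asyncio.sleep(0.05)

    @app.get("/stream")
    async def stream():
        return EventSourceResponse(fake_token_stream())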


I’m using this for an internal ChatGPT UI clone and it’s working great.

The actual biggest pain for me has been the front-end handling of it with the Fetch API. But that’s likely just due to my inexperience with it.


And also easy to deploy.

