Encoding JSON or MessagePack will be about the same speed, although I would expect MessagePack to be marginally faster from what I’ve seen over the years. It’s easy to encode data in most formats, compression excluded.
Parsing is the real problem with JSON, and no, it isn’t even close. MessagePack knows the length of every field, so it is extremely fast to parse, an advantage that grows rapidly when large strings are a common part of the data in question. I love the simple visual explanation of how MessagePack works here: https://msgpack.org/
Anyone who has written parsing code can instantly recognize what makes a format like this efficient to parse compared to JSON.
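To make that concrete, here is a toy sketch (not any particular library's code; the function names are made up) of what a decoder does for a MessagePack "str 8" value versus a JSON string: the MessagePack side reads one length byte and slices, while the JSON side has to walk every byte looking for the closing quote and dealing with escapes.

```go
package main

import (
	"errors"
	"fmt"
)

// readMsgpackStr8 sketches how a MessagePack decoder reads a "str 8" value
// (type byte 0xd9 followed by a 1-byte length): read the length, slice, done.
// No scanning, no escape handling.
func readMsgpackStr8(buf []byte) (s string, rest []byte, err error) {
	if len(buf) < 2 || buf[0] != 0xd9 {
		return "", nil, errors.New("not a str8 value")
	}
	n := int(buf[1])
	if len(buf) < 2+n {
		return "", nil, errors.New("truncated input")
	}
	return string(buf[2 : 2+n]), buf[2+n:], nil
}

// readJSONString is the equivalent work for JSON: every byte must be
// inspected to find the closing quote, and escapes force extra handling
// (real parsers do far more here than this sketch).
func readJSONString(buf []byte) (s string, rest []byte, err error) {
	if len(buf) == 0 || buf[0] != '"' {
		return "", nil, errors.New("not a JSON string")
	}
	for i := 1; i < len(buf); i++ {
		switch buf[i] {
		case '\\':
			i++ // skip the escaped character
		case '"':
			return string(buf[1:i]), buf[i+1:], nil
		}
	}
	return "", nil, errors.New("unterminated string")
}

func main() {
	mp := append([]byte{0xd9, 0x05}, []byte("hello")...)
	s1, _, _ := readMsgpackStr8(mp)

	js := []byte(`"hello" rest`)
	s2, _, _ := readJSONString(js)

	fmt.Println(s1, s2) // hello hello
}
```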
With some seriously wild SIMD JSON parsing libraries, you can get closer to the parsing performance of a format like MessagePack, but I think it is physically impossible for JSON to be faster. You simply have to read every byte of JSON one way or another, which takes time. You also can't pre-allocate for JSON unless you do two passes, which is itself expensive: you have no idea how many objects are in an array or how long a string will be.
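As a rough illustration of the pre-allocation point, here is a minimal hand-rolled sketch (the helper name is made up, not from any library) of reading a MessagePack array header, which hands you the element count before you decode a single element:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeArrayHeader sketches how a MessagePack array announces its element
// count up front (0x90-0x9f = fixarray, 0xdc = array 16, 0xdd = array 32),
// so the decoder can size the destination slice exactly once. JSON's '['
// tells you nothing until you reach the matching ']'.
func decodeArrayHeader(buf []byte) (count int, rest []byte, err error) {
	switch {
	case len(buf) >= 1 && buf[0]&0xf0 == 0x90: // fixarray (0-15 elements)
		return int(buf[0] & 0x0f), buf[1:], nil
	case len(buf) >= 3 && buf[0] == 0xdc: // array 16
		return int(binary.BigEndian.Uint16(buf[1:3])), buf[3:], nil
	case len(buf) >= 5 && buf[0] == 0xdd: // array 32
		return int(binary.BigEndian.Uint32(buf[1:5])), buf[5:], nil
	}
	return 0, nil, errors.New("not an array header")
}

func main() {
	// An "array 16" header declaring 1000 elements.
	buf := []byte{0xdc, 0x03, 0xe8}
	n, _, _ := decodeArrayHeader(buf)
	items := make([]int, 0, n) // pre-allocate exactly once
	fmt.Println(n, cap(items)) // 1000 1000
}
```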
MessagePack objects are certainly smaller than JSON but larger than compressed JSON. Even compressed MessagePack objects are larger than the equivalent compressed JSON, in my experience, likely because the field length indicators add randomness to the data that makes compression less effective.
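If you want to check the size trade-off on your own data rather than taking my word for it, a quick sketch like this works; the sample bytes are just the {"compact":true,"schema":0} example from the msgpack.org home page:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// gzipSize returns the gzip-compressed size of a payload, so you can compare
// JSON vs MessagePack, raw and compressed, on your own data.
func gzipSize(payload []byte) int {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(payload)
	zw.Close()
	return buf.Len()
}

func main() {
	// The {"compact":true,"schema":0} example from the msgpack.org home page.
	jsonBytes := []byte(`{"compact":true,"schema":0}`)
	msgpackBytes := []byte{
		0x82, // fixmap, 2 entries
		0xa7, 'c', 'o', 'm', 'p', 'a', 'c', 't', 0xc3, // "compact": true
		0xa6, 's', 'c', 'h', 'e', 'm', 'a', 0x00, // "schema": 0
	}
	fmt.Println("json:   ", len(jsonBytes), "bytes, gzip'd:", gzipSize(jsonBytes))
	fmt.Println("msgpack:", len(msgpackBytes), "bytes, gzip'd:", gzipSize(msgpackBytes))
}
```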
For applications where you need to handle terabytes of data flowing through a pipeline every hour, MessagePack can be a huge win in terms of cost due to the increased CPU efficiency. It's also a much smaller lift to switch from JSON to MessagePack than to something statically typed like Protobuf or Cap'n Proto, simply because MessagePack matches JSON so closely. (But if you can switch to Protobuf or Cap'n Proto, those should yield similar and perhaps even modestly better benefits.)
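To show how small that lift usually is, here is a rough sketch assuming github.com/vmihailenco/msgpack/v5, one of several Go libraries with an encoding/json-style Marshal/Unmarshal API (the struct and field names are just examples, and tag details vary a bit by library): same struct, same shape, you basically swap the Marshal call.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/vmihailenco/msgpack/v5" // one of several Go implementations
)

type Event struct {
	UserID  int64  `json:"user_id" msgpack:"user_id"`
	Action  string `json:"action" msgpack:"action"`
	Payload []byte `json:"payload" msgpack:"payload"`
}

func main() {
	ev := Event{UserID: 42, Action: "click", Payload: []byte("raw")}

	// Existing JSON path.
	j, _ := json.Marshal(ev)

	// MessagePack path: same struct, different Marshal call.
	m, _ := msgpack.Marshal(ev)

	fmt.Println("json:", len(j), "bytes; msgpack:", len(m), "bytes")

	var out Event
	_ = msgpack.Unmarshal(m, &out)
	fmt.Printf("%+v\n", out)
}
```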
Compute costs are much higher than storage costs, so I would happily take a small size penalty if it reduced my CPU utilization by a large amount, which MessagePack easily does for applications that are very data-heavy. I’m sure there is at least one terribly slow implementation of MessagePack out there somewhere, but most of them seem quite fast compared to JSON.
Some random benchmarks in Go: https://github.com/shamaton/msgpack#benchmark

Also take note of the “ShamatonGen” results, which use codegen before compile time to handle types known ahead of time more efficiently than the normal reflection-based implementation. The “Array” results are a weird variant that isn’t strictly comparable: the encoding and decoding steps assume the fields are in a fixed order, so the encoded data is just arrays of values with no field names. It can be faster and more compact, but it’s not “normal” MessagePack.
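If it helps, here is a small hand-assembled illustration (built straight from the MessagePack spec, not any particular library's output) of why that array mode is smaller: the normal encoding carries the field names in every message, while the array mode carries only the values and relies on both sides agreeing on field order ahead of time.

```go
package main

import "fmt"

func main() {
	// Normal (map) encoding of {"id": 7, "ok": true}: field names travel
	// with every single message.
	withNames := []byte{
		0x82,                 // fixmap, 2 entries
		0xa2, 'i', 'd', 0x07, // "id": 7
		0xa2, 'o', 'k', 0xc3, // "ok": true
	}

	// Array-mode encoding of the same data: field order is fixed at codegen
	// time, so only the values are sent.
	asArray := []byte{
		0x92,       // fixarray, 2 elements
		0x07, 0xc3, // 7, true
	}

	fmt.Println(len(withNames), len(asArray)) // 9 3
}
```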
I’ve personally seen crazy differences in performance vs JSON.
If you’re not handling a minimum of terabytes of JSON per day, then the compute costs from JSON are probably irrelevant and not worth thinking too hard about, but there can be other benefits to switching away from JSON.
Size savings depend on the workload, I guess. That home page example actually gets larger when gzip'd, so raw msgpack is smaller there. Another comment says their data was considerably smaller vs JSON.
Sometimes you can't gzip for various reasons. There were per-message deflate bugs in Safari and Brave somewhat recently. Microsoft is obsessed with the decades-old CRIME/BREACH attacks for some reason (I've never heard any other company or individual even mention them), so SignalR still doesn't have the compression option yet.