How did you do it? Hopefully you generated thousands of such arrays and measured the cost of searching all of them as a single iteration? Because depending on what you use to measure time, the overhead of reading the timer would likely dwarf the cost of searching such a small array. The best case is a dedicated cycle counter instruction/register (e.g. rdtsc), and even that can cost tens of cycles. Cache hits cost less than a cycle, so if your timing code wrapped only a small number of searches, you essentially didn't measure the search at all.
That aside, your findings point to the prefetcher having identified your linear searches as sequential access, so practically every single access was a cache hit (you effectively measured the latency between a CPU and its L1 cache). If you wanted to test this, you could do something like make each element some number of cache lines wide. Stream prefetchers have a maximum stride, so variables more than that many bytes apart won't be prefetched. Google's multichase benchmark uses 256 bytes, IIRC.
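For example, something like this (a sketch; the 512-byte stride is just an arbitrary value above the assumed 256-byte maximum):

    /* Sketch: space elements out so a stream prefetcher with a limited
     * maximum stride cannot run ahead of the search. */
    #include <stdint.h>

    #define STRIDE 512  /* assumed > max prefetch stride (e.g. 256 bytes) */

    struct padded_elem {
        int32_t value;
        char pad[STRIDE - sizeof(int32_t)];  /* push neighbours far apart */
    };

    int linear_search_padded(const struct padded_elem *a, int n, int32_t key)
    {
        for (int i = 0; i < n; i++)
            if (a[i].value == key)
                return i;
        return -1;
    }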
I often benchmark my DIY data structures, as well as check the assembly on godbolt. I think I wrote the code using an ML model (they save a lot of time), and then fixed the issues and extended it myself. I have the code here: https://pastebin.com/GzY4HMsZ
I do the search many (thousands of) times and calculate the total time spent, so the timer is called only twice. I search the same array, but for a different number each time, so the array should stay entirely within the L1 cache. It's also important to use the result of the search, otherwise the compiler simply omits the code.
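Roughly like this (a simplified sketch of the harness, not the exact pastebin code):

    /* Minimal sketch: one clock_gettime() pair around thousands of
     * searches; the results go into a volatile sink so the compiler
     * cannot delete the work. */
    #include <time.h>

    #define ITERS 200000

    int linear_search(const int *a, int n, int key);  /* code under test */

    double bench_ns_per_search(const int *a, int n, const int *keys)
    {
        struct timespec t0, t1;
        volatile int sink = 0;  /* forces the result to be used */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++)
            sink += linear_search(a, n, keys[i % n]);  /* new key each time */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / ITERS;
    }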
I use clock_gettime(), which takes on the order of 1 µs. The timestamp counter (which is faster) is not available to me because every CPU core has its own counter and they show different values, so Linux refuses to use it. The first core is 600 ms ahead of the others. It amazes me that in 2025 we still cannot make a boring counter work. Or maybe this is intentional, to prevent cheaper consumer hardware from being used for professional purposes.
Here are some numbers:
Array size 8, 200 000 iterations, time in ns/search: linear 8 ns, linear no-branch <1 ns (I wonder if it is an error), binary 11 ns, binary no-branch 7 ns.
Array size 128, 100 000 iterations: linear 34 ns, linear no-branch 22 ns, binary 35 ns, binary no-branch 16 ns.
As you can see, branches are what ruin the performance here. Every time the CPU mispredicts a branch, it wastes on the order of 10-20 cycles refilling the pipeline.
Looking through the disassembly on godbolt, I see the compiler inlined and vectorized the measurement loop for the no-branch code, which might explain the numbers. The SIMD assembly is hard to read and I don't quite understand what it does: https://godbolt.org/z/dofKnT3W3
I'll be happy if you point out any mistakes in my benchmark. If you want to compile the code, I used gcc with "-O3" and maybe "-march=native".
Interesting, thanks for sharing. I actually assumed ?: desugared to an if-else and never checked. I was a bit confused by the if-else chains you have e.g. on line 111 where it looks like you could just use the else branch for every case?
You could circumvent the counter issue by pinning to a specific core; otherwise you'd have to use scheduler events (ftrace/perf on Linux) to know which CPU you were running on and when, so you could subtract the right counters (I've had to do this before and it isn't pretty). Pinning would also prevent issues where your task gets preempted, moved to another core, and has to warm the caches again. On Linux, options like isolcpus and nohz_full ensure you're not measuring the scheduler/other tasks, but they have to be set at boot. That said, if your loops between clock_gettime calls take substantially longer than clock_gettime itself, then at least the timer overhead is probably not that important.
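Pinning itself is just a few lines on Linux, e.g. (a sketch):

    /* Sketch: pin the calling thread to one core so per-core TSCs and
     * post-migration cold caches stop being a problem. Linux-specific. */
    #define _GNU_SOURCE
    #include <sched.h>

    int pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return sched_setaffinity(0, sizeof(set), &set);  /* 0 = this thread */
    }

    /* call e.g. pin_to_core(2) once, before the timed loop */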
As for the branch prediction, I would think most of the branches would be predicted correctly because usually you'll have runs of "not x" before each "is x". Switching between many loops of course hurts the prediction.
My SIMD knowledge only extends as far as which compiler options make my code faster.
No shame in using LLMs, I use them extensively, but I find I have to write some code myself because I make noticeably more mistakes coding if I've let the LLM do everything for a couple of weeks.
> I was a bit confused by the if-else chains you have e.g. on line 111 where it looks like you could just use the else branch for every case?
The point of the "ifs" there is to replace the variable "size" with a constant like 8 or 16, so that the compiler can hardcode it and maybe generate more efficient code (for example, instead of counting iterations, the compiler can just repeat the comparison 8 times when the iteration count is 8). I expect the compiler to generate customized versions of "linear_search" instead of using a generic version with an unknown number of iterations. You can see an example in the godbolt assembly at line 825, where the compiler unrolled the loop for "linear_search" with size = 8 - there is no loop, just 8 comparisons.
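The pattern is roughly this (a simplified sketch, not the actual pastebin code):

    /* Sketch: dispatch on size so each call site passes a compile-time
     * constant, letting the compiler fully unroll linear_search. */
    static inline int linear_search(const int *a, int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }

    int search_dispatch(const int *a, int n, int key)
    {
        if (n == 8)  return linear_search(a, 8, key);   /* unrolled */
        if (n == 16) return linear_search(a, 16, key);  /* unrolled */
        return linear_search(a, n, key);                /* generic loop */
    }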
> As for the branch prediction, I would think most of the branches would be predicted correctly because usually you'll have runs of "not x" before each "is x". Switching between many loops of course hurts the prediction.
I don't think so. In binary search there is no predictable pattern, so the predictor would miss in about 50% of cases. And in linear search there is a guaranteed miss on the last iteration.
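For context, a typical branchless binary search replaces the branch with a conditional move, roughly like this (a generic sketch, not necessarily identical to my pastebin code):

    /* Sketch: the comparison feeds a base-pointer adjustment (usually
     * compiled to cmov) instead of a branch, so there is nothing for
     * the predictor to miss. Assumes n > 0 and a sorted array. */
    int branchless_search(const int *a, int n, int key)
    {
        const int *base = a;
        while (n > 1) {
            int half = n / 2;
            base = (base[half] <= key) ? base + half : base;  /* cmov */
            n -= half;
        }
        return (*base == key) ? (int)(base - a) : -1;
    }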
> You could circumvent the counter issue by pinning to a specific core,
I think I'll try this in future.
> but I find I have to write some code myself
I can write it myself, but it would take 20-30 minutes, including time to find and read the docs for the functions, while the LLM takes less than a minute. But I still have to edit the result a lot; for example, the LLM "forgot" that you need to use the result of the search to prevent the compiler from throwing the code away. Also, the free version of ChatGPT became much dumber a couple of months ago.
Fair point that binary search causes a lot of branch mispredictions. Did you ever measure whether the unrolled linear search with 8 elements outperforms a "rolled" one? With instruction-level parallelism, I wonder how much difference removing 8 mostly easily predicted branches (in the linear search) makes.
I didn't try it, but the loop contains a branch, so there could be a misprediction that doesn't happen with the unrolled loop. Also, instruction-level parallelism applies to unrolled code as well.
I'm guessing you don't really do this for users, since the response should be 401 for any user you aren't logged in as? I would argue that even for IDs that don't exist, you should get the same error whether they don't exist or you just aren't authorised to see them. It's been a few years since I worked in web, but I think that's what I would have done; GitHub does something similar for private repos.
Aside from its well-known features, such as displaying images directly in the terminal over ssh, I use it to create TUI applications. The application is a saved Kitty session, with a defined arrangement of windows. Each window runs a specified program, and communicates with the other windows over a Unix socket. Kitty has a convenient tool to create these sessions. Once created, I can start the session-application like any other program. The sessions are defined in a text file, so I can edit it to adjust the window arrangement or other details.
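A session file is just a list of commands, for example (a made-up minimal session; the titles and programs are only illustrative):

    # started with: kitty --session my-tui.session
    new_tab main
    layout splits
    launch --title editor nvim
    launch --location=vsplit --title logs tail -f /tmp/app.log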
I also use its shell integration features, such as putting the scrollback into a pager, constantly.
to answer that question we need to first look at the alternatives. where else could session management go? and then we can consider the benefits and drawbacks of each approach.
if you are thinking of tmux then the problem here is that tmux is in itself a terminal.
to get session management away from the terminal, it would need to be done in such a way that when the session tool connects to the session, it merely acts as a proxy or less, but does not interpret and then translate the signals that come from the session like tmux does.
this is not trivial, at least with the current way terminals work.
I'm not sure I understand the question. Some people use similar kinds of sessions in terminal multiplexers such as tmux (is that what you have in mind?), but this leads to problems (interpretation of key sequences, etc.) that the Kitty solution sidesteps.
Like Kitty, Konsole has image preview, clickable files and links, tabs, and tiles. What Konsole does not have: a way of changing the layout of tiles, equivalents to the Kitty shell and remote control, the pager, and the ssh integration (e.g. run ls in an ssh session and Konsole will not open the files).
Considering they knew these failure modes in advance, would it not have been prudent to put some sort of self-righting mechanism on the lander? Something like the mechanical arms you see in Robot Wars.
Mechanisms weigh a significant amount, long mechanisms like arms even more so. The more mass you spend on contingencies the less science you bring and the less valuable the mission is, even with a flawless touchdown.
If you can't land reliably, then a lot of the science is wasted. On the other hand, once you reliably solve the landing issue (which was one of the main objectives here, and where they made significant progress in that they successfully deployed the two mini rovers anyway), you can add on as much science as you want.
Also, I've said it before and will say it again: the moon is not far away. Unlike Mars and other celestial bodies, where we have to time launches around orbital positions, gravitational slingshots and such, the moon is really close by, and we should be lofting stuff onto it on a monthly basis.
Neat idea, I'm not sure how much force those jets produce, maybe it could be enough? It might be risky to fire up jets after a botched landing though. If the nozzle has any material in it I'd be worried about blowing a hole in the lander!
What is your opinion of code coverage requirements now? I have been in a "phase" of seeing them as "code quality theatre". Considering that a function which takes a single 8-bit integer as an argument already has 256 unique inputs, and may bug out on only 1-2 of them, 100% statement coverage can be very misleading. A typical function has billions or trillions of unique inputs, so 100% statement coverage could be very nearly 0% state-space coverage. I'm 5y into my career (but 15y into programming) and aware that my opinions will change and develop as I progress. This one has been stable for a while though.
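To make that concrete, here is a contrived example (made up for illustration):

    #include <stdint.h>

    /* Meant to be a saturating increment, but it wraps instead. It
     * misbehaves on exactly 1 of its 256 possible inputs (x == 255),
     * yet any single test input yields 100% statement coverage. */
    uint8_t increment_saturating(uint8_t x)
    {
        return x + 1;  /* BUG: should return 255 when x == 255 */
    }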
Coverage is necessary but not sufficient; your tests also need to be good and test the right things. What's your proposed alternative - that we don't even try because our tests aren't guaranteed to be perfect?
Well, almost. I use a lot of assertions to check the obvious (value out of range, null pointer, etc.) and test the happy path(s) to prove the code at least works in the cases I can anticipate. I add unit tests for complex algorithms, and to prove a reported bug is what I think it is and that it has been fixed. Otherwise, I think using unit tests to find bugs is mostly busywork.
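For example, the style I mean looks something like this (a sketch):

    #include <assert.h>
    #include <stddef.h>

    /* Sketch: assertions catch the obvious precondition violations;
     * a happy-path test then proves the function works at all. */
    int find_index(const int *a, size_t n, int key)
    {
        assert(a != NULL);  /* null pointer */
        assert(n <= 1024);  /* size outside anything I ever expect */
        for (size_t i = 0; i < n; i++)
            if (a[i] == key)
                return (int)i;
        return -1;
    }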
For even a fairly trivial piece of code, the search space for bugs can be vast or even infinite. Writing unit tests to find bugs within that space is like throwing darts at an infinitely large wall and trying to hit an unknown number of invisible targets. You can only write tests for the potential bugs you anticipate - if you could anticipate a bug, you wouldn't write it, right? You end up with dozens or hundreds of tests that probably never failed, except when you have to change something. Such was my experience when I tried to maintain high code coverage. When I switched to writing assertions and acceptance tests, my rate of bug reports did not change, and I was more agile.
I think the best way to achieve this is an OS API that results in the files always being created in the same place. Applications/libraries could still choose their own filenames and syntax; just the location would be OS-controlled. I think there is room for a new desktop/laptop OS to emerge, and one good idea from mobile OS design I would like to see is having everything be an API call, which lets the OS impose standardisation, permissions, and user preferences rather than the free-for-all that desktop OSes have (though I propose letting the user run non-compliant applications, and not porting the iOS app-store-only model into the OS).
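As a sketch, such an API could be as small as this (entirely hypothetical; no current OS provides this call, and the stub only makes the example self-contained):

    #include <stdio.h>

    /* Hypothetical OS call: the app names its config file, the OS
     * decides where it lives and enforces permissions. The fixed
     * directory below stands in for whatever the OS would choose. */
    FILE *os_open_config(const char *app_id, const char *name, const char *mode)
    {
        char path[512];
        snprintf(path, sizeof path, "/config/%s/%s", app_id, name);
        return fopen(path, mode);
    }

    int main(void)
    {
        /* filename and syntax stay the app's choice; the location does not */
        FILE *f = os_open_config("com.example.myeditor", "settings.toml", "r");
        if (f) fclose(f);
        return 0;
    }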
The problem with the Windows registry, at least back in the day, was that it was a single file that could be corrupted and it would wreck your whole system.
I think having a standard utility API for *nix configs makes a lot of sense. I'm surprised it doesn't exist.
I tried to find one, and there are some libraries for reading and parsing in every language, but nothing that seems to cover everything.
> The problem with the Windows registry, at least back in the day, was that it was a single file that could be corrupted and it would wreck your whole system.
Exactly. But the Registry has been using NTFS and its capabilities for rollbacks and recovery, so the problem is mostly solved. There are occasional bugs [0] causing corruption, but they are very rare.
At 29, I don't feel like I was "basically finished" until 25 or so, and feel significantly more mature than I was even a year ago. I'm very interested to know why you think 13 year olds are mostly finished developing.
You seemingly state that as a positive goal. I posit that many people would have preferred to be left alone and not killed or subjugated. Perhaps if he had been prevented from those actions until after maturity, he would have chosen different actions.
To rephrase the other user's [correct] sentiment, a scientific determination of brain development has more to do with myelin on axons than maturity regarding emotions, finances, or anything else we actually judge people on.
I mean, I don't think most of us commenting here are Apple ads. I think we're all hoping that if Apple is jumping into VR, then finally it'll be good enough for us all to adopt massively, the same way we all have smartphones. That's an exciting prospect, even if we don't necessarily "like" Apple.
But at $3500 for one, I think they've blown it. I'll wait for the reviews, and if it really does deliver on all the promises, then that'll be nice. Somehow, I doubt that it'll happen though.
I was talking about the posts/linked articles rather than the comments. I counted six in the first 20 posts, and the titles are mostly quite ad-like.
While I was initially interested by the announcement, the size, lack of internal battery, and price make this a hard no for me. Idk what they were thinking - doesn't anyone remember Google Glass? That was half the price and size and nobody wanted it.