
Sure, but the world is vast. I would love to be able to test every UI framework and figure out which is the best, but who’s got time for that? You have to rely on heuristics for some things, and popularity is often a decent indicator.

Popularity’s flip side is that it can fuel commodification.

I'd argue popularity is an insufficient signal. React as tech is fine, but the market of devs it's aimed at may not be the most discerning when it comes to quality.


Why not just try every permutation of (1,l)? Let’s see, 76 pages, approx 69 lines per page, say there’s one instance of [1l] per line, that’s only… uh… 2^5244 possibilities…

Hmm. Anyone got some spare CPU time?


It should be much easier than that. You should be able to serially test if each edit decodes to a sane PDF structure, reducing the cost similar to how you can crack passwords when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so, that makes it even easier given the built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.
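
To make the serial-testing idea a bit more concrete, here's a rough sketch in Python. It assumes the ambiguous '1'/'l' characters sit inside a single base64-encoded Flate (zlib) stream that has already been isolated and stripped of whitespace; the greedy strategy and the lookahead size are my own guesses, not anything from the article:

    import base64
    import zlib

    def inflates_so_far(raw_prefix):
        # Feed a byte prefix to an incremental inflater. Corrupt deflate data
        # usually raises within a few bytes of the damage, so a prefix that
        # inflates cleanly is weak evidence the current reading is right.
        try:
            zlib.decompressobj().decompress(raw_prefix)
            return True
        except zlib.error:
            return False

    def resolve_ambiguities(b64_text, positions, lookahead=200):
        # Greedy left-to-right pass over ambiguous '1'/'l' positions.
        # A real tool would need backtracking when both readings survive,
        # and unresolved positions inside the lookahead window can still
        # confuse the inflater.
        chars = list(b64_text)
        for pos in positions:
            for guess in "1l":
                chars[pos] = guess
                prefix = "".join(chars[: pos + lookahead])
                prefix = prefix[: len(prefix) - len(prefix) % 4]  # 4-char align
                try:
                    raw = base64.b64decode(prefix)
                except ValueError:
                    continue
                if inflates_so_far(raw):
                    break  # keep this guess and move on
        return "".join(chars)

The Adler-32 checksum at the tail of a zlib stream then gives a final pass/fail once every position has a candidate.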

Not necessarily a PDF attachment?

Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity.


I'd say 99% of the time, the first 10 bytes would be enough to know the file type.
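
For the curious, that check is usually just a magic-number lookup on the leading bytes; a tiny sketch (the table is only a handful of common signatures, nothing exhaustive):

    # A few well-known file signatures ("magic numbers").
    MAGIC = {
        b"%PDF-": "PDF document",
        b"\xff\xd8\xff": "JPEG image",
        b"\x89PNG\r\n\x1a\n": "PNG image",
        b"PK\x03\x04": "ZIP container (also docx/xlsx/epub)",
        b"GIF87a": "GIF image",
        b"GIF89a": "GIF image",
    }

    def sniff(first_bytes):
        # Return a best-guess type from the first few decoded bytes.
        for magic, name in MAGIC.items():
            if first_bytes.startswith(magic):
                return name
        return "unknown"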

> It should be much easier than that. You should be able to serially test if each edit decodes to a sane PDF structure

That's pointed out in the article. It's easy for plaintext sections, but not for compressed ones. I didn't notice any mention of checksums.


On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.

You might be taking the "I" in AI too literally.

I wonder if you could leverage some of the fuzzing frameworks that tools like Jepsen rely on. There's got to be one for PDF generation.

Easy: just start a cryptocurrency (Epsteincoin?) based on solving these base64 scans and you'll have all the compute you could ever want lining up

Please don’t give ideas to Nvidia.

I took it to mean “make Postgres your default choice”, not “always use Postgres no matter what”

I personally see a difference between “just use Postgres” and “make Postgres your default choice.” The latter leaves room to evaluate alternatives when the workload calls for it, while the former does not. When that nuance gets lost, it can become misleading for teams that are hitting, or close to hitting, the limits of Postgres, who may keep tuning Postgres and spending not only time but also significant money. IMO a better world is one where developers have the mindset of using best-in-class tools where needed. This is where embracing integrations with Postgres will be helpful!

I think that the key point being made by this crowd, of which I'm one, is somewhere in the middle. The way I mean it is "Make Postgres your default choice. Also *you* probably aren't doing anything special enough to warrant using something different".

In other words, there are people and situations where it makes sense to use something else. But most people believing they're in that category are wrong.


> Also *you* probably aren't doing anything special enough to warrant using something different

I always get frustrated by this because it is never made clear where the transition occurs, the point at which you are doing something special enough. It is always dismissed as, "well, whatever it is you are doing, I am sure you don't need it"

Why is this assumption always made, especially on sites like HackerNews? There are a lot of us here that DO work with scales and workloads that require specialized things, and we want to be able to talk about our challenges and experiences, too. I don't think we need to isolate all the people who work at large scales to a completely separate forum; for one thing, a lot of us work on a variety of workloads, where some are big enough and particular enough to need a different technology, and some that should be in Postgres. I would love to be able to talk about how to make that decision, but it is always just "nope, you aren't big enough to need anything else"

I was not some super engineer who already knew everything when I started working on large enough data pipelines that I needed specialized software, with horizontal scaling requirements. Why can't we also talk about that here?


Nothing about my post indicated that these things shouldn't be talked about and discussed. It's worth understanding how things work, even things that one doesn't personally need to use. Continue to stash ideas away and add them to your future bag of tricks. The key issue here is that people need to be more critical and self-aware about where their problems lie, and not assume that because something exceeds the envelope of what they've experienced, it's entering special-case territory.

Rather my point was that people, including myself, have a tendency to believe they're in an exceptional case when they're actually not. And thus will see discussions on sites like this and assume that's what they need to do. And of course they don't understand the tradeoffs and wind up not realizing they're actually making things harder for themselves.

The classic example is scaling issues where people, again including myself, assume they have exotic scaling needs simply because it's larger than anything they've seen before. When in fact by objective measures what they have is something that could run perfectly fine on 20 year old hardware and bog standard techniques.


And another related one: “you’ll know when you need it.”

No I don’t. I’ve never used the thing so I don’t know when it’ll come in useful.


The point is really that you can only evaluate which of the alternatives is better once you have a working product with big enough data; otherwise it's basically following trends and hoping your barely informed decision won't be wrong.

Postgres is widely used enough, and there are enough engineering-company blog posts, that for the vast majority of NotPostgres requests there’s already a post showing whether pg falls over at the scale that’s being planned for.

If one doesn’t exist, the trade-off for NotPostgres is such that it’s justifiable to make the engineer run their own benchmarks before they’re allowed to use NotPostgres.


Agree to disagree here. I see a world where developers need to think about (reasonable) scale from day one, or at least very early. We’ve been seeing this play out at ClickHouse: the time before teams need purpose-built OLAP is shrinking from years to months. Also, integrating with ClickHouse is a few weeks of effort for potentially significantly faster analytics performance.

Reasonable scale means... what exactly?

Here's my opinion: just use postgres. If you're experienced enough to know when to ignore that advice, go for it; it isn't for you. If you aren't, I'm probably saving you from yourself. "Reasonable scale" to these people could mean dozens of inserts per second, which is why people talking in vagaries around scale is maddening to me. If you aren't going to actually say what that means, you will lead people who don't know better down the wrong path.


I see a world where developers need to think about REASONABLE scale from day one, with all caps and no parentheses.

I've sat in on meetings about adding auth rate limiting, using Redis, to an on-premise electron client/Node.js server where the largest installation had 20 concurrent users and the largest foreseeable installation had a few thousand, in which every existing installation had an average server CPU utilisation of less than a percent.

Redis should not even be a possibility under those circumstances. It's a ridiculous suggestion based purely on rote whiteboard interview cramming. Stick a token_bucket table in Postgres.

I'm also not convinced that thinking about reasonable scale would lead to a different implementation for most other greenfield projects. The nice thing about shoving everything into Postgres is that you nearly always have a clear upgrade path, whereas using Redis right from the start might actually make the system less future-proof by complicating any eventual migration.
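
For reference, a minimal sketch of the token_bucket-table idea from a couple of paragraphs up, assuming psycopg2 on the client side; the capacity (10 tokens) and refill rate (1 token/second) are illustrative, not anything the meeting actually settled on:

    import psycopg2

    SETUP_SQL = """
    CREATE TABLE IF NOT EXISTS token_bucket (
        client_id   text PRIMARY KEY,
        tokens      numeric NOT NULL,
        last_refill timestamptz NOT NULL DEFAULT now()
    );
    """

    # Refill lazily and take one token in a single atomic UPDATE.
    # Assumes a row for client_id was created beforehand (e.g. at signup);
    # if the UPDATE matches nothing, the request is denied.
    TAKE_SQL = """
    UPDATE token_bucket
    SET tokens = LEAST(10, tokens + EXTRACT(EPOCH FROM (now() - last_refill))) - 1,
        last_refill = now()
    WHERE client_id = %s
      AND LEAST(10, tokens + EXTRACT(EPOCH FROM (now() - last_refill))) >= 1
    RETURNING tokens;
    """

    def allow_request(conn, client_id):
        # Returns True if the client still had a token. The row update is
        # atomic, so concurrent requests can't overspend the bucket.
        with conn.cursor() as cur:
            cur.execute(TAKE_SQL, (client_id,))
            allowed = cur.fetchone() is not None
        conn.commit()
        return allowed

No Redis, no extra moving parts, and the upgrade path (a real rate limiter later, if it's ever needed) stays open.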


Ack, agreed. But there’s a better way to communicate than making blanket statements like “just use Postgres.” For example, you could say “Postgres is the default database,” etc.

Don’t get me wrong—I’m a huge huge fan of Postgres. I’ve worked at Postgres companies for a decade, started a Postgres company, and now lead a Postgres service within a company! But I prefer being real rather than doing marketing hype and blanket love.


"Just" doesn't solely mean "only", this phrase is using it in the spirit of "just do it" - meaning "stop dawdling around".

"Just use postgres" is the snappy phrase you're looking for.


[flagged]


That wasn’t my intention, though, to mention my workplace multiple times for the sake of PR. I try to avoid that, at least on HN. :)

Unless you have YC funding. Conflicts of interest go brrr! HN doesn't even try to hide it.

We do the opposite, in fact [1]. Negative stories about YC-funded companies (or YC itself) are given greater visibility than equivalent stories about non-YC-related entities.

When YC-funded companies are featured in Launch HN or Show HN posts, it's clearly denoted in the title or top-text. YC-funded companies don't get preferential treatment in the rankings for any type of post other than Launch HN posts.

ClickHouse isn't a YC-funded company. We don't penalize people for mentioning their employer in the comments unless it's blatantly promotional, in which case it breaks the guideline against promotional posting, and such comments should be flagged by the community.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


What I really want to know is, is it mighty mighty? And does it let it all hang out?

This is my philosophy. When an engineer comes to me and says they want to use NotPostgres, they have to justify, with data and benchmarks, why Postgres is not good enough. And that’s how it should be.

This could be similar to defaulting to a monolith, and breaking it into microservices only when experience demands it.

Start with Postgres for everything, then use custom solutions where you are running into the limitations of Postgres.


Kids who grew up playing Carmen Sandiego will definitely remember it fondly

I played a bunch of that too, was that a cited source for it? Don’t remember. I do recall that the very-early-90s geopolitics simulation game Shadow President contained large portions of the fact book in its in-game information system (with citations, which is my first recollection of ever knowing of the thing by name)

I later leaned on the Web version of the factbook quite a bit for basic country stats in undergrad.

I don’t know of a replacement of comparable quality. Damn good resource. Not that you can necessarily trust a government source, especially one from an intelligence agency, but most of what it covered wasn’t exactly useful for the kind of propaganda you’d expect the US government to push, so you could expect it to broadly be a sincere attempt at describing reality. (It didn’t hurt that it wasn’t a super-widely-known resource outside certain academic disciplines, so lying about e.g. the major exports of Guyana or whatever wouldn’t have much effect anyway, lowering the likelihood that anyone would bother.)


Whoops, I was mistaken, I was thinking of The World Almanac and Book of Facts that was included in the original version of the game as a player reference.

This is definitely a nit, but is there any reason you need two decimal places of accuracy for percent complete?

Nope, might change it based on the feedback

Must be US-based; it’s $6.14 for the cheapest shipping for me

And then for that shipping price, it takes 4-6 weeks for delivery? Or you can have it in 8 days for something like $130? I don't get it.

It's not the delivery that takes that long. It's the printing. It's a print on demand item, printed in the United States. The decks don't currently exist and the current print queue is just that long. If you want to jump the queue, that will be extra.

The legislation includes CNC mills.

AWS is often unnecessary but I do hope you had some kind of pre-prod environment in your original setup

What is Sorcerer here?

Thanks for this. I assumed there would be more rigor behind it, but it hardly seems credible; it relies mostly on anecdotes and "common sense".
