
One way I try to get my head around things like this is to skip to a section I understand deeply and see what they said. Here, the claim is made:

Don't try to get a compliance certificate at the last minute. Preparing for and conducting an audit such as for PCI DSS or SOC 2 from start to finish is a lengthy process, ranging from six to twelve months for most startups. Starting early and maintaining compliance is cheaper than starting late and doing rework.

This is basically the opposite of the advice I would give a startup. SOC2 attestations in particular are easy to get, and are a waste of money to obtain preemptively before there are purchase orders on the line for them.

There are things you should start doing early that lay the groundwork for attestations, but you should be doing them anyways, even if you never plan to get a SOC2 (and if a big-ticket customer never demands it, you shouldn't SOC2). That's stuff like setting up single sign-on and having protected git branches; simple best practices.

Anyone else want to spot check other parts of this document? I wouldn't feel qualified to challenge most of it.



Great approach. I ctrl-F'd for databases; good info there generally. The only thing that gave me pause: a startup doesn't need to focus on SQL vs. NoSQL in 2025, given how good JSON support is in the most popular SQL databases. Just use PostgreSQL or MySQL (whichever your engineers have more experience with), use Cloud SQL or RDS, which will take care of the hard stuff like backups and replication for you, and use read replicas for BI with a good visualization tool. You'll be fine with that for a good while before you need to fork over five or six figures for Snowflake or anything else.


> use read replicas for BI with a good visualization tool

Put up 2 or 3 read replicas, split your queries so writes happen to main and reads come from replicas (supported out of the box by many modern ORMs), and you can scale to millions in daily active users for most startup workloads.

Really the hard part of BI is that folks who need the info don’t wanna learn SQL. The ones who can do SQL will struggle to keep up with your changing schema.
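A minimal sketch of the read/write split described above: writes go to the primary, plain reads rotate across replicas. The connection strings are hypothetical placeholders, and since many ORMs ship this logic built in, treat it as an illustration of the routing rule rather than something to deploy:

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ReadWriteRouter:
    """Sends writes to the primary and spreads plain reads across replicas."""

    primary: str
    replicas: list[str]
    _next_replica: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the primary if there are none.
        self._next_replica = itertools.cycle(self.replicas or [self.primary])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        is_plain_read = verb == "SELECT" and "FOR UPDATE" not in sql.upper()
        # Reads inside a transaction stay on the primary: replicas never see
        # uncommitted writes and may lag behind the primary.
        if is_plain_read and not in_transaction:
            return next(self._next_replica)
        # Everything else (INSERT/UPDATE/DELETE, DDL, WITH ..., locking reads) goes to the primary.
        return self.primary


router = ReadWriteRouter(
    primary="postgres://primary.internal/app",
    replicas=["postgres://replica-1.internal/app", "postgres://replica-2.internal/app"],
)
print(router.route("SELECT * FROM users WHERE id = 1"))         # postgres://replica-1.internal/app
print(router.route("SELECT count(*) FROM orders"))              # postgres://replica-2.internal/app
print(router.route("UPDATE users SET name = 'x' WHERE id = 1"))  # postgres://primary.internal/app
```

One caveat with this design: replicas lag, so a request that writes and then immediately reads its own data may need to pin that read to the primary.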


I give them Metabase, pointed at read-replica-3. Via the Metabase API you can add lots of metadata about tables and fields, so the BI folks can point and click to build reports and keep up with schema changes (which I mostly resolve with views anyway).


The hard part of BI is application developers not wanting to support a stable data model and changing the schema all the time, often made harder by BI people not knowing what they want and being stuck with a brittle integration.

Adding analytics reporting views to your app database and treating them as the 'API' is the way.
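A small sketch of reporting views as the 'API', using Python's built-in sqlite3 so it runs without a server. The table, column, and view names are hypothetical. The BI query only ever references the view, so when the app schema is refactored (v1 to v2 below) only the view definition has to change:

```python
import sqlite3

# The BI tool only queries the view, never the base tables.
BI_QUERY = (
    "SELECT customer_id, SUM(amount_usd) FROM analytics_orders "
    "GROUP BY customer_id ORDER BY customer_id"
)


def build_v1(conn: sqlite3.Connection) -> None:
    # Original app schema: amounts stored in cents.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amt_cents INTEGER)")
    conn.executemany(
        "INSERT INTO orders (cust_id, amt_cents) VALUES (?, ?)",
        [(1, 1999), (1, 500), (2, 4200)],
    )
    conn.execute(
        "CREATE VIEW analytics_orders AS "
        "SELECT cust_id AS customer_id, amt_cents / 100.0 AS amount_usd FROM orders"
    )


def build_v2(conn: sqlite3.Connection) -> None:
    # Refactored app schema: renamed table and columns, amounts in micro-dollars.
    conn.execute(
        "CREATE TABLE order_lines (id INTEGER PRIMARY KEY, account_ref INTEGER, amount_micros INTEGER)"
    )
    conn.executemany(
        "INSERT INTO order_lines (account_ref, amount_micros) VALUES (?, ?)",
        [(1, 19_990_000), (1, 5_000_000), (2, 42_000_000)],
    )
    # Only the view changes; its column names and units stay the same.
    conn.execute(
        "CREATE VIEW analytics_orders AS "
        "SELECT account_ref AS customer_id, amount_micros / 1000000.0 AS amount_usd FROM order_lines"
    )


results = []
for build in (build_v1, build_v2):
    conn = sqlite3.connect(":memory:")
    build(conn)
    results.append(conn.execute(BI_QUERY).fetchall())
    conn.close()

print(results[0] == results[1])  # True: the report survives the schema change
```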


> Really the hard part of BI is that folks who need the info don’t wanna learn SQL.

Data analysts are fine with SQL though. Every "get into data analysis as a career" course will teach you SQL (about 70% of what the querynomicon teaches [1]).

[1] https://github.com/gvwilson/sql-tutorial


> Data analysts are fine with SQL though.

Yes! I haven’t seen startups hiring these though. Somehow I always end up doing this as a side-gig on my engineering job.


Definitely - I've been surprised at some very complex pipelines built with pandas, etc. because someone didn't want to use SQL...


I was just commenting to a colleague recently about the significant improvements RDBMSes have gotten in JSON support over the last decade. For instance, keys below the first level in Postgres jsonb fields were not indexable around a decade ago. Now you can use GIN indexes and other rather sophisticated options.
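A sketch of indexing a nested JSON key. It uses SQLite through Python's sqlite3 module instead of Postgres so it runs standalone (assumption: the bundled SQLite includes its JSON functions, which are built in from 3.38). This is an expression index on a single path, the analogue of a Postgres B-tree expression index such as `CREATE INDEX ON events ((payload->'user'->>'plan'))`. A Postgres GIN index with `jsonb_path_ops` is a different tool: it serves containment queries like `payload @> '{"user": {"plan": "pro"}}'` across all keys.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(json.dumps({"user": {"plan": plan}, "n": i}),) for i, plan in enumerate(["pro", "free"] * 500)],
)

# Index a key two levels deep inside the JSON document.
conn.execute("CREATE INDEX idx_events_plan ON events (json_extract(payload, '$.user.plan'))")

# The WHERE expression must match the indexed expression for the planner to use it.
query = "SELECT COUNT(*) FROM events WHERE json_extract(payload, '$.user.plan') = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("pro",)).fetchall()
count = conn.execute(query, ("pro",)).fetchone()[0]

print(count)  # 500
print(plan)   # the plan detail should mention idx_events_plan
```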


Agreed. I can't think of anything that would convince me today to use a document store over Postgres as the primary (or likely only) database. Most of the time JSON fields augmenting the RDBMS seems like the way to go.


My default position nowadays is “Postgres”, and engineering should have to justify why it is insufficient if they want to use something else. It’s worked pretty well.


Hahaha, that's good: don't justify why to use a certain tech, justify why not to just use Postgres.


This should be the default decision-making process. If your proposal shifts where we sit on the trade-off scale, provide not only the benefits but also the drawbacks and an analysis of the transition. This forces the proposal to analyze the existing status quo and the reasons behind it, which is often enough for the proposal to be withdrawn.

It's a distant relative of the 5 Whys. We need a NoSQL document store -> so we can store JSON blobs -> so we can do databasing at the app level -> because DBAs with their insistence on schemas are slow. Oh, so we can solve the problem by hiring one DBA and maybe training two devs, instead of hiring a full dev team and refactoring stuff for a year?


As someone working with Datastore/Firestore in a product first created around 10 years ago, I wish you could have been there at the time. Running a migration to add a boolean field to all existing documents of a certain type took ~40 hours.

The funny thing is that we are now migrating stuff out of Datastore (and new stuff is not in Datastore to begin with) into an RDBMS, but we are doing it microservice-style, with each microservice having its own separate database. So relationships are now cross-service concerns...

Not that having EVERYTHING in a single DB is always the best approach, but IMO we should default to keeping everything in one single DB.
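For contrast with the ~40-hour Datastore migration, here is the relational version of "add a boolean field to every record", sketched with SQLite via Python's sqlite3 (table and column names are hypothetical). SQLite records the default in the schema instead of rewriting each row, and PostgreSQL 11+ does the same for non-volatile defaults, so this is a metadata change rather than a full-table rewrite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (body) VALUES (?)",
    [(f"doc {i}",) for i in range(10_000)],
)

# One schema statement; existing rows read the default without a per-row update loop.
conn.execute("ALTER TABLE documents ADD COLUMN archived INTEGER NOT NULL DEFAULT 0")

missing = conn.execute("SELECT COUNT(*) FROM documents WHERE archived IS NULL").fetchone()[0]
total_false = conn.execute("SELECT COUNT(*) FROM documents WHERE archived = 0").fetchone()[0]
print(missing, total_false)  # 0 10000
```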


Yep, this is a sneaky great feature. Where previously you’d get a sequential scan unless you put in multiple indexes or a bloom filter, you can now get great performance and ease of maintenance at the same time.


> use read replicas for BI with a good visualization tool

Ugh. That sounds good on paper, but in practice it can become a problem. You're making your _database_ schema a part of the public API. It's an example of Hyrum's Law: people will, sooner rather than later, start depending on internal details of the data representation.

And your development velocity will crater, as you'll now need to update all the reports (that are not necessarily even tracked in version control!).

Investing some time early to add code to pull out the data relevant for analytics can be worthwhile.

There's also the question of personal information.
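One way to "add code to pull out the data relevant for analytics" while handling personal information is an explicit export function with an allow-list of fields and keyed pseudonyms for IDs. A minimal sketch; the field names and the key are hypothetical, and pseudonymized data can still count as personal data under laws like GDPR:

```python
import hashlib
import hmac
from typing import Any

# Hypothetical key used only for pseudonymizing IDs; load a real one from a secret manager.
PSEUDONYM_KEY = b"replace-with-a-real-secret"

# Explicit allow-list: only these fields leave the application for analytics.
ANALYTICS_FIELDS = ("plan", "signup_date", "country")


def pseudonymize(user_id: int) -> str:
    """Stable keyed pseudonym, so analysts can join records without seeing real IDs."""
    digest = hmac.new(PSEUDONYM_KEY, str(user_id).encode(), hashlib.sha256).hexdigest()
    return digest[:16]


def to_analytics_record(user_row: dict[str, Any]) -> dict[str, Any]:
    """Map an internal row to the stable analytics shape, dropping everything else."""
    record: dict[str, Any] = {"user_key": pseudonymize(user_row["id"])}
    for name in ANALYTICS_FIELDS:
        record[name] = user_row.get(name)
    return record


row = {
    "id": 42,
    "email": "jane@example.com",
    "password_hash": "...",
    "plan": "pro",
    "signup_date": "2025-01-15",
    "country": "DE",
}
print(to_analytics_record(row))  # no email or password_hash in the output
```

Because the export shape is defined in code, internal columns can be renamed freely as long as this function keeps producing the same record.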


It can definitely become a problem. But if you’re at that point, you don’t need a guide that explains SQL databases to you. :p

Realistically this guide should be bifurcated in terms of scale.


>with such good json support in the most popular SQL databases

Wait, was that the reason people were doing NoSQL? JSON support? I thought it was about sharding, write scalability, etc.


Ah yea, the old “web scale” phase. I think everyone’s more or less accepted that very, very few startup-level (or even SMB-level) workloads need more scalability than Postgres/MySQL gives.

My favorite example is that Twitter used MySQL for all tweets, writing ~5k/s 24/7/365, until about 2016ish. Well into being a public company with billions in revenue and 300mm+ MAUs.
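Back-of-the-envelope arithmetic for the ~5k writes/s figure (the commenter's number; this is the average rate only, and peak load would be higher):

```python
# Average-rate arithmetic for a sustained ~5k writes per second.
WRITES_PER_SECOND = 5_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY
writes_per_year = writes_per_day * 365

print(f"{writes_per_day:,} writes/day")    # 432,000,000 writes/day
print(f"{writes_per_year:,} writes/year")  # 157,680,000,000 writes/year
```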


Has everyone accepted that?

In Bay Area senior software engineer interviews, 3 out of 4 companies require a system design round where they ask "what if you had 10m users?" and expect a distributed, write-heavy, sharded answer.


You’re not wrong in the literal sense. But the “inside baseball” of that question is just that it’s a prompt to talk about how you would horizontally scale a system should the need arise. It’s not a prompt to start questioning whether 10mm or 200mm is the specific limit.


Well that's the thing. You don't need a NoSQL database to design a data tier that scales to accommodate distributed write-heavy workloads.


Lots of people were mad that my employer developed a new distributed NoSQL database engine, but it was literally just an API to encapsulate what an application doing "sharded MySQL" would do in its own data tier. A lot of this is a question of framing and storytelling.
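A minimal sketch of what "sharded MySQL" routing in an application's data tier typically looks like: hash a shard key and pick one of N databases. The DSNs are hypothetical placeholders, and simple modulo hashing is a swapped-in choice for brevity:

```python
import hashlib


class ShardRouter:
    """Maps a shard key (e.g. a user ID) to one of N database shards."""

    def __init__(self, shard_dsns: list[str]) -> None:
        if not shard_dsns:
            raise ValueError("need at least one shard")
        self.shard_dsns = shard_dsns

    def shard_index(self, key: str) -> int:
        # Use a stable hash; Python's built-in hash() is randomized per process for str.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shard_dsns)

    def dsn_for(self, key: str) -> str:
        return self.shard_dsns[self.shard_index(key)]


router = ShardRouter([f"mysql://shard-{i}.internal/app" for i in range(4)])
for user_id in ("alice", "bob", "carol"):
    print(user_id, "->", router.dsn_for(user_id))
```

With plain modulo, changing the number of shards remaps most keys, which is why production systems often use consistent hashing or a lookup directory instead.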


Sharding, write scalability, and the like are the technical advantages that can matter at scale (and mattered a lot more before SSDs became so common), but I think for most users the only tangible "benefit" was the schemaless nature.


> use read replicas for BI

Yes, this is good advice. Until you reach really large scale, you don't need anything fancier than some SQL against a read replica.


Yeah, in my experience, most companies who are going to 1) do business with early stage startups and 2) want a SOC2 report are going to be totally fine with writing “startup X will get their SOC2 type 1 in the next six months” into the contract and moving forward, so long as someone technical can get on the phone with their IT people and convince them you are reasonably competent.


Made an account just to say that I respectfully disagree solely when it comes to accounting and supply chain processes in an enterprise ERP. Unwinding un-auditable processes costs so much f’ing time and money while the business still has to run that I’ve found it to be cheaper and better to be auditable from day 1, in this one specific instance.
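One inexpensive way to be "auditable from day 1" is to record every sensitive business action in an append-only, hash-chained log. A minimal in-memory sketch (the actions and field names are hypothetical). It makes tampering detectable, not impossible: anyone who can rewrite the whole log can recompute the hashes, so real systems usually also ship entries to write-once storage.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash a canonical JSON encoding of every field except the hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, details: dict) -> dict:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "details": details,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("alice", "invoice.approve", {"invoice": "INV-1001", "amount_cents": 125_000})
log.append("bob", "payment.release", {"invoice": "INV-1001"})
print(log.verify())  # True
log.entries[0]["details"]["amount_cents"] = 1  # tamper with history
print(log.verify())  # False
```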


I built one of the Trade Promotion Management platforms used in the NA market, and couldn't agree more. It's a nightmare trying to be auditable if you didn't think about it from the start.


I was at (insert infamous unicorn) and they spent a ton of money (all relative, the money meant nothing) but more importantly 18 months attempting to get SOX compliant and never made it, because running the business was too important. Of course it all came down to lack of leadership to enforce policies but even if we had it, it was objectively super fucking challenging.

When I do get a chance to implement compliant processes at the beginning, it’s one of those amazing IT things where we prevent WW3 but never get the credit for it.


There’s being auditABLE and being auditED. Honestly I think the article’s take is smarter for a less experienced or skilled founding team and tptacek’s is better for a more experienced team. Paying auditors to look at screenshots and CSVs is a giant waste of money until it’s not, but at the same time, letting bad practice ossify until it’s expensive to remove is also a mistake.


Yea agreed, my comment was more of a sidenote than a direct response.


I think this advice may vary in applicability across industries. If you're selling a B2B product that touches PII, you're definitely going to need SOC2 if you don't want to be laughed out the door during pitch meetings. And depending on your funding level, using an automatic SOC2 compliance checklist service like Secureframe may only be a few thousand dollars but will ensure not only that you are following those best practices but also in an idiosyncratically SOC2 manner that will make for an easy audit. Not a huge investment relative to the dev and project management time it takes to get onto SOC2 track with an organization that already has deeply engrained non-compliant processes in place.


Well, we run a public cloud, and before I joined up I spent the preceding 5 years at a consulting firm that ran the security teams of B2B companies that touched PII, including some in ludicrously sensitive problem domains (retail mortgage financing!) and I stand by what I wrote.

Further: while checklisting tools may only cost a couple thousand dollars, the actual process of getting a SOC2 attestation isn't the real expense. I could get OWASP WebGoat a SOC2 attestation if I wanted to (a ham sandwich would be even easier). The actual expense in SOC2 is the engineering work you do in support of it. Those checklist tools are fine if you know exactly what you're doing and don't let them add any engineering work, but what I've seen happen repeatedly is a SOC2 checklist from a tool leading a team into building a pasteurized process cheese food security practice, with IDS and WAF and server agents and code scanners and Nessus scans, at great expense.


I am new to compliance, but this seems super strange to me. Based on my cursory read of SOC2, you need a ton of evidence gathering for months leading up to your audit. How would you know what evidence to have retroactively if you didn't spend time on it?

SOC2 attestations being easy to get also runs counter to what I have heard from every single other person on this topic. Generally what I hear is that it is extremely hard and time consuming. What am I missing? I would love to be wrong here and for this to be easy.


Using something like Vanta or Drata makes life a lot easier. I've done SOC2/PCI audits in fintech where we change tools every year (meaning we reinvented the wheel every year), and I've now done it at my own startup using Drata. Auditors feel more comfortable, you'll feel more comfortable, etc. Even if you're not planning on doing it right away, just sign up and have it start tracking your progress.

It's time consuming, but not all consuming. I think I spend <2 hours a week on compliance now that we're set up.

The "fun" part was engineering ways to implement things like PHI scanning and WAF protection as cheaply as possible. There's almost always a nearly-free cron job/python script/slackbot alternative to every "mandatory" 5-6 figure SaaS subscription in the space.
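As an illustration of the "nearly-free python script" approach, here is a deliberately narrow scanner that could run from cron over exported logs. The two regexes are assumptions chosen for brevity; they are nowhere near complete PHI detection and will produce false positives:

```python
import re

# Hypothetical, deliberately narrow patterns; real PHI detection needs far more than regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = "user logged in\nssn=123-45-6789\ncontact jane@example.com"
print(scan_text(sample))  # [(2, 'ssn'), (3, 'email')]
```

The findings could then be posted to a channel by whatever alerting you already have; that part is left out here.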


By all means use tools like these, but be very careful, because they (and auditors that use them) will lead you into engineering changes that are not required for SOC2 and may not be what's best for your team. For instance: there is absolutely no need to set up PHI scanning or a WAF to get SOC2.


My startup has to maintain a HIPAA cert, hence PHI scanning. But, you are correct.


I'm a few years out of date, but I don't believe that any sort of PHI scanning is specifically required by HIPAA either, though I've seen plenty of consultancies happy to sell it to you.


I posted two guides downthread. It's hard because people make it hard, or let people make it hard on them.


The section on performance management is circular and vague: a good one is motivating and a bad one is demotivating. OK. Glad we got that out of the way.

The whole intro reads like a puffy resume and lots of gilding. Even a section of gushing testimonials.

And he puts his name on the title so you don't gotta read the author byline. Total cheese.


The section on performance management is at least five pages long, and it covers compensation, leveling, job titles, PIPs, and firing. Perhaps you mistook the introduction to the section for the entirety of the section?


> There are things you should start doing early that lay the groundwork for attestations, but you should be doing them anyways, even if you never plan to get a SOC2 (and if a big-ticket customer never demands it, you shouldn't SOC2). That's stuff like setting up single sign-on and having protected git branches; simple best practices.

This is in many ways the spirit of SOC2, no? There are a lot of startup founders, far more than I'd like, who would purposefully eschew such "simple best practices" unless they had an axe like a SOC2 audit swinging over them.

I think you're both right, for what it's worth, and my take is that you are more aligned with TFA than you perceive.


How are we both right? I think you literally should wait until the last minute to start a SOC2 process.


GP's point is that having SSO and protected git branches _is_ starting the SOC2 process.


I'm pretty sure that's not what the author meant. Again: those are things you should do regardless of whether you're ever going to get SOC2 (and a lot of startups shouldn't).


That and having a ticket system (e.g., Jira) to track why you touched prod and you can answer just about every question.
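If you do want prod changes tied to tickets, one lightweight enforcement point is a git `commit-msg` hook, which git runs with the path to the file holding the proposed message and which aborts the commit on a non-zero exit. A minimal sketch; the Jira-style `PROJECT-123` key pattern is an assumption to adapt to your tracker:

```python
import re
import sys

# Assumed Jira-style issue key, e.g. "OPS-42"; adjust to your ticket system.
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_ticket_reference(message: str) -> bool:
    """True if the commit message mentions at least one ticket key."""
    return TICKET_KEY.search(message) is not None


def main(argv: list[str]) -> int:
    # git passes the path of the file holding the proposed commit message.
    with open(argv[1], encoding="utf-8") as f:
        message = f.read()
    if has_ticket_reference(message):
        return 0
    print("Commit message must reference a ticket, e.g. OPS-42.", file=sys.stderr)
    return 1  # non-zero exit makes git abort the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

To try it, save it as `.git/hooks/commit-msg` with a `#!/usr/bin/env python3` first line and make it executable.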


We don't have that, and didn't need it for SOC2.

(We have other ways of tracking prod changes, but our auditors don't know anything about them.)


I think that is what the author meant, actually.

The downside is that a lot of startup founders will need help getting the basics in place.

I worked at a place where 2 business guys hired 4-5 freelancers, and although the freelancers charged high rates, not one of them had any clue about setting up infra or an SDLC, let alone a secure SDLC. They would write the code and not give a damn about anything beyond that.

The business guys thought they had great technical people because they were expensive.


You absolutely do not need an SDLC process in order to get SOC2 attested.


Of course not; that was just part of the story, to paint the picture. In a situation like that, it might be necessary to pay a consultant to help with the initial setup.

But maybe not go into full attestation mode right away. That kind of consultant is also tricky to find.


I think this stuff is highly folkloric and that any startup that picked a reasonable high-touch auditor and talked to some friends about their experiences could get through a Type 1 with virtually no effort (outside of their bizops team, who the auditors will definitely harass).


SDLC?


Software Development Life Cycle


Just wanted to +1 this comment and say Vanta made SOC2 way more intimidating than it was.

What made it easy was talking to a startup that wanted SOC2 and had it themselves; they recommended an auditor who helped us untangle what was actually required.

It took a couple of months to get type 1 from start to finish with very part time attention.


It's a good idea to just not do stupid shit that would make it very painful to actually get compliant. Get vendors who have certs, and keep infra minimal (which means no infra team). The more you do in-house, the more painful compliance will be. Buy, and buy from certified providers; simple. Manage identity centrally, keep all your secrets in a secret manager, use git, and do code reviews. You're right: all things you should be doing anyway.


Doesn't "Buy, and buy from certified providers, simple. Manage identity centrally...." contradict each other?


Manage identity centrally is probably referring to using an identity management system like Okta, Microsoft Identity, or hosting your own IdP and using strong hardware 2FA. You don't want people creating their own accounts manually for everything or shared accounts that everyone knows the password for (or is on a shared spreadsheet that the entire company has access to).


At this point most startups would just use Google; since they're almost certainly using Google as their email provider, and "company email" is a de facto root-of-trust even if you don't intend it to be, there isn't really a whole lot of thought that needs to go into it. It helps that they have the best 2FA stack of any mainstream cloud service.


Exactly; lots of over engineering/pre-optimisation in this. It's less for startups and more for startups-burning-vc-money-while-team-builds-resume.


Do you know of a good resource which describes these simple best practices?



Thanks!


Having gone through quite a number of compliance audits... the one thing that is good in that advice, is that many items in an audit are just a checklist of questions, such as

do you have a policy for XYZ?

or confirm you have a process for "thing"

So what ends up happening, if you feel stressed about an audit, is that just getting the list of audit questions will make you realize how much you can simply say "yes" to, and you'll feel less daunted by the audit.

So it's a good self-check, even if you're just crossing out the things you should already have a framework for.



