Introduction to Flat-File CMS

tootie · on April 24, 2020

My company does a lot of CMS work for clients, everything from static site generators up to giant enterprise tools. One thing I've determined is that the giant enterprise tools are nearly worthless even for the enterprises they're designed to serve. They add giant layers of complexity to service features that look great in a demo and serve no business purpose. Then they become a technical ghetto full of proprietary plugin code that requires product expertise and is inaccessible to the rest of the dev team. That and they cost $100K-$1MM and more for licenses. It's possible to implement them well, but only by applying the same kind of design principles you would to simpler systems. Even when I'm working with a client that doesn't blink at the price tag, I'm 100% onboard with headless or flat files.

pjc50 · on April 24, 2020

I'm reminded of the now-vanished JWZ post about "groupware". Enterprise software is nearly always awful in lots of ways that the purchaser insists on. They don't want a cheap, general solution that empowers users. They want an expensive, high-profile, highly customised solution that railroads users into workflows. Often it's less important what the specific railroad is, just that it can be imposed.

Proprietary plugin code serves the need of the purchaser to feel special by getting something customised for them. They can then pretend it's a secret sauce for their business, despite it actually impairing employee productivity compared to, say, stock Wordpress.

teddyh · on April 24, 2020

> I'm reminded of the now-vanished JWZ post about "groupware".

It’s not vanished; here it is: [redacted]/doc/groupware.html

cpach · on April 24, 2020

Try this link instead :)

(JWZ has anti-HN measures on his site.)

https://web.archive.org/web/20050217051819/https://www.jwz.o...

tootie · on April 24, 2020

You should delete that link. It looks like whoever owns that site left an, uh, easter egg, for HN visitors. A rather inappropriate one.

teddyh · on April 25, 2020

It seems that the moderators have silently edited my post to remove the link. I have “referer” headers restricted in my web browser, so I did not see the redirect.

It seems that the so-called “vanished” article would be more appropriately named the “disappeared” article.

cpach · on April 25, 2020

It’s Jamie Zawinski’s site. And unfortunately he doesn’t like Hacker News.

Joe8Bit · on April 24, 2020

Couldn't agree more.

It's really interesting to see a new generation of vertical specific, large scale CMSs emerge (like from publishing with Chorus from Vox[0] or Arc from the Washington Post[1]).

My problem with 'Enterprise CMSs' was always that they tried to sell one hammer for every single type of nail, and actually, use cases for content management are so varied (and often specific to verticals) that you buy a big generalist CMS and end up having to customise and mess about with it so much you're always left in the situation you described.

[0]: https://getchorus.voxmedia.com [1]: https://www.arcpublishing.com

tootie · on April 24, 2020

I think the maturity model for CMS is something like flat file to headless to homemade. If you ever reach the point that you have enough scale of content to require a dedicated editorial team, it's probably worth investing in something very domain-specific and tailored to your organization. Could be based on an existing headless or flat file system.

gunn · on April 24, 2020

Hey, since you know about this space, could I contact you to learn some more about it? I'm developing a product that can act as a CMS but is more general and I think more powerful. If you can email me, I'm arthur@everdb.net

rograndom · on April 24, 2020

I was being pitched by an "Enterprise" CMS company. One of their selling points, listed several times in their collateral, was:

* No need to dig through dozens of plugins to find the one that actually works. Since "EnterPriseCMS" is .NET, you can just write your own!

neya · on April 24, 2020

I started out re-writing the core of Wordpress in Elixir around last year. One of the decisions I had to make was about databases. I started out with flat files as well and realized they can quickly grow complex when you have more than a handful of articles. E.g., if you want to add some new attributes to the flat files, they can't be backported to the older articles if the number of articles is quite large.

Guess what? That's the problem that the SQL databases already solve.

So, to circumvent this, I searched for a database that can be version controlled but doesn't require a separate database server. And I found one - SQLite.

It's been amazing so far. It can be inside my GitHub repo, while giving me access to normal SQL queries via my ORM. With a static file solution built on just flat files, complex querying is definitely not possible. With now over 30,000 posts in the SQLite database, my static file generator is still absolutely fast.

I now use my solution for a high traffic client with 3-4 million visitors a month and it works really well for them.

One downside of SQLite is its handling of multiple concurrent writes, but so far it hasn't been much of a problem.

chrismorgan · on April 24, 2020

> If you want to add some new attributes to the flat files, they can't be backported to the older articles if the number of articles is quite large.

This isn’t a reasonable claim. In SQL databases if you want to do this, you have two options:

1. Go through and update every row to add a value for the new column—which you can do in flat files just as effectively, though almost certainly more slowly because files are comparatively slow.

2. Define a default value—which you can also do in flat files just as effectively and just as quickly.
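To illustrate option 2 for flat files, here is a minimal sketch, assuming posts carry simple `key: value` front matter between `---` lines. The `DEFAULTS` table, the attribute names and the helper functions are all hypothetical, and the parser is deliberately not a full YAML parser.

```python
# Hypothetical defaults for attributes that were added after older posts
# were written. Older files simply omit these keys.
DEFAULTS = {"layout": "post", "featured": "false"}


def parse_front_matter(text):
    """Split a post into (attributes, body) using 'key: value' lines
    between two '---' delimiters. Not a full YAML parser."""
    attrs = {}
    body = text
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---\n")
        if sep:
            body = rest
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    attrs[key.strip()] = value.strip()
    return attrs, body


def load_post(text, defaults=DEFAULTS):
    """Return (attributes, body) with defaults filled in for missing keys."""
    attrs, body = parse_front_matter(text)
    return {**defaults, **attrs}, body


# An "old" post that predates the layout and featured attributes.
old_post = "---\ntitle: Hello\n---\nBody text\n"
attrs, body = load_post(old_post)
print(attrs)  # {'layout': 'post', 'featured': 'false', 'title': 'Hello'}
```

The old files are never rewritten; the default is applied at read time, which is roughly what a column default does in SQL.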

In this vein, I tend to find the problem to exist in precisely the opposite direction: you need to apply some sort of transformation to all the database entries, for example to update a markup pattern or links to a certain domain. If it’s flat files, you can use standard text-processing tools and easily do anything (presuming you’re a developer). But if it’s in a database, you probably need to go through some database driver which will probably be a pain to deal with for anything involved, and there’s a fair chance with this style of app that you won’t even have access to that and will just be completely stuck.
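As a rough sketch of that kind of bulk transformation, assuming the posts are Markdown files under one directory: the function name, the domains and the demo files below are made up for illustration.

```python
import re
import tempfile
from pathlib import Path


def rewrite_links(root, old_domain, new_domain):
    """Replace http(s) links to old_domain with https links to new_domain
    in every .md file under root. Returns the files that changed."""
    pattern = re.compile(r"https?://" + re.escape(old_domain))
    changed = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        new_text = pattern.sub("https://" + new_domain, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed


# Demo on a throwaway directory with two hypothetical posts.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.md").write_text(
    "See [docs](http://old.example.com/docs).\n", encoding="utf-8"
)
(demo_dir / "b.md").write_text("No links here.\n", encoding="utf-8")

changed = rewrite_links(demo_dir, "old.example.com", "new.example.com")
print([p.name for p in changed])  # ['a.md']
```

With the files in Git, the resulting diff can be reviewed before committing, which is harder to get from an in-place database update.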

For the likes of blogs, I hate having to deal with database-driven apps. Give me a set of files that I can work with every time.

Notwithstanding all that: 30,000 posts is quite a bit as most flat-file systems go, and a well-architected SQL database certainly should perform somewhat better than a well-architected flat file system, simply because the overhead per record in a database is fundamentally lower.

Things like migrations and batch operation and revision management just tend to be way more flexible on flat file systems.

neya · on April 24, 2020

When I say flat files, you're probably thinking just posts, but there's more to something like Wordpress than just posts. For example, there's global site options, there's individual post options, categories, taxonomies, etc. You get the idea.

Sure, all of these can be maintained with static files as well. And that's what we did initially. But, over time, it becomes a maintenance nightmare. Plus, nothing like going into a nice fluid UI and checking/unchecking checkboxes. One common argument is these interfaces can be had with flat file CMSes as well. But then, you will end up re-writing an ORM for the filesystem which goes back to my original problem - SQL databases and their ORMs already solve this problem.

It's easy to think flat-file CMSes are enough when you're dealing with a small number of files. When dealing with over 30,000 posts, you need to find that file in a directory, open it up, and you'll see a totally different file structure compared to the most recent ones. They lack consistency. In a database, if I add a new column, all old rows will be filled with null. But the columns are the same for every entry, unlike a static file CMS.

Finally, think also about complex queries. Have you seen Wordpress's queries? To generate a simple Navbar menu, you will make so many joins. Especially if you are trying to render menus under different categories (eg. on category pages).

Or, how about displaying related posts? It's much easier to query an SQL database, to get something related to the current post. Good luck doing that on 30,000 files whose contents reside inside of the file itself. There's no argument here.

All I'm saying is, we tried all that flat file CMSes have to offer. At a scale of 30,000 (now it's actually close to 50,000) posts, it's no joke. In fact, we plan to open source our flat file CMS's ORM. It's great for small blogs. But for big publishers, you need to use a hybrid approach.

seanwilson · on April 24, 2020

> With now over 30,000 posts in the SQLite database, my static file generator is still absolutely fast.

Don't you then miss out on human readable Git diffs when posts are changed? And now you can't edit posts with a standard editor or e.g. the GitHub web UI?

Also, what language are you using? Shouldn't reading in 30,000 posts and building some stats/relationships/lookups on them be pretty fast?

PudgePacket · on April 24, 2020

> Don't you then miss out on human readable Git diffs when posts are changed? And now you can't edit posts with a standard editor or e.g. the GitHub web UI?

You never got those with wordpress either. Seems like they're going for some kind of hybrid. Trading some of the simplicity of standard static file generators for some of the power of a tool like wordpress.

I quite like the idea.

Nothing to stop them from storing historical versions in the DB, then it wouldn't be much more logic for the site generator to spit out the site at X point in time, or a diff. Whether that's a feature you'd want is another thing.

> what language are you using?

"re-writing the core of Wordpress to Elixir"

neya · on April 24, 2020

For simple lookups and stats, they are fine, but imagine complex queries. Best use case? Related posts after every post. With 30,000 posts, your generator will slow down drastically if it's file based as opposed to some JOIN query on an SQL database. We tried this and it didn't work. Related posts is just one example, think of big publishers' use cases - dynamic menus, ad management, etc.
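A minimal sketch of what such a JOIN could look like in SQLite, assuming a hypothetical `posts` / `post_tags` schema where "related" means "shares the most tags". The table names, columns and sample rows are invented for illustration and are not the schema described in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE post_tags (post_id INTEGER NOT NULL, tag TEXT NOT NULL);
INSERT INTO posts VALUES (1, 'Intro to SQLite'), (2, 'SQLite tips'),
                         (3, 'Static sites'), (4, 'Gardening');
INSERT INTO post_tags VALUES (1, 'sqlite'), (1, 'databases'),
                             (2, 'sqlite'), (2, 'databases'),
                             (3, 'databases'), (4, 'plants');
""")


def related_posts(conn, post_id, limit=5):
    """Rank other posts by the number of tags they share with post_id."""
    rows = conn.execute(
        """
        SELECT p.id, p.title, COUNT(*) AS shared
        FROM post_tags AS mine
        JOIN post_tags AS theirs ON theirs.tag = mine.tag
        JOIN posts AS p ON p.id = theirs.post_id
        WHERE mine.post_id = ? AND theirs.post_id != ?
        GROUP BY p.id, p.title
        ORDER BY shared DESC, p.id
        LIMIT ?
        """,
        (post_id, post_id, limit),
    )
    return rows.fetchall()


print(related_posts(conn, 1))
# [(2, 'SQLite tips', 2), (3, 'Static sites', 1)]
```

On a real table an index on `post_tags(tag)` would probably matter more than anything else in this query.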

jakearmitage · on April 24, 2020

What problems did you have with concurrent writes? I just lock stuff and people have to wait. (i literally lock the page if there's someone else there)
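For anyone wondering what "lock the page" might look like, here is a minimal in-memory sketch. The `PageLocks` class, its timeout and the user names are hypothetical; a real multi-process setup would need to keep the lock in shared storage rather than in a Python dict.

```python
import time


class PageLocks:
    """In-memory edit locks: one editor per page, expiring after a timeout
    so an abandoned session does not block the page forever."""

    def __init__(self, timeout_seconds=300, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.locks = {}  # page_id -> (user, expiry time)

    def acquire(self, page_id, user):
        """Return True if user now holds the lock, False if someone else does."""
        now = self.clock()
        holder = self.locks.get(page_id)
        if holder and holder[0] != user and holder[1] > now:
            return False
        # Free, expired, or already ours: take it and refresh the expiry.
        self.locks[page_id] = (user, now + self.timeout)
        return True

    def release(self, page_id, user):
        """Drop the lock, but only if user is the current holder."""
        holder = self.locks.get(page_id)
        if holder and holder[0] == user:
            del self.locks[page_id]


locks = PageLocks()
print(locks.acquire("home", "alice"))  # True
print(locks.acquire("home", "bob"))    # False
locks.release("home", "alice")
print(locks.acquire("home", "bob"))    # True
```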

neya · on April 24, 2020

Yep, same approach we take now as well. Big publishers have multiple writers working on the same article/site design elements sometimes. It's a PITA.

corey01 · on April 24, 2020

What ORM do you use to interact with your database?

neya · on April 24, 2020

Elixir has an obsolete SQLite ORM which works pretty well for us till this point. At some point, we will have to upgrade it or maintain it ourselves I guess.

nednar · on April 24, 2020

Just don't copy the wordpress plugin system.

subpixel · on April 24, 2020

I’ve been an API-based CMS fan for a while but at the end of the day it’s a huge bet on the viability of the vendor, because good luck migrating away from an API-based CMS.

No way all of the myriad vendors in the headless-CMS world survive this downturn.

I think that needs to inform any CMS decision.

Ultimately I suspect there will be vertical integration at play with the companies raising money to be the Wordpress of the 21st century. Wordpress is a CMS after all.

akie · on April 24, 2020

A company I worked at had 20 or so sites on a headless API based CMS. Basically you’d enter structured data in their proprietary backend, and the site would request that data as JSON via the API. The marketing department hated it with a passion, because there was no way to do “one-off” things without involving IT, and IT hated it because marketing needed permanent handholding for the simplest of things.

After about a year of paying €8000 a month for hosting they replaced it with one really good 100% custom WordPress theme.

mymmaster · on April 24, 2020

How did this remove the need of using IT to do one-off things?

bigbassroller · on April 24, 2020

WordPress has a publish button, post scheduling and revision history all out of the box with no plugins. That right there eliminates most of the need of IT in day to day publishing.

andrewingram · on April 24, 2020

I don’t follow. Everything you’ve listed is a feature of most headless CMSs I’ve investigated.

bigbassroller · on April 26, 2020

Headless CMS may have that feature but then you have to deal with a more complicated deployment pipeline that a marketer might not be as comfortable with as WP.

andrewingram · on April 27, 2020

There might be a misunderstanding here. Headless CMSes don't mandate deploys, that's only if they're used in conjunction with static-site generators. A headless CMS is just a CMS that provides content as structured data over an API rather than pre-bound to a rendering layer. You can use WP as a headless CMS if you use its content API.

twicetwice · on April 24, 2020

Why would it be particularly difficult to migrate away from an API-based CMS?

Grumbledour · on April 24, 2020

The thing with flat-file cms is, they are often represented as a category of cms, but they are actually not all the same.

There is the type that just saves its data in file form, just like you would in a db. And I never get why this would be desirable. You lose all the benefits of a db system without gaining much.

The other type, the interesting one, is the one where you can work directly on those flat files, not needing some kind of admin interface. Now this can be great!

But it often seems more flat-file cms belong to the former category. I do not know why one would prefer these if you still need a server with a dynamic language and a web interface to edit content. Just use a db!

I never heard of headless API CMS systems before, but it honestly sounds really silly. If you write a front end that consumes JSON, you could just as well write a front end that made db calls. And if you must have JSON, write a wrapper around your db. Having that stuff hosted by someone else is just peak cloud nonsense.

wolfhumble · on April 24, 2020

> The other type, the interesting one, is the one where you can work directly on those flat files, not needing some kind of admin interface. Now this can be great!

I am not sure I understand; could you please explain? That sounds like working directly on flat html files; or are you talking about working directly with a DB that then writes to flat files?

Thanks!

Grumbledour · on April 24, 2020

I was thinking about something like grav cms.

You put markdown files in a folder and it creates html for you when the page is requested. When you use their web based admin tools, they also just change the markdown files. Easy to use from web and local.
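Roughly, the render-on-request idea looks like the sketch below. This is not how grav is implemented (grav is PHP); the function names and the demo folder are hypothetical, and `render_page` is only a stand-in where a real Markdown library would be called.

```python
import html
import tempfile
from pathlib import Path


def resolve_page(content_root, url_path):
    """Map a URL path such as '/docs/setup' to a Markdown file under
    content_root. Returns None if nothing matches or the path escapes the root."""
    root = Path(content_root).resolve()
    relative = url_path.strip("/") or "index"
    for candidate in (root / (relative + ".md"), root / relative / "index.md"):
        candidate = candidate.resolve()
        if root in candidate.parents and candidate.is_file():
            return candidate
    return None


def render_page(markdown_text):
    """Stand-in renderer: escapes the text and wraps blank-line-separated
    blocks in <p> tags. A real setup would call a Markdown library here."""
    blocks = [b.strip() for b in markdown_text.split("\n\n") if b.strip()]
    return "\n".join("<p>" + html.escape(b) + "</p>" for b in blocks)


def handle_request(content_root, url_path):
    """Return (status, body), reading the file from disk on every request."""
    page = resolve_page(content_root, url_path)
    if page is None:
        return 404, "<p>Not found</p>"
    return 200, render_page(page.read_text(encoding="utf-8"))


# Demo content folder with two hypothetical pages.
site = Path(tempfile.mkdtemp())
(site / "index.md").write_text("Welcome home.\n", encoding="utf-8")
(site / "docs").mkdir()
(site / "docs" / "setup.md").write_text("Step one.\n\nStep two.\n", encoding="utf-8")

print(handle_request(site, "/docs/setup"))
# (200, '<p>Step one.</p>\n<p>Step two.</p>')
```

Because the file is read on each request, editing it by hand, through Git, or through a web admin all have the same effect.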

I guess many static site generators work the same and we had good success teaching our content creators some markdown. For many, it is often easier than navigating something like drupal and working with the kinks of whatever WYSIWYG editor.

mahesh_rm · on April 24, 2020

Think about an api.json file served by S3, which you can open with VS Code and update. Your HTML fetches its content through XHR.

z3t4 · on April 24, 2020

Everything becomes simpler if the content is static/immutable, as long as it's small... The advance in networking and device storage has made this feasible. Rather than requesting what you need, you get the whole database sent to you and stored locally.

nsomaru · on April 24, 2020

Is there a pattern that securely allows JS clients to make arbitrary SQL queries? Or is this a feature of REST? Maybe this is why people invented GraphQL? But then that comes with its own issues...

Either way, is it as simple as you’re making it out to be?

Grumbledour · on April 24, 2020

I would argue if your frontend is just a js client with no server component you are already doing it wrong.

Of course, that then becomes a much bigger discussion about websites vs apps and the current state of the web which we have on here quite regularly.

blondin · on April 24, 2020

very interesting how these things come and go. one day we talk about flat-file frameworks and, for a long period of time, we forget about them. i remember when joomla, wordpress, and the other bazillion php frameworks were all coming about and we debated flat-files vs database-backed frameworks.

one that stood out at the time among flat-filers was textpattern -- not sure if the very yellowish branding was the reason.

databases won over flat-files though. primarily because the web needed to scale. it is also worth mentioning sqlite which at the time was, and still is, a perfect middle ground.

omnimus · on April 24, 2020

I am not so sure scale is the main reason. Most wordpress sites are static content and run db+site on the same server. They are scaled using caching, which sort of makes them like a static website.

The same approach is used with big flat file sites. Editing happens on the dynamic portion and viewers see the cached site.

Many big sites actually use CMSes as static site generators. But caching is not that different.

techntoke · on April 24, 2020

Hugo is a great flat-file CMS, although it doesn't come with an admin interface or editor built in. Would love to see something as fast as Hugo that I can just point at a directory of Markdown files with front-matter/data that can edit/commit files from a UI using Git that runs locally.

earthboundkid · on April 24, 2020

NetlifyCMS works with Hugo to create a decent GUI, but it works over the REST API to Github rather than working locally.

mickael-kerjean · on April 24, 2020

I did build a tool for this: https://github.com/mickael-kerjean/filestash. The idea is to link your github repo and let people use it (for example: https://demo.filestash.app/login?next=/files/_posts#type=git...) or create a shared links to make it really easy for anyone to manage the repository without requiring any knowledge of GIT https://demo.filestash.app/s/jekyll

netsharc · on April 24, 2020

I've been using Hugo, but the template syntax is a pain to learn/get right...

gnabgib · on April 25, 2020

The template syntax is actually from go[0] which you can find some guides[1] for. While the Hugo website now mentions this, I'm not sure it always made it clear the syntax is not its own (it just seems to be frustratingly light on some use-cases). Once you understand the syntax though, it applies to more than just Hugo.

[0]: https://golang.org/pkg/text/template/ [1]: https://gowebexamples.com/templates/

panphora · on April 24, 2020

I'd like to see someone create a CMS that uses HTML files to store data and lets you query and copy data between the files using CSS selectors. This, to me, would be the simplest CMS.

You could just do:

    <div data-copy-content-from="#about-page .profiles"></div>

And that would go into your about page, copy the content from there, and insert it into the current page.

If you built it out well enough, you could do everything using just static HTML and CSS files and just syncing the content whenever it changes.

I've been thinking about this idea for a while. I even started my own framework to try to make something like this work. I'd love to know what other people think.

raving-richard · on April 24, 2020

Check out Server Side Includes. They used to be all the rage.

You don't need to reinvent anything. Just use what used to exist.

(Also, you start doing SSI, and then you want a bit more power, so you switch to PHP, then before you know it, you go all out with PHP and MySQL. Finally, you say, "MySQL sucks, SQLite is much better and easier" (see also the other comment about SQLite).)

In the end you realize that nothing matters. We all die alone in the end.

panphora · on April 24, 2020

Yes, I've been down this rabbit hole before :)

I'm not just looking for static includes, but dynamic, real-time, editable includes built in to a framework that uses a single syntax to describe a full web app. I'm tired of using full programming languages for describing app data — and how it can be edited — when so much of what a web app can do has already been implemented a billion times.

PHP is great for that, but, in my opinion, is 500% more powerful in areas that don't count (low-level language constructs that make it Turing complete, but not ideal for building real-time, editable web apps) and 10% as powerful as it should be in areas that do count (i.e. performing higher level tasks automatically, like structuring data, syncing data between pages, moving data around, rendering data to the page easily, etc.).

johnchristopher · on April 24, 2020

Wouldn't it be like reinventing XSL in the end ?

panphora · on April 24, 2020

Yes! I'd never heard of XSL until today, thank you for pointing me to it! This looks exactly like what I'm trying to do.

Could you point me to any resources on XSL that would make a good starting point? Or any XSL libraries that help you produce dynamic web applications?

yohannparis · on April 24, 2020

I mean, you kind of describe web-component.

Or if you are old-school, PHP templating, you can do a simple: `<div><?php include('about.html'); ?></div>`

This is how I did most of my static files website 10+ years ago. I had an `index.php` file that based on URL will load the content of an article. This is what PHP was created for.

panphora · on April 24, 2020

I guess I'm looking for web-components with really powerful APIs. I could see it happening within the next couple years: building a full web app using only 4-5 new concepts on the front-end, all powered by web components.

Any good resources you'd recommend on that? I've been looking around and it seems like a lot of projects are still in the early phases.

Also, yes, PHP is awesome. Just not built for the modern web. Still is a really impressive language, however. I just wish there was a PHP equivalent built specifically for building dynamic, real-time web applications.

winrid · on April 24, 2020

Why not just use JSON paths and store the data in JSON?

panphora · on April 24, 2020

Actually, that's exactly what I ended up doing.

Using HTML alone ended up being too cumbersome. I needed a single source of truth that I could serialize the data into and JSON ended up being perfect.

I'm still holding out hope for a single XML-like language that can describe web apps in full, however. The tools we use now are way too broad and powerful compared to what you end up implementing with them 99% of the time.

adim86 · on April 24, 2020

What are good headless CMS engines? There are so many flavors and no real community or indication of which ones will survive, is there an open-source equivalent to Wordpress for headless CMS?

tomduncalf · on April 24, 2020

I've been pretty happy with https://getcockpit.com/ for smaller sites. It's flexible and reasonably robust. Not perfect but if you know some PHP and JS the code is fairly easy to follow if you do need to tweak stuff.

mymmaster · on April 24, 2020

Depends what you're trying to build. Customers have good things to say about https://buttercms.com/

_630w · on April 24, 2020

Strapi, hasura, fauna etc come to mind.

rob-olmos · on April 24, 2020

Somewhat related, I've been fond of flat-file or file-based configs as well. For example, Nagios/Icinga have configuration files, but now seem to suggest a database & UI/API.

File-based config made scripting the generation of the configs pretty easy and updating for changes by just overwriting the files and reloading the daemon.

IIRC, similarly TWiki also uses individual files for the pages and configs with RCS (Revision Control System) per-file to track changes.

xorcist · on April 24, 2020

Especially since Puppet/Salt/Ansible does that part for you.

Apply the "web server" class to a node and there is nothing to forget as the relevant config templates are applied both to the monitoring system, the backup runner, and anything else.

Configurations that live in a database somewhere are much harder to manage, as they have to be set by a config runner which needs to be written and maintained.

tomcooks · on April 24, 2020

I've made a PHP flat-file CMS to manage my own website and for a while it was the very first result when searching for "fuck wordpress"

_vdpp · on April 24, 2020

Using a flat-file based system appeals to me, but one question I've never had a good answer for is at what sort of scale does a certain database make sense?

Aside from questions of ACID, I get the idea that a lot of people jump to databases because they're uncomfortable with file operations, frankly. But obviously they make sense for other use cases - what is the decision matrix? Just a gut feeling?

omnimus · on April 24, 2020

Flexibility - most flat-file CMSes I've tried (I am a heavy user) model the data in files as YAML documents, so it becomes very similar to NoSQL DBs, with the same advantages and disadvantages. It's easy to have custom data structures for every page, but it's harder to query, sort, etc.

I wouldn't build an ecommerce site with flat files because there is a lot of tabular, repeated data like orders, customers, products.

But for content sites it's perfect. Simpler and flexible. It's awesome for site builders with different blocks of structured content.

Worth noting that with flat files you can use SQLite for those repeated records, and the other way around, many SQL DBs support JSON columns that you could use similarly to flat-file/NoSQL.

klodolph · on April 24, 2020

If you’re strictly asking about scale and not features, the breakpoint is just a question of whether the data fits comfortably in RAM and whether the front-end fits on the same machine. Once the dataset is too big for RAM or once you need multiple front-ends, you’re reimplementing ordinary database features in whatever home-grown system you have.

_vdpp · on April 25, 2020

That's helpful, thank you.

severak_cz · on April 24, 2020

I have used a few "hybrid" CMSs from this point of view:

- dynamic wiki backed by a folder of markdown files

- dynamic CMS backed by a single SQLite file

- classic dynamic CMS which dumps pages as files while editing, effectively making them static on reading
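
The third variant can be sketched in a few lines. This is only an illustration of the idea, not any particular CMS: the `PublishOnSaveCMS` class is hypothetical, and a plain dict stands in for whatever database the editing side would really use.

```python
import html
import tempfile
from pathlib import Path


class PublishOnSaveCMS:
    """Keeps editable page sources in memory (a stand-in for the CMS
    database) and writes a static HTML file every time a page is saved."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)
        self.sources = {}

    def save_page(self, slug, title, body):
        """Store the source, then dump the rendered page for the web server."""
        self.sources[slug] = {"title": title, "body": body}
        target = self.output_dir / (slug + ".html")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(
            "<h1>{}</h1>\n<p>{}</p>\n".format(html.escape(title), html.escape(body)),
            encoding="utf-8",
        )
        return target


cms = PublishOnSaveCMS(tempfile.mkdtemp())
path = cms.save_page("about", "About us", "We write <things>.")
print(path.read_text(encoding="utf-8"))
```

Readers only ever hit the generated files, so the dynamic part is exercised on writes alone.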

uxamanda · on April 24, 2020

I’ve been looking for a folder based wiki - was this a custom tool you made?

omnimus · on April 24, 2020

Dokuwiki works well but it's pretty old-school. Meaning not fun to develop for.

uxamanda · on April 24, 2020

Ah, was hoping for something where I could simply add markdown files to folders and it would do all the routing for me. I have a setup that works like this with Flask flatpages[0], but it seems like there has to be a simpler solution out there somewhere. I've been unable to find it though. :-)

[0] https://flask-flatpages.readthedocs.io/en/latest/

omnimus · on April 24, 2020

Well, "wiki" to me means that it saves a history of changes, manages user access, etc. That probably requires a more complicated file setup than just markdown files.

If you just want markdown > website with some administration to edit, you can use pretty much any flat-file CMS. Kirby CMS is high-end but you need to buy a license. Grav is an OK open-source option; I don't like the admin part too much, but at least it's active. You can find many more at https://github.com/ahadb/flat-file-cms but a flat-file CMS often becomes a pet project that people stop maintaining, so watch out for activity.

patchtopic · on April 24, 2020

it's not perfect, but I had a positive experience recently creating a small web site with Publii:

https://getpublii.com/

ChrisArchitect · on April 24, 2020

add (2019)

outdated article/spammy