Ask HN: What is the most complex software that you have built/worked on?

sema4hacker · on June 16, 2023

During 1978-1979 single-handedly writing all of the data processing programs necessary to automate all departments in a 150-employee manufacturer of computer terminals on a 32KB RAM 50MB HD multiuser Pick O/S Microdata: order entry, purchasing, shipping, receiving, payroll, accounts receivable, accounts payable, MRP, QA, work orders, etc, etc, etc.

Grazester · on June 16, 2023

Did they use this up until the 2000's? It's the kind of software that sounds difficult to replace.

tinktank · on June 16, 2023

That sounds like fun

8chanAnon · on June 16, 2023

Hmm. A lot of software is unnecessarily complicated. This has to do with "too many cooks" or "design by committee". The term "complex" is relative. Is a trading system complex on its own merit or is it the process of building it that is complex? I haven't worked on a major project so I can't say.

I took a programming course many years ago. Not because I needed to learn programming but to get job interviews. The instructor asked us to write some C code to perform some small task. I completed the coding in about 5 minutes and it was just two lines (my prior experience was 6502 assembly not C). Others in the class were writing screenfuls of code. I imagine that the same situation occurs on large projects where mediocre programmers write more code than required for the task. Projects get bloated and more complicated as a result.

Another anecdote. I wrote a small operating system for a piece of hardware known as "touch memory" made by a company called Dallas Semiconductor. Intriguing little device, I must say. This was 20 years ago. My boss met up with the owner of a company that was attempting to build what I had already built in just two months. His 4 programmers could not accomplish much of anything over the course of an entire year. I guess that means that I was 24 times more effective. The company went bankrupt.

The most complex project I worked on? I dunno. Maybe converting PDF into braille. Would you say that is complex? I wrote all of the code from scratch without using any third-party tools. That's another area where projects go wrong. They rely too much on closed-source middleware that their own programmers don't understand. It's a recipe for disaster.

8chanAnon · on June 16, 2023

Oh wait. I'm working on a project right now. Never mind the PDF-to-braille thingy. How about a proxy server with HTTP, HTTPS, Socks5 and websocket? Not complex enough? How about routing to Tor and I2P? DNS and DNS-over-HTTPS? It's a local proxy server so it supports local disk access from a web browser. Trapping requests to modify them or write them to disk? The purpose is to support browser apps to do things like reading Twitter or getting Youtube videos (or whatever else you would want to write an app for).

It runs on Node.js and the entire code base is just one file (under 100k). No third-party includes.

Took me three years to get it to the point where it is now (I spent most of that time wondering what to do next).

Is that complex? Dunno. I'm trying to finish the documentation but, damn, there is so much to cover! It's a beast. That's really why I'm here. Gonna announce something soon.

skulk · on June 16, 2023

If it's something you want others to use, why not reuse audited libraries? For something for which security is important, no one wants to audit your homegrown HTTP parser/TOR implementation/etc.

8chanAnon · on June 16, 2023

It's not as extensive as you're suggesting. HTTP parsing is handled by Node.js and it merely connects to an existing Tor server running in the background. The thing is, if I use any kind of third-party software then I end up with much more than I'm needing. For example, what is involved in running a websocket? Mention websocket to anyone and they'll tell you that you need a package like ws or something. Fact is, basic websocket functionality can be had with a few dozen lines of code. Same with DNS. Think you need a package for that?

Besides, even a well-designed library can be fatal for security if it is not handled correctly.

nocubicles · on June 17, 2023

Currently working on module for ERP system that handles milk price calculations. The idea is to collect milk from farms and then based on some qualititive measurements, and fat/protein and lactose to calculate the price for the farms. There's so many factors that the price calculations include like the measurements but also contracts, quantities etc. I think its the most complex stuff so far for me.

looping__lui · on June 17, 2023

If you have a lot of input-output relationships then some ML/XGBoost may take away that pain. Unless you are dealing with very explicit guidance on a billion IF-ELSE relationships…

dfinninger · on June 16, 2023

Not the most complex software, but definitely the most needlessly complex software.

I worked for a company where IT owned customer onboarding for historical reasons. Each customer had their own DB in the same server so that each site would be isolated. So onboarding was basically cloning a “gold DB” and templating some values.

The script that cloned the DB was over 50k lines of Batch files (Windows .bat). There were so many includes, GOTOs, random black holes, passwords in plain text, reliance on special IP whitelisting in the AD server… it had been grown organically by someone who didn’t know how to program and it was a major mess.

It took me three weeks to figure out what was happening, and another to make sure the details were correct.

I replaced the whole thing with <10 lines of an actual scripting language. Proved it worked. But the management at the time didn’t want to change the process…

My Facebook rant at the time got me invited to speak to an upcoming class at my Alma Mater about software quality.

urbandw311er · on June 16, 2023

There is no way you replaced 50,000 lines of batch files with 10 lines of a script.

doubled112 · on June 17, 2023

I'm positive there's some exaggeration for effect, but I've run into some Powershell scripts for that have grown into 1000+ lines and I'd guess roughly 2/3rds of them were in functions never called.

The call is removed, but the function remains. Fields in a form stay, unused and unpopulated, but the validation is disabled and left behind. Things like this.

No version control meant it was hard to go back, and people are afraid to break things.

qup · on June 17, 2023

I once replaced ~2500 lines of ruby with ~450 lines of...ruby.

That was some shit.

Mostly it was copy/pasted data (strings) with slight modifications eating up a lot of lines. A few big chunks were in a code path that could not execute, as well.

It was just really bad code.

LoganDark · on June 16, 2023

> the management at the time didn’t want to change the process…

Every time I see something like this it makes me want to scream. This is not how you develop software, setting all implementations in stone. Ugh

retrocryptid · on June 16, 2023

Nuclear power plant simulator. The actual controls were pretty straight-forward, but the simulator was MUCH more complicated because it had to take all the inputs that the actual moving bits in the power plant took, plus simulate how it thought the system would behave.

phendrenad2 · on June 16, 2023

The BSD kernels (Net, Open, and Free). So many interlocking layers of C macros and typedefs, so many layers of "security features"(1) that are copy-pasted all over the codebase (such that a very simple syscall might be two lines of actual code and 200 lines of boilerplate copy-pasted security features). But I got my tasks done, and swore to never touch kernel dev again.

(1) - Actually, what I'm calling actual security features, plus jails and zones and Linux compat layers and performance counters and debug systems and obscure compile options and features that haven't been used by a soul since 2004.

extasia · on June 18, 2023

This is sort of what puts me off trying to do kernel work tbh.. I'm a C intermediate but dealing with mountains of other people's macros scares me to death.

akasakahakada · on June 18, 2023

Symbolic math solver & compiler.

Only 5000 lines of code but the complexity is insane. The engine deal with a a subset of math equations in quantum field theory. But the deal is that I have to make it compile several optimized math code in several situations.

I would be nice to treat those math ops objectively but it will be too slow. Inmutate will also be slow. So inplace modification is everywhere in the code.

Also the problem is multidimensional. Mathematicaly, I am working with tensors, matrices, scalars. Different treatment for each of them in different compile strategy. Programmatically, different situations sometimes need me to modify only one line of code, sometimes need to modify a block of code. Sometimes a block need to be coherent with the module it sits in, sometimes the module depends on states of those blocks.

Everything is entangled and there is no obvious way to do it right. Applying clean code strategy like "do not repeat" or "keep operation on same level" or "do not make side effect" all suck. Compiling the largest part will then take 1 hr, but now I happy with my messy code base that can compile in 1 min. Because of this, debugging is so much easier and I can just verify arbitrary line of resulting code on the fly.

I have been rewritten the whole thing 6 times in 1 year to try which programming decision is better also extending functionalities. Still a mess.

anonzzzies · on June 17, 2023

We have been building our own database and programming language; we are 3 years in and almost ready to launch. It is the most complex thing I have worked with.

voodooEntity · on June 16, 2023

Well im about to release the alpha but basicly its a project i worked on ~4 years in privat time and consists of two parts.

1. A custom in-memory graph storage/database which is threadsafe and designed for fast multithreading purposes. It also comes with a custom query builder/language which can be transportet via json so viable for every language. It can either be used by directly importing it as dependency in your go project or build as server with a http api package i build and used via api.

2. An architecture/framework which enables you to create completly self supervising software that basicly only persist of you defining a set of "abilities" which you do in form of go plugins. A plugin defines the parameter structure it needs to get executed. The architecture uses the in (1) described graph backend to store all the data. If data gets mapped into the storage the architecture will effective check if the new data can supply any of the registered plugins, and if so execute it. Since the architecture runs multithread by default all jobs will be parallelised into worker/runner threads. Results from those will be automaticly checked by a scheduler to see if it can supply new jobs with the new knowledge (and optional added already existing knowledge). Also data returned from a plugin will automaticly be mapped into the existing storage. Since all the runner do scheduling based on new data you have no central supervising but rather thread-distributed self supervising.

This way you basicly just define certain abilities as plugins and than insert the starting data. The architecture/framework will take on from there.

This was a strongly simplified explaination for what it does and i coded alot of different stuff in the last 15-20 years tho this project by far was the most complex in terms of dynamic data mapping / scheduling / etc. So complex in terms of logic rather then in size.

8chanAnon · on June 16, 2023

>This was a strongly simplified explaination for what it does

You missed the part about what the end result is supposed to be. What is the purpose? Can I use it to run a website?

voodooEntity · on June 17, 2023

Well you could say its for "data driven processing" and probably best suited for any kind of data processing especially data gathering/enrichment. What you could build with it is only limited by your imagination. Tho i will give a simple example (the one i gonne use for the example project i gonne provide).

A webcrawler. What does a webcrawler do? It expects a domain (data) and crawls for more data - analyzes it and enriches your collected data. You may end up with writing multiple plugins like.

- resolveIpFromDomain (takes Domain returns Domain->IP)

- detectWebserver (takes ip uses for example nmap to scan ports+banner) returns ip->port->software->(banner,state)

- detectVhost (takes ip->port->(software[webserver],state[open]) || domain->ip->(software[webserver],state[open]) and returns ip->port->(software[webserver])->[]vhost[]->page[/] ) -> loadPage (takes page loads it with curl and return page->content)

- extractLinks(takes page->content return page->content->[]link)

- loadLink (takes vhost->page->link returns vhost->[]page )

- extractMedia (takes page->content return page->content->[]media)

- analyzeMedia (takes page->content return page->content->media->[]attribute)

..... So what you do is you provide a domain, which will trigger resolveIpFromDomain. This will map the data back to the datahive and based on the Ip in new data trigger detectWebserver. This will return found webservers which triggers the requirement of detectVhost. At this point you probably see how its going.

Due to how the architecture works it will always maximum parallelize the work, it will always map the data into one big structure without you having to care about it, it will only execute things that are necesary/usefull.

So the more your software should branch/parallelize the more gain you get.

Tho as i mentioned in my original post ill release the first alpha so there is still things that can be extended and improved. And right now im spending time in writing the docs which will probably take me some more weeks in orders to make them good enaugh for people to understand how to use it by themself.

I mostly will release it because i think its a great showcase of how you can do optimized data driven processing while havin an architecture that cares about the most painfull things like data mapping / parallelization / etc. I dont expect it to be the next "big thing" or even beeing used by alot of people, but if it inspires people or someone maybe write a even better version based on the idea i would be happy already .)

So to come back to your original question - can it host a website? Probably - but not really meant to do it and a nginx would serve u better.

austin-cheney · on June 17, 2023

The application at my prior employer was the highest risk of failure I had seen in a commercial application.

Something like 80+% of the logic was in SQL stored procedures across different tenants. Nobody wanted to apply static analysis or test automation against it so everything came down to memorization of the code base and the resulting implementations to user interactions in the corresponding web based application.

My experience with commercial software versus open source software has repeatedly proven the complexity of business requirements is completely unrelated to the challenges of scaling and maintaining a software product. It always comes down an appreciation of automation, organizational skills, and writing. I often see people try to cheat this with tools and abstractions only to later drown in debt and blame everything else.

petabytes · on June 17, 2023

My Fujifilm firmware reverse engineering project has been incredibly difficult so far. The codebase isn't that complicated, but reverse engineering the most cryptic and obscure embedded RTOS I've ever seen is next to impossible.

32gbsd · on June 18, 2023

I am rooting for you!

NBJack · on June 17, 2023

A tower defense game whose core concept was from a chance reading of a Game Programming Gems book in the university library. Large numbers of bad guys in screen with solid frame rates, home built engine, and an AI that always attempts to thwart you by trying new paths into your base to defeat you (but also runs in a separated thread that pathfinds without hurting your framerate on a dual core or better system). Did I mention I started it on the old XNA library from Microsoft back when they offered the $99 license to make games?

Hasn't made a cent yet, but it has been a LOT of fun to build from scratch and tinker with over the years.

rebataur · on June 17, 2023

We built a datascience tool to quickly build data apps which can be extended from the frontend, including data wrangling and datascience functions.

The hardest but most exciting part was using the PostgreSQL and SQL to process, clean and enhance data and write extensions and bring it all together.

We opensourced alpha version yesterday, more documentation to come.

https://github.com/rebataur/rapidiam

bob1029 · on June 16, 2023

The manufacturing operating system for Samsung. Most notably the UI that was used to manage real-time operations, but you had to understand the entire jungle to do your job.

kingnothing · on June 16, 2023

A distributed event-based transaction processing system that handled around 400k requests per second peak traffic with many petabytes of data stored in a purpose-built DB.

hurrrr · on June 16, 2023

cool. can you share more? what was the most difficult challenge? is something open source?

32gbsd · on June 16, 2023

A videogame that runs at 60fps with a custom engine in c. And it didn't even use real physics. Only hacks and approximations but it was lovely.

8chanAnon · on June 16, 2023

>hacks and approximations Otherwise known as "heuristics". Otherwise known as "making a damn good guess". You don't always need precise math to get something done. I once wrote a program to estimate the average volume level in a WAV file. It was just a process of taking samples at different points in the file and then making a good guess. I compared with another program which used complex math to estimate the average volume. Neither approach achieved 100% accuracy. Maybe 80% overall.

32gbsd · on June 18, 2023

I totally agree. When I had first looked at the math I could not for the life of me figure it out. But eventually - through lots of testing - came to a comfortable approximation of what I needed to achieve and I was fairly happy with it.

bediger4000 · on June 16, 2023

Credit card processing system for a baby bell that merged with another corporation. Somehow the combined system had everything of both old systems, plus a data race condition that could not be removed, lest the VRU/IVR have troubles.

fpdavis · on June 17, 2023

I have worked on a lot of complex systems but my favorite project was a Memcached compatible server for Windows written from the ground up.

asfarley · on June 17, 2023

-multiple integrated compilers for vital-logic systems, plus a runtime/simulation engine

-video-based traffic counting

throwaway21321 · on June 17, 2023

vision system of the apple vision pro. Cant elaborate

bradknowles · on June 18, 2023

There's three projects I've worked on that I've been particularly proud of.

First was the Airfields Database. The year was 1989. I was just hired with a fresh BSCS degree, and working at what was then the Defense Communications Agency, in the basement of the Pentagon. My full TS/SCI clearance investigation was still ongoing, so I couldn't do any of that work. I only had an interim TS clearance. So that meant they had to find work for me to do in the several months I still had to wait for the clearance investigation to be finished. They decided that the Airfields Database was in the most need of help, and so that's where I went.

The Airfields database, I was told, had a record of every single air base, air port, air field, road, or strip of grass where an aircraft had been known to take off or land there, even if it was just for an emergency. It had location information, runway length, runway direction, runway width, weight limitations, apron parking space, and almost a hundred other data elements that it tracked for each location. The software ran on the classified WWMCCS military mainframe systems, and was written in COBOL 66. It was our job to bring it "up to date" with COBOL 77.

There were two main code paths through the system. One where the operator selected the different fields that were relevant and what the constraints were, and one where all fields were considered relevant and you had to provide the constraints. I had the bright idea that we should eliminate the longer code path and only have one set of routines we had to modify, as we could simulate that longer path by pretending that the user had selected all fields in the other code path.

Well, it worked. We chopped out about 40% of the code from the system. During the process, I also took the only subroutine that was actually classified (it did great circle distance calculations between any two points), and I externalized that so it could be kept in a separate file. That meant unless you were actually working on the great circle calculations, that code printout could be kept locked in the safe, while the now unclassified code that composed most of the actual Airfields Database could now be left out on the desk of the programmers without having to lock it up every night.

The two full-time developers working on the Airfields Database were very happy at this outcome. Only thing is, when run on the Support system where we did our development, the printouts it created came out perfect. But when run on the production system, where all the real reports were ran, it generated an extra page break between each page of output. And if you modified the code to run correctly on the production system, it didn't work right on the Support system. We never did figure out that problem, at least not before I got reassigned to a different branch a couple of years later.

The second big system that I'm happy to say I worked on was the OpenStack cluster that AT&T used. I was part of a small team, and the work we were doing was inside of Chef for deploying their code to OpenStack, but there was a lot of duplication between the various modules, and lots of duplicated modules. We ended up cutting their code base to one third the original size, and with our all of our self tests we built in for the CI/CD system, we made sure that all the code still worked and ran exactly as the original code did. The weird thing was that we never got any feedback from them as to whether or not they liked any of our changes, or if we had actually broken anything without realizing it. That was a total one-way street.

The third big system I'm happy to say I worked on was the GPS OCX Next Generation Ground Control System for Raytheon. This had to control all of the GPS satellites currently in orbit at the time, plus all of the next-generation satellites that Lockheed was building. Again, this was Chef code, and we worked closely with the Raytheon developers to teach them everything we could about how to write Chef code, so that they could take what we taught them and apply that to all of their other code going forward.

After six months, we did a test that simulated bringing up the entire datacenter (we were just spinning up VMs instead of running on real hardware), starting from everything being racked and stacked, and the all of our code took all of their core infrastructure code (also including all the OSes and databases, etc...) and deployed it and configured it for operation. They started off with that process taking three months to do manually before we got there, and our first full test run was successful and completed in three hours. We were ecstatic.

Today, I work at AWS. Even just the tiny subsystem that I work on is bigger than anything else I've ever worked on before. It's a really challenging job, but also very rewarding. I honestly feel like I'm working with the best team of people I've ever worked with before.

defrost · on June 18, 2023

> I also took the only subroutine that was actually classified (it did great circle distance calculations between any two points)

Circa 1984 (ish) I was in Australia and we didn't go looking for such wizardry .. it was easy enough to derive Great "Circle" paths about an oblate spheroid from first principals and, if required, weight the problem by wind strength, etc.

I was coding, at the time, for airbourne geophysical surveys at continent scale, so we were also pulling in IGRF models, normalising for diurnal magnetic flux, calibrating radiometric systems for air colum mass under craft | cosmic breakdown gammas from above, and all that jazz.

Was there anymore to your classified great circle code snippet than the correct geometric answer for a "not an actual sphere" .. or was the secret sauce just the exact series terms chosen and the order of computation in order to minimise error and reproduce consistent answers?

( I'm not asking for the revelation of the actual classified details, just curious about shape of the classified hole from the outside )

joshstrange · on June 17, 2023

It won't hold a candle to some of these examples but for myself it's the POS software I currently work on.

What's that you say? How hard could it be? I can spin up a CRUD app where you can create items and add them to menus in an afternoon, and yes, you probably can do that but...

Are you going to make it so you can share options between items (Drink Size, pizza toppings, coffee flavor shots, etc)? Or are you going to manage those all independently on each item? That would be a pain when you go to add or remove 1 flavor from your 6 drinks that can have flavor shots added. Also think about reporting and/or stock tracking features. So instead you have "options" be another top-level entity that you can attach to items, problem solved!

But wait! Is "pepperoni" really $0.25 on all sizes? Or is it $0.25 on the small, $0.50 on the medium, etc? Ok so we need to have some way to adjust the price of the choices you select (pepperoni, sausage, etc) based on another option's selected choice (small/medium/large). Oh, and because you use different amounts of the toppings you are going to want to adjust the stock/recipe as well based on the selected choice. Let's also imagine the data structure for "gluten free is only available on the medium" and "stuffed crust is only available on the large", so we also need a way to disable other option's choices based on a previously selected choice. How about quantities on choices? 2x pepperoni? 2x espresso shots? What about 1/2 caramel shot? Are some options exclusive? Are some required? Do some have a max choice selection count? Do some have a minimum? Do some choices cost extra to add? Do you get 2 free selections then you pay after that?

Ok, ok, a little complex but we can handle this. What about discounts/coupons/etc? You'll need a way to apply a discount to an item or and order, ability to limit how many discounts can be used per item/order (different restaurants have different needs), ability to discount a flat amount or a %, limit which items a discount can be applied to ("$2 of any pizza"), limit which option-choices a discount can be applied to ("$2 off any medium pizza").

Ok, remember that menu we created? Do we really only have 1 menu? Or do we have a POS menu, an online menu, and maybe a doordash/Uber eats menu? Oh, and have fun converting your internal menu structure/concepts to the various food app's menu specifications. Do you want each location of your restaurant to have its own independent menu or do you want 1 top-level menu? Can you hide/remove items from your global menu on a per-store basis? Can you add items on a per-store basis (local specialty)? Do all your stores need to have a solid internet connection or function? (Spoiler, this is not a guarantee). Are you going to have local hardware/software so the store can function "offline"? Are you syncing menus down to these stores? Can your local hardware run the full stack without the cloud?

Are you accommodating for different types of restaurants? Drive thru? Delivery? In-store? Curbside? Table service? How about "little" things like "we want 2x Large coffee" or "we don't want quantities on items, they should all be separate"? Oh and those discounts? Might you want to also limit them based on the service type? This discount is for in-store only, this one is delivery only?

Do you have a loyalty program? Is it "points" or "stamps" based? Are there signup rewards? Are there visits rewards? How is your drip (or otherwise) marketing work? Are you making sure to only target people who aren't coming in every day/week/month already?

This and 1000 more things I never considered when thinking about POS software.