You're actually getting at something deeper, which I've been noticing more and more lately. Many people seem really reluctant to learn how to use things, and it only seems to happen with software. They see a program or a new language and seem to think "well, I'm already a coder, I should be able to use this". When it's even slightly different to what they expect, they complain that it's "hard".
It's really strange to me. Imagine if people did that outside of software. I can already walk so I must be able to ride a bike. I can speak English, so I must be able to speak French. The same goes for software. I spent years trying to understand infix notation and learning the precedence rules. Why would it not take time and effort to learn prefix notation?
Nope, it happens with EVERYTHING. If it doesn't work like something they've used before, they have to really, really need it (as in, be forced to use it); otherwise, into the trash it goes.
Git is not hard. It's very simple. But people learn it the wrong way. You have to learn it from the DAG up. If you cannot grasp how the DAG works you'll forever be reading and writing articles like this one which do not help you to learn.
This is a horrible article. You should not bookmark it or use it. If you're not a programmer, you shouldn't use git. If you are a programmer, do yourself a favour and spend a day going through something like this: https://wyag.thb.lt/
It will make you better at git and better at programming. Git is a powerful tool and you need to learn how to use it. Imagine if people read articles like this one instead of learning how to drive.
Don't gatekeep. Git isn't just for programmers: it's for people who are learning, people using it in non-programming capacities, and plenty more. Telling people to "git gud" is not helpful. Not everyone knows, or indeed cares to know, what a Directed Acyclic Graph is, and sites like this ease the anxiety of people who are just learning, or who have already screwed up and just want a solution.
You can use Git for versioning all kinds of assets...3D models, fonts, textures, music. I know someone who stored his book on Github and took pull requests from editors.
Yes, but those are hacks; no one should immediately reach for git as a tool to version EVERYTHING just because you can. If you treat git as a convenient hammer for your screw, don't be surprised when the screw breaks at an inopportune time.
Are you imagining every use of git is either (1) a Computer Scientist writing Code, or (2) a hack? You can't imagine anything in the spectrum in between? LaTeX papers by academics in various fields, scientific coders (MATLAB etc.), people writing stuff in Markdown, students who are still learning even CS, etc. are all doing the wrong thing by using git?
First of all I read it as using git in those ways are hacks (in the computer sense), not that the people using git in those ways are hacks (in the bad at your job sense)
Secondly, I think his point is (and I kind of agree) that while you certainly can use git to version your documents, 3D models, and InDesign layouts, it's not necessarily the best tool for the job. Sure, if you're already well versed in git, go ahead and use it, and if you're collaborating with people using git you're probably going to have to learn it, but at least realize that git is probably not the best tool for not-code, especially if you're at the same time trying to learn git from zero.
> First of all I read it as using git in those ways are hacks (in the computer sense), not that the people using git in those ways are hacks (in the bad at your job sense)
Yes that's why I said "every use of git [...] (2) is a hack".
> Secondly I think his point is (and I kind of agree) that while you certainly can use git to version your documents, 3D-models, and InDesign layouts, it's not necessarily the best tool for the job. Sure if you're already well versed in git go ahead and use it and if you're collaborating with people using git you're probably going to have to learn it, but at least realize that using git for not-code is probably not the best tool for the job especially if you're at the same time trying to learn git from zero.
a) They can grow quite large, and the diffs do not compress well, so downloading the entire history is quite expensive. SVN supports downloading only part of the tree and history, which is useful.
b) SVN supports file locking, which can help prevent conflicts from editing the same file. That's important because of the next point:
c) These files generally do not have diff and merge tools, so branching is generally not useful, and neither are most of git's advantages over SVN.
That said, I am now generally using git for such files because a) I use git for code anyway and b) GitLab (especially its CI) is still useful.
Depends on the job. If you're making a game in Unreal, perhaps take a look at Perforce and Perforce integration that Unreal offers. Doing post work on a movie, consider something like Alien Brain. Doing some collaborative writing with a bunch of non-technical co-authors, then the tools that come with Google docs might be the best fit for you.
And if you're going to claim they should be using online-only tools, please explain why it's wrong for them to instead choose the tools that work locally...
so the people who maintain our site content using markdown and hugo, commit/push with git, and trigger a CI build and deploy automatically MUST be developers?
answer: they are not, but they can handle basic git just fine. we aren’t some special class of super human: git is a tool, and you absolutely don’t need to know what a DAG is to use it
I agree that the DAG and Git storage model are at least not particularly complicated. The problem is that the Git user interface (the CLI, plus various concepts that are used in other interfaces as well, like refspecs, that are not fundamental to Git) is not very simple, and the correspondence between the DAG and mutations you may wish to perform, and Git commands, is often fairly obtuse and opaque.
I care about the internals of git about as much as I care about the internals of my filesystem.
It's probably helpful to know some basics, but do I need to know intimate details of my filesystem to use cp, mv, shell redirection? No. For most basic actions it Just Works™.
The problems in git are purely user-interface based. Other distributed systems have proven you can make a dcvs with a reasonably friendly UI.
That's because a file system is something you already understand, even if you've never actually used an old fashioned paper filing system. The software is providing you with something you understand.
Git is not providing you with something you understand. It is providing you with a DAG, and you understand neither what that is nor why you need it. The DAG is not the "internals of git". This is the big mistake. It is git. Everything about git is about building that DAG.
Git isn't hard, but it's usually taught like absolute shit. From the get-go you're told to use 4 commands, 2 of which are usually not explained, and when they are it's often hand-wavey. From there on it's usually people pontificating about high-level philosophy while failing to give concrete working examples. At least that was my experience.
I'm decent at git now for the work I do, so I'm cool with it. It really is an awesome tool. But for some reason it's just collectively taught like shit.
It's just one of those tools that is very foreign to new users, but once you know how to use it, you can't remember not knowing how to use it because it's so easy... this is like a lot of programming, actually.
I think you are right about the DAG. Once I understood what the high-level data structure of git is, many branch related commands immediately made sense and it was suddenly very easy to use. Many of my colleagues haven't taken the time to learn that and continuously struggle with basic commands.
Honestly the DAG isn't the whole story. The distributed nature adds a twist that makes things genuinely difficult, and the obtuseness of the commands is yet another layer of difficulty. People have every right to expect git commit to commit their changes to the remote repo. It very well could; it just doesn't happen to. Similarly, you can't tell me with a straight face that the notion of having to 'git add' a file you just deleted in order to commit that change is somehow intuitive. I could go on and on, but the point is that knowing the DAG is hardly the end of the story.
I agree it's not the end of the story, but without that knowledge, learning git is almost impossible. Even with DAG knowledge, some commands are hard to remember.
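The deleted-file case mentioned above is easy to see in a throwaway repo. A minimal sketch (file name and identity settings are made up for the demo):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo hello > notes.txt
git add notes.txt
git commit -q -m "add notes"

rm notes.txt               # delete the file on disk
git status --porcelain     # " D notes.txt": the deletion is NOT yet staged
git add notes.txt          # yes: 'git add' a path that no longer exists
git status --porcelain     # "D  notes.txt": now the deletion is staged
git commit -q -m "remove notes"
```

(`git rm notes.txt` does the delete-and-stage in one step, which is arguably the more discoverable spelling of the same operation.)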
Graph = graph, a structure composed of a set of objects (nodes or vertices) with links between them (edges).
Directed = the edges have an orientation / a direction.
Acyclic = there's no cycle, you can't come back to a node (in a directed graph you have to follow edge direction).
In Git, the commit objects are nodes, the "parent" link(s) is a directed edge, and because commit objects are immutable you can't create a commit which refers to one of its "descendants" and thus the graph is acyclic.
Unfortunately this article, like almost all others, is still wrong because it makes it look as if commits get mutated when you rebase and the old commits disappear.
It is very important to understand that commits (in fact, all objects) are immutable in git. You can only make new things; you can't modify old things. And git doesn't actually delete anything for a while either: unreachable objects are only removed later by garbage collection.
Directed acyclic graph. Basically each git commit points to zero or more parent commits (usually one, zero for root commits, more than one for merge commits) and that forms a DAG.
Speech itself is also covered by freedom of speech; does that mean you can shout into your neighbour's property for hours a day? With amplification? From a recording?
I agree, but I'd also like to see tax paid on the size of the vehicle, because it's literally using more of the public highway. So motorbikes would still pay less, probably.
The SUV users say it improves safety. OK, they can pay for it.
There shouldn't be separate endpoints that take an owner's ID. That's bad design. The owner endpoint should contain a list of invoices, ie. links to the invoice endpoint.
Maybe the data model is used for two use cases: one where you primarily access owner entities, and one where you primarily access invoices and occasionally do something with their owner. Just because it does not fit some weird, hypothetical prototypical API, it's not inherently "bad design". APIs aren't ER models; sure, they're supposed to make sense, but they're also supposed to help their consumers perform a task.
And let's not kid ourselves, much of our world runs on APIs that would really deserve the "bad design" handwave; at the end of the day, that's often an aesthetic question.
But the invoice endpoint is likely going to be something like "/customer/(id)/invoice/...". So what did we gain from getting it from the customer description first? (vs getting the customer id from the response and the link pattern from the docs)
I meant a whole tree of methods. Sure, you can get specific invoices from `/invoice/id`, but you probably still want `/customer/id/invoices` or similar for searching through them. You could use extra parameters for `/invoice/...`, but I think often it's nicer to namespace that. (use case dependent)
Do I misunderstand you, or do you suggest that the customer object should include the list of all invoices? That does not scale. Imagine something more common, like transactions, where a user can have thousands or tens of thousands.
It can contain a link to the list of all invoices. So you could have /customer/<cid>/invoice which lists all invoices for the customer which are actually just links of the form /customer/<cid>/invoice/<iid>
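A sketch of what such a response could look like. Every field name and URL shape here is invented for illustration; no real API is being described:

```shell
# Hypothetical JSON for GET /customer/42: the customer resource carries
# links to its invoices instead of embedding them all inline, so the
# payload stays small no matter how many invoices exist.
resp=$(cat <<'EOF'
{
  "id": 42,
  "name": "Acme Corp",
  "invoices": {
    "href": "/customer/42/invoice",
    "items": [
      { "href": "/customer/42/invoice/1001" },
      { "href": "/customer/42/invoice/1002" }
    ]
  }
}
EOF
)
echo "$resp"
```

A client that only cares about the customer never pays for the invoice list; a client that does care follows the `href` and can paginate there.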
This isn't too surprising. Microsoft's DevOps in Azure does the same thing (or did; I haven't looked at it in a few months). There was literally no point in using it until, as you've pointed out, a user can leverage cache. If I have a multistage build with an SDK that weighs in around 1GB, why would I ever want to use a tool that pulls that down every run?
I think, as many have said, that this is going after GitLab more than anyone else, although I can see a lot of users migrating away from Docker Hub given 1) the latest snafu/breach and 2) why keep my container repo over here and my container build pipeline over there? Doesn't make any sense and Docker Hub doesn't come with the pedigree of CDN baked in. I'm sure the same arguments work for other technologies in this consideration, but... Docker seems to continually be behind the 8-ball on the shifting field. My guess is Microsoft buys them in the next 3 years at a discount anyway. It fits their pattern of getting in front of the modern ecosystem and since Docker has leverage with containerd right now it would be an unsurprising move.
As usual it will take a disaster for people to realise it was a bad idea. Microsoft tried to destroy Linux in the past. Literally. Linux is what gave us git in the first place, and docker, and so much technology that we love today. Oh how quickly the past is forgotten when convenience is on the table.
What do you use your git history for? History is either worth keeping, in which case you should maintain it like any other artifact, or it's not, in which case you should squash down master to a single commit every time you merge.
But maybe you use your history for something else that I haven't considered.
While I like the idea of rearranging commits to convey a nice (but "not how it originally happened") development sequence, I think in practice this matters less than (say) good commit messages, or the difference between merging and rebasing.
(--fixup type commits aside).
Practical benefits from not squashing history:
- Can bisect to find bug introduction.
- Can annotate/praise/blame to find who/when some change was made.
- Adam Tornhill's "Code as a Crime Scene" argues that it'd be beneficial to consume VCS history to provide health metrics on the codebase (e.g. use VCS to check which sources have many contributors, and thus potentially high defect rates, or check for "lost knowledge" from developers who have left).
- Can build/run an older version of the software.
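The bisect benefit in particular is concrete: with a history of valid commits, git can binary-search for the one that introduced a bug, testing O(log n) commits instead of all of them. A toy run (file names and the grep-able "BUG" marker are invented for the demo):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# Ten commits; commit 7 quietly introduces the "bug"
for i in $(seq 1 10); do
  echo "change $i" >> app.txt
  if [ "$i" -eq 7 ]; then echo "BUG" >> app.txt; fi
  git add app.txt
  git commit -q -m "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first"          # HEAD is bad, the root commit is good
# bisect run treats exit 0 as "good"; grep finding BUG makes it non-zero (bad)
git bisect run sh -c '! grep -q BUG app.txt'
bad=$(git rev-parse refs/bisect/bad)    # the first bad commit found
git bisect reset
git log -1 --format=%s "$bad"           # prints: commit 7
```

This only works if (most of) the intermediate commits actually build and run the test, which is exactly the argument for keeping each commit valid.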
But is there really a big advantage from putting time into maintaining a sequence of commits?
EDIT: Ah, I see another comment point out that "maintaining a nice history" tends to mean fixing very borked commits. That makes sense. :-)
None of these advantages makes sense if half your commits are broken versions of the software. Rebasing helps ensure that each commit is valid, which is important for the reasons you mention. Having a log of what you actually did is not important.
History is the cleaned-up story we tell after the fact.
The fact that I had a bunch of stupid typos and broken tests that I didn't realize were broken before I committed doesn't need to be in the final history. What I really want for the preserved history is the conceptual chunks of changes I made along the way.
Is this really good for your team and the project?
If you have a safe work atmosphere, and your teammates reviewing the work can discover pitfalls in your project's workflow, you as a team can have discussions about it and improve your testing. And you can maybe go back through the history and see how many times this kind of normal human mistake happened across other branches and developers.
You can still diff through the PR as a unit before merging, without getting bogged down in low level commits.
There’s a vast middle ground between those two extremes. Some history is worth keeping, and some is not. Noise commits are of the form “forgot a closing paren,” “comment/uncomment section while debugging,” “finally got it to compile,” “checkpoint,” “fix typo,” or “going home for the day.”
Code and by extension history should be easy for humans to read. For that reason, the signal is very much worth keeping and polishing, but the noise is not. Documentation of false starts, appealing but ultimately problematic design choices, and “why” information belong in comments, commit messages, or design documents — explicit rather than implicitly littered around the history.
I'm not talking about squashing the feature branch. I'm talking about squashing all of master down to one commit (initial commit). If you don't take care of your history, my question is why do you keep it at all?
What I really like to do is make actual fixup (or squash) commits during a code review and just push these to the branch normally. That way reviewers can easily keep up with the changes. Right at the end, the maintainer requests that the original developer does a rebase --autosquash before the branch is actually merged.
This works really well, but I rarely see people talk about it. It means you don't have to use something complicated like GitHub or GitLab to keep up with rebases. It works for everyone.
Indeed this is the whole reason I like fixups. Reviewers can actually keep up with the requested changes and know exactly which fixup-commit is related to which change.
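The workflow described above, sketched end to end in a throwaway repo (file names and identity settings are made up; `GIT_SEQUENCE_EDITOR=true` just accepts the generated rebase todo list without opening an editor):

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "v1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Review feedback arrives: address it as a fixup of the original commit
echo "v2" > feature.txt
git commit -q -a --fixup HEAD      # message becomes "fixup! Add feature"
git log --oneline                  # reviewer sees both commits during review

# After approval, fold the fixup back in before merging
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root
git log --oneline                  # a single "Add feature" commit remains
```

During review the fixup commit is a plain, diffable commit the reviewer can follow; the history only collapses at the very end, so nobody has to chase a force-pushed branch mid-review.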