Hacker News | new | past | comments | ask | show | jobs | submit | ThomasMoll's comments | login

We (when I worked at LinkedIn) did it with ETL clusters; we had already built them out for moving data between datacenters nightly. They would mirror an HDFS cluster, then run batch jobs to transfer data either directly to the outbound cluster or to another ETL cluster in another DC.

We used one of our ETL clusters to ship data to MSFT for various LinkedIn integrations, like seeing LinkedIn profile information in Outlook or Office products.


Which tools were you using for ETL? Or were they completely custom?


Not only that, but the Hadoop team literally had the guy who wrote the original HDFS whitepaper. Moving a service with that much in-house expertise first never made sense. I worked on one of the original Azure PoCs for Hadoop, even before Blueshift, and it was immediately clear that we operated at a scale Azure couldn't handle at the time. Our biggest cluster had over 500PB, and in total we had over an exabyte as of 2021 [1]. It was exorbitantly expensive to run a similar setup on VMs, and at our scale I think it would have taken 4,000 - 5,000 separate Azure Data Lake namespaces to support one of our R&D clusters. I believe most of this "make the biggest cluster you can" mentality was a holdover from the Yahoo! days.

[1] https://engineering.linkedin.com/blog/2021/the-exabyte-club-...


Absolutely! It's a great place to put the work I'm most proud of!

Don't get me wrong, I do a lot of cool stuff at $JOB, but there's a limit on how much I can discuss about various internal systems. So just doing a greenfield project that's a 180 from my current profession (in my case, biological modelling for art purposes) is a great joy.

I've gotten a lot of recruiters reaching out because they saw an article or repost on HN or lobste.rs


I think the advantage here is that instead of centralizing the relationships between all types in one place, these relationships can be defined anywhere, including external packages. This makes composition of additional types extremely easy, even when multiple people are working on the project.


Groovy supports multiple dispatch; if you needed to encapsulate a generic type, the compiler would dynamically resolve that type correctly. See my example in Java (which has method overloading, but not MD).


Julia is fairly fast; since its type system _only_ does dynamic/runtime typing, the JIT is optimized towards that. You'll experience some minor startup lag, typically due to initial JIT'ing of any newly used functions. However, this has largely been remedied with a compiler backend that completely precomputes this behavior. https://julialang.github.io/PackageCompiler.jl/dev/


Hey, author here.

1) Handling a state like "Onyx holding a hard stone executing Earthquake in a Sandstorm against a Flygon with the ability Hover" is just dispatching on N constraints at once. Since multiple dispatch is a generalized system, we can dispatch on N different types.

Take the "Dual Type" in Pokémon for example: https://pokemondb.net/type/dual. You'll notice that instead of just an NxN grid, we're dealing with an NxNxN. Where a single attack needs to be related against 2 different defenses.

Simple enough: `eff(atk::T1, def1::T2, def2::T3) = ...`, then we can just encapsulate this second type within a `Pokémon` structure and route to the correct function dynamically.
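A minimal sketch of that three-argument dispatch in Julia (the type names and multiplier values here are hypothetical, not from the post):

```julia
# Hypothetical type hierarchy for illustration.
abstract type PType end
struct Fire  <: PType end
struct Grass <: PType end
struct Ice   <: PType end

# Single-type effectiveness: 2.0 = super effective, 1.0 = neutral.
eff(::Fire, ::Grass)  = 2.0
eff(::Fire, ::Ice)    = 2.0
eff(::PType, ::PType) = 1.0   # neutral fallback

# Dual-type defense: dispatch on three arguments at once, composing
# the two single-type results.
eff(atk::PType, def1::PType, def2::PType) = eff(atk, def1) * eff(atk, def2)

eff(Fire(), Grass(), Ice())   # 4.0
```

Because the three-argument method is defined in terms of the two-argument one, adding a new type only requires new two-argument methods; the dual-type case composes for free.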

2) The "Super Effectiveness" of a MD system is that you don't _need_ to put everything into a singular table, something that's functionally impossible to extend. The idea is that we can build up the correct relationships between types completely independent of one another. The issue is, who owns that table? How do you merge more than one new type in? (see my section about Composition in the post)

If someone else wants to make a new `Foo` type Pokémon, and another person is doing `Baz`, they can work completely separately, defining the `eff` functions concerning only their own types. And there's _zero_ integration work to use both: just import the new types and their functions. This is incredibly extensible!
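A sketch of that composition in Julia, assuming the `Foo`/`Baz` names from the comment above (everything else here is hypothetical):

```julia
# Shared base, as if from a common package.
abstract type PType end
struct Fire <: PType end
eff(::PType, ::PType) = 1.0          # neutral default

# Author A's package adds Foo, knowing nothing about Baz:
struct Foo <: PType end
eff(::Fire, ::Foo) = 0.5

# Author B's package adds Baz, knowing nothing about Foo:
struct Baz <: PType end
eff(::Baz, ::Fire) = 2.0

# Importing both composes with zero integration work:
eff(Baz(), Foo())   # no specific method, falls back to neutral 1.0
```

No central table is edited; each author only extends the generic `eff` function with methods on their own types.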


> Take the "Dual Type" in Pokémon for example: https://pokemondb.net/type/dual. You'll notice that instead of just an NxN grid, we're dealing with an NxNxN. Where a single attack needs to be related against 2 different defenses.

> Simple enough: `eff(atk::T1, def1::T2, def2::T3) = ...`, then we can just encapsulate this second type within a `Pokémon` structure and route to the correct function dynamically.

This doesn't look like a good approach to me. One thing that bothers me is that it draws a distinction between def1 and def2 that doesn't, in reality, exist. You should not be handling the cases of "fire attack deals damage to grass/ice" and "fire attack deals damage to ice/grass" separately, because those are not separate cases. No type has a different effect when listed first than it does when listed second. No pair of types has any effect other than the independent effects of each type considered individually.

The same issue reoccurs at a higher level: fundamentally, you aren't dealing with an NxNxN grid. You're free to represent the data that way, but it's redundant -- the NxNxN grid contains no information that isn't already present in the NxN grid. You could reapply the same logic and produce an NxNxNxN grid detailing what would happen if a single-typed attack hit a triple-typed defender, or if a dual-typed attack hit a dual-typed defender, but... why would you do that?


So, it's an Nx(NxN/2) half grid. This is easily solved on the implementation side by making sure, for example, that the enum values for the last two arguments are always in ascending order.
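A sketch of that normalization in Julia, assuming an enum-keyed effectiveness table (all names and values here are hypothetical):

```julia
# Hypothetical enum; declaration order defines the ascending order.
@enum PType FIRE GRASS ICE

# Half-grid table: defense pairs are stored in ascending order only.
const EFF = Dict((FIRE, GRASS, ICE) => 4.0)

function eff(atk::PType, d1::PType, d2::PType)
    lo, hi = minmax(d1, d2)            # canonicalize the defense pair
    get(EFF, (atk, lo, hi), 1.0)       # neutral default for missing entries
end

eff(FIRE, ICE, GRASS)   # 4.0 — same result in either argument order
```

Callers can pass the defender's types in any order; the lookup normalizes before touching the table, so no crash occurs on the "wrong" order.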


> So, it's an Nx(NxN/2) half grid.

No, it's an NxN grid. Look at the second half of my comment.

> This is easily solved on the implementation side by making sure, for example, that the enum values for the second 2 arguments are always in ascending order.

So that when somebody invokes your function and passes the defender's types in the order listed for the Pokemon rather than sorting them beforehand, you crash?


Well, the real issue is that we're using N instead of A and D. It is A x (D x D / 2).

And why the sudden helplessness? Just sort the 2 arguments before passing them to the internal Impl.


> Well, the real issue is that we're using N instead of A and D. It is A x (D x D / 2).

No, it isn't. It's AxD, where A and D are always equal. There is no reason to add another dimension to the result table when the defense or offense might pick up another type. The expanded table will never contain any more information than the two-dimensional table already does.

(Dividing by 2 isn't correct either, even from your perspective; you're forgetting about the table's diagonal. In the "space is no object" approach you're advocating, the diagonals need to be filled by special-casing, since they represent a phenomenon that doesn't exist (a Pokemon which bears multiple instances of the same type) and obscure a phenomenon that does exist (a Pokemon which bears fewer types than the maximum possible number).)


You can just define a generic fallback method like `eff(p, d1, d2) = eff(p, d2, d1)`
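Concretely, in Julia (hypothetical types; note the flipping fallback recurses forever if neither argument order has a specific method, so a neutral base case is still needed in practice):

```julia
abstract type PType end
struct Fire  <: PType end
struct Grass <: PType end
struct Ice   <: PType end

# Specific method defined for one defense order only.
eff(::Fire, ::Grass, ::Ice) = 4.0

# Generic fallback: flip the defense pair and retry. Julia prefers the
# more specific method, so this only fires for the "wrong" order.
eff(p::PType, d1::PType, d2::PType) = eff(p, d2, d1)

eff(Fire(), Ice(), Grass())   # 4.0, routed through the flipped fallback
```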


Yes! In fact I think Julia borrows a lot of concepts from CL and the like for their type system implementation.


Julia is basically a trojan horse for Lisp. Syntactically, it looks kind-of like Matlab. But semantically it is very much in the Lisp family. Since 1959, Lisp remains the best idea in computer programming. And Julia is bringing it to the masses.


Yeah, although Java did it first.

"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." - Guy Steele

I assume Guy Steele knows his stuff when talking about Lisp-like languages.


Java mostly got only the typical runtime ideas of the JVM (managed memory, virtual machine, code loading, calling conventions, runtime safety) from Lisp (often via Smalltalk, etc.), but not ideas like executable memory heaps, code as data, etc.

Higher-level language features from Lisp (CLOS, macros, conditions, closures, interactive development, ...) were not brought to mainstream Java. Closures, some interactivity, etc. were eventually added many years later.

From a Lisp user perspective Java was more than 'halfway' away, and probably still is.


Guy Steele did not say 100%, after all.

So we could argue bullet points about how someone highly relevant in the Lisp and Scheme communities was wrong in his assertion, if you like.


There are lots of highly relevant people in the Lisp and Scheme communities.

https://people.csail.mit.edu/gregs/ll1-discuss-archive-html/...


Sure there are, yet none of them were responsible for designing parts of Java's architecture, or made the statement we are discussing.

Had it not been for Guy Steele's background, and the context of the talk where he made that statement, I would agree with you.


There were many (ex-)Lispers working on comparable languages: incl. Java, C#, Dylan, ... The discussion took place on the Little-languages list, where a bunch of people with actual Lisp experience and general language design & implementation experience were also participating. I'm pretty sure many of them had a good idea where Java was technically positioned in the language landscape between C, C++, Ada, ... ML, Scheme, Lisp, Prolog, Smalltalk, Self, Perl, TCL, ...

Also keep in mind that SUN at that time was aggressively marketing Java as THE new language for system and application development, especially for the enterprise (a main target market for SUN). Though the origins of Java were as a programming language for set-top boxes, back when it was still called Oak.

The quote from Guy was kind of an excuse there for the modest design goals: at least we (-> SUN) dragged C++ developers towards Lisp, even though Smalltalk, Lisp, etc. people themselves were not a target and were not that impressed. Things like garbage collection in a language designed to replace C++ in many scenarios were still revolutionary.


Depending on the language target and method of execution, dynamic resolution of runtime types can be quite hard. IIRC Java's JVM is heavily optimized for static codepaths, whereas Julia uses a JIT system that is "slow" on first run but speeds up considerably once all methods in a hot code path are resolved.


I'll admit, I'm a bit disappointed that my original title "Julia used Multiple Dispatch! It's Super Effective!" didn't make the HN frontpage.

