
What? If you don’t have external dependencies, just remove your bundler/transpiler and rely on browsers to import your code.


No. That’s like saying you can transplant a person’s neuronal action potentials into another person’s brain and have it make sense to them.


That metaphor is skipping the most important part in between! You wouldn't be transplanting anything directly, you'd have a separate step in between, which would attempt to translate these action potentials.

The point of the translating model in between would be that it would re-weight each and every value of the embedding, after being trained on a massive dataset of original text -> vector embedding for model A + vector embedding for model B. If you have billions of parameters trained to do this translation between just two specific models to start with, wouldn't this be in the realm of the possible?
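
A rough sketch of what that middle step could look like, assuming you have a parallel corpus embedded by both models (all shapes and data below are made-up stand-ins):

    import numpy as np

    # Paired embeddings of the SAME texts from two different models.
    # Random stand-ins here; in reality these come from model A and model B.
    A = np.random.randn(10_000, 768)     # model A: 768-d embeddings
    B = np.random.randn(10_000, 1024)    # model B: 1024-d embeddings

    # Fit a linear map W so that A @ W ~= B (ordinary least squares).
    # A learned MLP with billions of parameters is the nonlinear
    # version of this same idea.
    W, *_ = np.linalg.lstsq(A, B, rcond=None)

    def translate(vec_a):
        """Map a model-A embedding into model-B space."""
        return vec_a @ W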


A translation between models doesn't seem possible because there are actually no "common dimensions" at all between models. That is, each dimension has a completely different semantic meaning in different models, and it's also the combination of dimension values that begins to impart real "meaning".

For example, the number of different unit vector combinations in a 1500-dimensional space is at least the number of different ways of "ordering" the components, which is 1500! (roughly 10^4115).

EDIT: And the point of that factorial is that even if the dimensions were "identical" across two different LLMs but merely "scrambled" in ordering, there would be that large number of permutations to contend with to "unscramble".
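
For scale, you can get that count without ever materializing a 4,000-digit integer:

    import math

    # log10(1500!) via the log-gamma function (lgamma(n+1) == ln(n!)).
    digits = math.lgamma(1501) / math.log(10)
    print(f"1500! ~ 10^{digits:.0f}")   # -> 1500! ~ 10^4115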


This is very similar to how LLMs are taught to understand images in LLaVA-style models (the image embeddings are encoded into the existing language token stream).


”you could almost build a new kind of Job Search Service that matches job descriptions to job candidates”

The key word being ”almost”. Yes, you can get similarity matches between job requirements and candidate resumes, but those matches are not useful for the task of finding an optimal candidate for a job.

For example, say a job requires A and B.

Candidate 1 is a junior who has done some work with A, B and C.

Candidate 2 is a senior and knows A, B, C, D, E and F by heart. All are relevant to the job and would make 2 the optimal candidate, even though C–F are not explicitly stated in the job requirements.

Candidate 1 would seem a much better candidate than 2, because 1’s embedding vector is closer to the job embedding vector.
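
The geometry is easy to reproduce with a toy bag-of-skills encoding (real embeddings are dense, but the effect is the same):

    import numpy as np

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # One axis per skill: [A, B, C, D, E, F]
    job        = np.array([1., 1., 0., 0., 0., 0.])
    candidate1 = np.array([1., 1., 1., 0., 0., 0.])   # junior: A, B, C
    candidate2 = np.array([1., 1., 1., 1., 1., 1.])   # senior: A..F

    print(cos(job, candidate1))   # ~0.82  <- "wins" the match
    print(cos(job, candidate2))   # ~0.58  <- penalized for knowing more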


Even that is just static information.

We don't know if Candidate 2 really "knows A, B, C, D, E and F by heart", just that they claim to. They could be adding whatever to their skill list, even though they hardly used it, just because it's a buzzword.

So Candidate 1 could still blow them out of the water in performance, and even be able to trivially learn D and E in a short while on the job if needed.

The skill vector won't tell much by itself, and may even prevent finding the better candidate if it's used for screening.


> We don't know if Candidate 2 really "knows A, B, C, D, E and F by heart", just that they claim to. They could be adding whatever to their skill list, even though they hardly used it, just because it's a buzzword.

That is indeed a problem. I have been thinking about a possible solution to the very same problem for a while.

The fact: people lie on their resumes, and they do it for different reasons. There are white lies (e.g. someone pumps something up because they aspire to it but were never presented with an opportunity to do it, yet they are eager to skill up, learn and do it if given the chance). Then there are other lies. Generally speaking, lies are never black or white, true or false; they are a shade of grey.

So the best idea I have been able to come up with so far is a hybrid solution that couples text embeddings (the skills similarity match and search) with sentiment analysis (to score the sincerity of the information stated on a resume) to gain extra insight into the candidate's intentions. Granted, sentiment analysis is an ethically murky area…
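
As a very rough sketch of the hybrid (the model choice and the premise that sentiment can proxy for sincerity are both assumptions on my part):

    from transformers import pipeline

    # Off-the-shelf sentiment model as a crude second signal next to
    # the embedding-based skills match. Calling its output a
    # "sincerity score" is exactly the ethically murky leap.
    sentiment = pipeline("sentiment-analysis")

    resume_lines = [
        "Led a team of 5 engineers shipping a payments platform.",
        "Expert in literally every programming language ever made.",
    ]
    for line in resume_lines:
        print(line, "->", sentiment(line))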


Sincerity score on a resume? I can't tell if you're joking or not. I mean yeah, any sentence that ends in something like "...yeah, that's the ticket." would be detectable for sure, but I'm not sure everyone is as bad a liar as Jon Lovitz.


Are you speaking hypothetically or from your own experience? Sentiment analysis is a thing, and it mostly works – I have tested it with satisfactory results on sample datasets. It is relatively easy to extract the emotional context from a corpus of text, less so when it comes to resumes due to their inherently more condensed content. Which is precisely why I mentioned ethical considerations in my previous response. With extra effort and fine-tuning, it should be possible to overcome most of the false negatives though.


Sure, AI can detect emotional tones (being positive, being negative, even sarcasm sometimes) in writing, so if you mean something like detecting negativity in a resume so it can be thrown immediately in the trash, then I agree that can work. Any negative emotionality is always a red flag.

But insofar as detecting lies in sentences goes, that simply cannot be done; even if it ever appeared to work, the failure rate would still be 99%, so you're better off flipping a coin.


So your point is that LLMs can't tell when job candidates are lying on their resume? Well that's true, but neither can humans. lol.


> The key word being ”almost”. Yes, you can get similarity matches between job requirements and candidate resumes, but those matches are not useful for the task of finding an optimal candidate for a job.

Text embeddings are not about matching, they are about extracting semantic topics and semantic context. Matching comes next, if required.

If an LLM is used to generate the text embeddings, it will «expand» the semantic context for each keyword. E.g. «GenAI» would make the LLM expand the term into directly and loosely related semantic topics, say, «LLM», «NLP» (with a lesser relevance though), «artificial intelligence», «statistics» (more distant) and so forth. The generated embeddings result in a much richer semantic context that allows for straightforward similarity search as well as for exploratory radial search with ease. It also works well across languages, provided the LLM was trained on a sufficiently diverse multilingual corpus.

Fun fact: I have recently delivered an LLM-assisted (to generate text embeddings) k-NN similarity search for a client of mine. For the hell of it, we searched for «the meaning of life» in Cantonese, English, Korean, Russian and Vietnamese.

It pulled up the same top search result across the entire dataset for the query in English, Korean and Russian. Effectively, it turned into a Babelfish of search.

The Cantonese and Vietnamese versions diverged and were less relevant, as the LLM did not have a substantial corpus in either language. This can easily be fixed in the future – once a new LLM version trained on a better corpus in both Cantonese and Vietnamese becomes available, we simply regenerate the text embeddings for the dataset. The implementation won't have to change.
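
Not the client's system, obviously, but the Babelfish effect is easy to reproduce with an open multilingual embedding model (the model name below is just one well-known choice):

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    docs = ["...your dataset here..."]           # corpus to search
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    queries = ["the meaning of life",            # English
               "смысл жизни",                    # Russian
               "삶의 의미"]                      # Korean
    q_vecs = model.encode(queries, normalize_embeddings=True)

    # Cosine similarity == dot product on normalized vectors;
    # all three queries should surface the same top document.
    scores = q_vecs @ doc_vecs.T
    print(scores.argmax(axis=1))                 # top-1 per query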


The trick is to evaluate a score for each skill, weighting it by the years of experience with that skill, then sum the evaluations. This will address your problem 100%.

Also, what a candidate claims as a skill is totally irrelevant and can be a lie. It is the work experience that matters, and skills can be extracted from it.
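
A sketch of that scoring, with made-up numbers (years extracted from work history, not from the self-declared skill list):

    required = {"A": 1.0, "B": 1.0}                       # job weights
    cand1 = {"A": 1, "B": 1, "C": 2}                      # the junior
    cand2 = {"A": 8, "B": 7, "C": 6, "D": 5, "E": 4, "F": 4}

    def score(candidate):
        # Sum over required skills, weighted by years of experience.
        return sum(w * candidate.get(skill, 0)
                   for skill, w in required.items())

    print(score(cand1), score(cand2))   # 2.0 15.0 -- the senior now wins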


That's not accurate. You can explicitly bake in these types of search behaviors with model training.

People do this in ecommerce with the concept of user embeddings and product embeddings, where the result of personalized recommendations is just a user embedding search.
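
In its simplest form (random stand-ins here; real systems learn both embedding tables jointly, e.g. with a two-tower model):

    import numpy as np

    user_emb = np.random.randn(64)                 # one shopper
    product_embs = np.random.randn(100_000, 64)    # the catalog

    scores = product_embs @ user_emb               # dot-product relevance
    top10 = np.argsort(-scores)[:10]               # "recommendations"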


> not useful for the task of finding an optimal candidate

That statement is just flat-out incorrect on its face. However, it did make me think of something I hadn't thought of before, which is this:

Embedding vectors can be given a "scale" (multiplier) on specific terms, representing the amount of "weight" to add to that term. For example, if I have 10 years of experience in Java Web Development, then we can take the actual components of that vector embedding (i.e. for the string "Java Web Development") and multiply them by some factor proportional to 10, which yields a vector that is "further" along that direction. This represents an "amount" of travel in the Java Web direction.

So this means even with vector embeddings we can scale out to specific amounts of experience. Now here's the cool part. You can then take all THOSE scaled vectors (one for each individual candidate skill) and average them to get a single point in space, which CAN be compared as a single scalar distance from what the job requirements specify.
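
A sketch of that construction (embed() is a stand-in for a real embedding model, not an actual API):

    import hashlib
    import numpy as np

    def embed(text):
        # Stand-in: deterministic random unit vector per string.
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4],
                              "little")
        v = np.random.default_rng(seed).standard_normal(1536)
        return v / np.linalg.norm(v)               # unit "skill direction"

    skills = {"Java Web Development": 10,          # skill -> years
              "PostgreSQL": 4}

    # Scale each unit direction by years, then average into one point.
    candidate_vec = np.mean([yrs * embed(s) for s, yrs in skills.items()],
                            axis=0)

    job_vec = embed("Java Web Development")
    print(np.linalg.norm(candidate_vec - job_vec))  # single scalar distance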


Then you would have to renormalize the vectors. You really, really want to stay on the unit sphere, because that is the special case where cosine similarity equals the dot product, and Euclidean distance is a monotone function of both, so all three give the same nearest-neighbor ranking.


I meant the normalized hyperspace direction (unit vector) represents a particular "skill" and the distance along that direction (extending outside the unit hypersphere) is years of experience.

This is geometrically "meaningful", semantically. It would apply not just to a time vector (experience); in other contexts it could mean other things, like, for example, money invested into a particular sector (hedge fund apps).

This makes me realize we could design a new type of perceptron (MLP) where specific scalars for particular things (money, time, etc.) are wired into the actual NN architecture, in such a way that a specific input "neuron" is always fed the scalar for time, and a different neuron the scalar for money, etc. You'd have to "prefilter" each training input to generate the individual scalars, but then feed them into the same "neuron" every time during training. This would have to improve overall "Intelligence" by a big amount.
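
The cheapest version of that wiring is plain concatenation, so the same input positions always carry the same quantity. A sketch (all sizes are made up):

    import torch
    import torch.nn as nn

    class ScalarAwareMLP(nn.Module):
        def __init__(self, emb_dim=768, n_scalars=2, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(emb_dim + n_scalars, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, emb, scalars):
            # scalars[:, 0] is always "years", scalars[:, 1] always
            # "money": the same neurons see the same quantity every time.
            return self.net(torch.cat([emb, scalars], dim=-1))

    model = ScalarAwareMLP()
    out = model(torch.randn(8, 768), torch.randn(8, 2))   # batch of 8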


This discussion is being monitored and steered by nameless PR companies doing paid damage control for multinational companies that rhyme with the expressions ”YouDont” and ”Lemurs”.

Of course this is the perfect place to do it: if you can convince HN, HN will convince the rest of the world.


Joke’s on you, the same chemical conglomerates manufacture Teflon _and_ the plastic used in plastic cookware!


And who do you think owns Spatula City?


Overheat your Teflon pan, take a good huff, wait a few hours and post that comment again.

https://en.wikipedia.org/wiki/Polymer_fume_fever

”When PTFE is heated above 450 °C the pyrolysis products are different and inhalation may cause acute lung injury. Symptoms are flu-like (chills, headaches and fevers) with chest tightness and mild cough. Onset occurs about 4 to 8 hours after exposure to the pyrolysis products of PTFE.”

There is basically no safe limit for these chemicals — EPA limit for drinking water is 4 ppt. U.S. residents already have average blood PFAS levels to the tune of 4000 ppt.


> wait a few hours

Indeed. When was the last time you left your nonstick pan sitting on a cooktop with nothing in it, for hours?

If you're the kind of person to leave empty pans burning for that long, I'd be more worried about cognitive decline and/or the risk you'll die in a fire of your own making.


You only have to huff it for a few seconds, and then turn off the heat. The symptoms are what shows up hours later.


how much of "it" is there? what is the concentration? dose makes the poison, not time.


These so-called perfluorochemicals are toxic to humans at single-digit parts per trillion.

If you live in the US, chances are your blood already contains these chemicals at 4,000 ppt or greater (four thousand parts per trillion is the nationwide average).


> These so-called perfluorochemicals are toxic to humans at single-digit parts per trillion.

No, they aren't. At least, not in the way you're interpreting the word "toxic".

> If you live in the US, chances are your blood already contains these chemicals at 4,000 ppt or greater

The fact that you're telling me that I'm currently thriving with 1000x the "toxic" dose you just quoted should tell you that at least one of the statements is exaggerated.

Again, there are people out there who will tell you that any exposure to certain chemicals is "toxic". These people are not worth listening to.


You are doing damage control for multinational chemical corporations. Why would you be worth listening to?


That isn't a response to your own crappy post, it's a misdirection.


Mistakes happen all the time. Cooktops have terrible user interfaces. People need to juggle multiple things at once, especially parents.

Furthermore, the quote above merely states that the pan has to reach a specific temp, not be out for hours.


Pretty much any heating element setting will take a pan to 450 degrees. Do people do it? I doubt the parent commenter is lying about their bird.


Just keep in mind GP is talking Celsius. A good sear on a steak will happen around 200–250 °C.


Yeah, see...I just deleted the 450 degree part (right before I saw your response), because somehow I knew someone would pick at it.

The temperature is the least relevant part of what I wrote.


A pan left on the stove will turn red, and it is an accident that happens with some regularity. This issue is a lot like ground fault protectors: a rare accident that could be avoided by never interacting with a product in a certain way nonetheless occurs, and can only be eliminated through technical means. Just imagine that you're at your parent's house, and you look over at a glowing pan. Oops, you have a headache...


No, the onset of symptoms is several hours after exposure. There is no magic time per se of heating. Just get the pan hot enough.


That level of intentional misunderstanding just confirms my suspicion of damage control efforts at play in this thread.


Who is really heating teflon pans to 850 F on the stovetop?


Old folks, with a fine case of white matter decline. Distracted folks, because the baby just threw up. Sick folks, whose processing power is a bit covided. Young folks doing stupid things, possibly on a video dare.


To quote Chris Rock: if you're old and you die in an accident, you died of old age, not "that specific accident". If your mind's going, and that makes you do something that'll kill you, your mind going is what killed you.


Sure we could make excuses for unsafe products.

Or we could recognize that lapses of judgement are something every human is capable of, and demand such outrageous things as cookware that remains safe at any temperature a reasonable stovetop can produce. Really, overheating shouldn't just be safe, it shouldn't even ruin the cookware.


And for distracted folks in the same example, it's what... The baby?

If a company knowingly uses a toxic chemical, it shouldn't be everybody else's fault they did that.


Tell me you haven't had a baby without telling me you haven't had a baby.

You are fucking 10x more aware of bullshit you're doing just to keep baby safe. Probably more like 100x, really. Nothing focuses your mind like having CREATED A HUMAN THAT MUST BE KEPT SAFE AT ALL COST.


I'm confused by your interest in shifting the discussion away from the company that has enough money and lawyers to know better, and back towards anyone but them.

Can't we discuss the company's responsibility?


Not without picking correct similes or metaphors, no.


Okay, if we must look at metaphors and not the thing you insist we must not look at:

> if you're old and you die in an accident, you died of old age, not "that specific accident"

Your metaphor absolves any drunk driver from murder charges as long as their victim is old enough.


... until you aren't. Plenty of dead babies to prove the point.


Teflon starts to degrade at 260 degrees Celsius / 500 F. That’s within steak searing temperatures.


I don't think a regular stovetop can get a pan to 450°C; my gas stove gets an empty pan to about 300°C maximum. It doesn't happen in normal situations; if it does, it probably means you forgot your pan on the stove. Heating Teflon at 300°C for several hours is bad, but personally, in that situation, I would worry more about causing a house fire.

Teflon flu is a thing, but it is relatively rare, especially considering how widespread Teflon pans are. That's a few hundred cases per year in the US; by comparison, there are about 1000x more house fires, with cooking equipment being a leading cause.


The Nth commit there has N leading zeroes
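
Presumably brute-forced: each extra leading hex zero multiplies the expected work by 16. A sketch of how you would grind one out (the commit body is a minimal made-up example; the tree hash is git's well-known empty tree):

    import hashlib

    def git_commit_hash(body: bytes) -> str:
        # A git commit id is sha1 over "commit <len>\0" + raw body.
        return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

    def find_nonce(template: bytes, zeros: int) -> int:
        # Expected ~16**zeros attempts (each hex zero is 4 bits).
        nonce = 0
        while not git_commit_hash(template % nonce).startswith("0" * zeros):
            nonce += 1
        return nonce

    template = (b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
                b"author x <x@x> 1700000000 +0000\n"
                b"committer x <x@x> 1700000000 +0000\n"
                b"\nvanity nonce %d\n")
    print(find_nonce(template, zeros=4))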

