Hacker News | bulldoa's comments

> minimum standards to get called into interviews.

I'm a bit curious what the minimum standard is. It does feel to me (an outsider who took one ML class in college) that you need at least a master's in ML to get intuition for the probability and linear algebra theory behind ML concepts.


> I'm a bit curious what the minimum standard is. It does feel to me (an outsider who took one ML class in college) that you need at least a master's in ML to get intuition for the probability and linear algebra theory behind ML concepts.

On the computer science side it makes sense to understand how to build a tree, not just know how to use one. In computer science you're learning how to write algorithms, not just use them. On the MLE or Machine Learning Software Engineer side, the same rule applies and knowing linear algebra as well as probability theory is helpful so you can write your own ML.

On the data science side you're rarely inventing new ML. What you want to specialize in is: 1) data mining using ML, 2) data cleaning, by knowing what kind of input ML needs, 3) feature engineering, also by knowing what kind of input ML needs, and 4) what kind of ML is ideal to choose, e.g., due to the bias/variance trade-off.

A lot of these boot camps, classes, and books teach the underlying structure of ML needed to become an MLE, though for some reason they often advertise it as data science. I find this odd, because MLE pays better and is in higher demand.

On the data science side we rarely need to know these finer details. We just need to know the ML's characteristics so we know when it is the right tool for the job, similar to knowing how to use a data structure but not needing to know how to invent new data structures.

A data scientist needs to be a research specialist, which is more of a PhD than a master's skill, so knowing the underlying math also doesn't matter as much because we know when it is necessary to research it. It's not that a data scientist can't know it, and many do know it from hobby work or classes, but it's far from mandatory. Knowing probability theory and how to break a problem down into multiple paths forward, like how to collect data, is far more valuable as a skill.

And finally, many data scientists barely know how to write code. They're a kind of analyst. I feel like the job title isn't sufficiently explained so software engineers make a lot of assumptions mixing up MLE with DS.

edit: Also, linear algebra isn't that bad and is an undergrad class. Probability theory is taught lightly in DS101 freshman year, and in the first year of a master's it is often taught again with much more rigor. This can get a bit harder, but if you understand the basics it's not bad.


Curious, what are some cool applications of SVD?


Not the OP, but I've got some links for you:

1. https://en.wikipedia.org/wiki/Singular_value_decomposition#A... of which the "Low-rank matrix approximation" is the most important one (it's like looking inside the matrix, seeing its significant components, and zeroing out the remaining ones to save space). See also PCA in statistics.

2. 1976 video about SVD https://www.youtube.com/watch?v=R9UoFyqJca8 that shows a visualization for an algorithm for how to compute it.

3. Good two-part blog post series https://jeremykun.com/2016/04/18/singular-value-decompositio... https://jeremykun.com/2016/05/16/singular-value-decompositio...
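For the "low-rank matrix approximation" mentioned in the first link, here is a minimal sketch using NumPy. The matrix and its sizes are made up for illustration, and `low_rank_approx` is a hypothetical helper name, not something from the linked pages.

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular values; everything else is zeroed out.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical data: a 50x30 matrix that is rank 2 plus a little noise.
A = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
A = A + 0.01 * rng.normal(size=(50, 30))

A2 = low_rank_approx(A, 2)
rel_err = np.linalg.norm(A - A2) / np.linalg.norm(A)

stored_full = A.size               # 50 * 30 numbers
stored_rank2 = 2 * (50 + 30 + 1)   # 2 columns of U, 2 rows of Vt, 2 singular values
```

The "saving space" part is the last two lines: instead of every entry, you store only the kept singular vectors and values, and `rel_err` tells you how much you lost by doing so.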


Not an application, but this is my favorite explanation of SVD:

http://gregorygundersen.com/blog/2018/12/10/svd/

It's such an important operation that I'd say understanding it changes how you understand a lot of linear algebra. That Kun post is also good.


Latent semantic indexing[1] uses SVD to identify relationships between words in unstructured text.

You can use it to search for words and find related texts even though those texts do not contain the actual words you searched for. Or you can use it to find similar texts, even though important words may differ.

Not sure how relevant LSI is these days, not my field at all, but mapping words to vector spaces and using SVD like this kinda blew my mind a bit when I stumbled upon it many years ago.

[1]: https://en.wikipedia.org/wiki/Latent_semantic_analysis#Mathe...
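A toy sketch of the idea, assuming a tiny made-up corpus and raw term counts (real LSI setups usually use weighting such as tf-idf and many more documents). The names `embed_query` and `cosine` are hypothetical helpers for this example only.

```python
import numpy as np

# Hypothetical toy corpus. Document 2 never mentions "car".
docs = [
    "car engine repair",
    "automobile engine repair",
    "automobile dealer",
    "banana bread recipe",
    "bread baking recipe",
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix: rows are terms, columns are documents.
A = np.array([[d.split().count(t) for d in docs] for t in vocab], dtype=float)

k = 2  # number of latent "topics" to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Documents in the latent space: column j of diag(s_k) @ Vt_k equals U_k^T @ A[:, j].
doc_vecs = (s[:k, None] * Vt[:k, :]).T

def embed_query(words):
    """Project a bag of words the same way the documents were projected."""
    q = np.array([words.count(t) for t in vocab], dtype=float)
    return q @ U[:, :k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

q = embed_query(["car"])
scores = [cosine(q, d) for d in doc_vecs]
```

In this toy case "automobile dealer" ends up close to the query "car" because both co-occur with "engine" and "repair" elsewhere in the corpus, while the baking documents land on the other latent axis.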


I’ve used it on neural spike data before. If your responses are not too “messy”, you can estimate pretty well the part of the response that was due to a stimulus rather than just noise.


Very useful for working with MIMO (multi input multi output) control systems.


Off the top of my head,

- Principal component analysis

- Fitting a plane to a set of points

- Linear least squares
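As a rough illustration of the plane-fitting item, here is one common way to do it with NumPy: center the points and take the right singular vector with the smallest singular value as the normal. The sample plane and the `fit_plane` name are assumptions made up for this sketch.

```python
import numpy as np

def fit_plane(points):
    """Return (centroid, unit normal) of the plane minimising squared orthogonal distances."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least spread, which is the plane's normal.
    _, _, Vt = np.linalg.svd(points - centroid, full_matrices=False)
    return centroid, Vt[-1]

rng = np.random.default_rng(0)
# Hypothetical noisy samples of the plane z = 2x - y + 1.
xy = rng.uniform(-1.0, 1.0, size=(100, 2))
z = 2 * xy[:, 0] - xy[:, 1] + 1 + 0.01 * rng.normal(size=100)
pts = np.column_stack([xy, z])

centroid, normal = fit_plane(pts)
# 2x - y - z + 1 = 0 has normal (2, -1, -1); the sign of the fitted normal is arbitrary.
true_normal = np.array([2.0, -1.0, -1.0]) / np.sqrt(6.0)
```

Note this minimises orthogonal distances to the plane, which is slightly different from regressing z on x and y.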


I work with scientific data but have to say I have a very shallow math background and have also forgotten most of what I learnt at school/uni. However, whenever I open the source for scikit-learn or the high-level code published in papers, I see SVD. A lot of the scenarios in biology have been abstracted into matrix manipulation, which is fascinating and something I really need to learn.


All of these examples are equivalent to least squares :)


Yeah, for the most part all of engineering is equivalent to least squares I'd say, there's always some non-linear optimization procedure that uses a norm^2 metric since it's so well studied and solved already. It's disappointing as a mathematician, but the rest of me is fine with it :)


How is PCA the same as least squares? I've always understood it as the eigendecomposition of the covariance matrix.
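One way to see the connection, sketched with NumPy on made-up 2-D data: the top eigenvector of the covariance matrix is also the direction that minimises the sum of squared orthogonal distances from the points to a line. That is a least-squares problem, though an orthogonal (total) one rather than ordinary regression. The `residual` helper and the brute-force angle search are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data stretched along one direction.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
Xc = X - X.mean(axis=0)

# View 1: top eigenvector of the covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
w_eig = eigvecs[:, -1]

# View 2: top right singular vector of the centered data.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
w_svd = Vt[0]

def residual(w):
    """Sum of squared orthogonal distances from the points to the line spanned by w."""
    w = w / np.linalg.norm(w)
    return float(np.sum((Xc - np.outer(Xc @ w, w)) ** 2))

# View 3: brute-force the least-squares problem over directions in the plane.
angles = np.linspace(0.0, np.pi, 1801)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
w_ls = candidates[np.argmin([residual(w) for w in candidates])]
```

Up to sign and the grid resolution, all three directions agree, and the residual at the PCA direction equals the sum of the discarded squared singular values.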


>Now I'm thinking of deploying k8s at home

Are we talking about k8s running on your own server rack at your house?


K3s on a few devices. Thinking of grabbing an HP microserver or something with a similar case for an ITX Ryzen (embedded EPYC would probably be too expensive), some storage space, and maybe connecting a few extra bits of compute power into a heterogeneous cluster. Put everything except maybe PiHole on it, with a ZFS pool exported over a bunch of protocols as the backing store for persistent volume claim support.


Do you have a recommended tutorial for an engineer with a backend background to set up a simple k8s infra on EC2? I am interested in understanding the devops role better.



> Do you have a recommended tutorial for an engineer with a backend background to set up a simple k8s infra on EC2?


Take a look at https://github.com/kelseyhightower/kubernetes-the-hard-way.

That’d be a great first step if the purpose is to learn Kubernetes. If, however, you want to set up a cluster for real use then you will need much more than bare bones Kubernetes (something that solves networking, monitoring, logging, security, backups and more) so consider using a distribution or a managed cloud service instead.


Setting up your own k8s from scratch is kind of like writing your own string class in C++: it’s a good exercise (if it’s valuable for your learning path) but you probably don’t want to use it for actual work.

Maintaining a cluster set up like that is a ton of work. And if you don’t perform an upgrade perfectly, you’ll have downtime. Tools like kops help a lot, but the time you spend will still be worth far more than the $70/month a managed cluster costs.


I find that K3s is great for getting started. It has Traefik included and it's less of a learning curve to actually be productive, vs. diving in with K8s and having to figure out way more pieces.


Does Brave route calls through their own servers, or is this also P2P?


According to the Brave privacy policy, Together uses 8x8’s servers, so my guess is that this is using Jitsi Meet.

The last time I looked through the Jitsi Meet config files, by default it uses Google’s TURN servers to route the calls. Not sure how Brave Together is doing it.


What kind of mathematical models do you guys use (convex optimization, for example)? I am very interested in this subject and hope to learn more.


What's your opinion on Postgres vs. NoSQL stores like Cassandra and Aerospike? Fundamentally, is there any reason that Postgres can't scale as well as NoSQL? If I store key-value pairs in Postgres, add read replicas to scale reads, and partition to scale writes, will I not be able to keep up with other NoSQL solutions? If so, what are the reasons?


I would say that by the time you (potentially) have those kinds of problems, you'll most likely have the resources to deal with them.

It's sort of a nice class of problems to have, and a lousy heuristic for choosing solutions before you know what exactly you're dealing with.
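For reference, the layout asked about above (hash-partition writes, fan reads out to replicas) can be sketched in a few lines. This is a toy in-memory model only: the `Shard` and `KVRouter` classes are hypothetical, plain dicts stand in for Postgres primaries and replicas, and replication is copied synchronously, which a real asynchronous setup would not guarantee.

```python
import hashlib
from itertools import cycle

class Shard:
    """Stand-in for one primary plus its read replicas (plain dicts here)."""

    def __init__(self, n_replicas):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self._next_replica = cycle(range(n_replicas))

    def write(self, key, value):
        self.primary[key] = value
        # Assumption: instant replication. Real replicas can lag behind.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Round-robin reads across replicas to spread load.
        return self.replicas[next(self._next_replica)].get(key)

class KVRouter:
    def __init__(self, n_shards, n_replicas):
        self.shards = [Shard(n_replicas) for _ in range(n_shards)]

    def _shard_for(self, key):
        # Stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key).write(key, value)

    def get(self, key):
        return self._shard_for(key).read(key)

router = KVRouter(n_shards=4, n_replicas=2)
for i in range(1000):
    router.put(f"user:{i}", i)
sizes = [len(shard.primary) for shard in router.shards]
```

The routing itself is the easy part. What the sketch leaves out (resharding when you add nodes, replica lag, failover) is where most of the operational work tends to be.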


Does enforcing a minimum account history plus a minimum karma to comment help?


Whoa, I didn't know there is an entire ecosystem of contract jobs and auditors for big projects. How do companies usually hire contractors (outsourced HR, Upwork, Accenture)? And how do they hire auditors?


This used to be called "quality assurance". Web and .com blew that apart. We ain't got time for that.


This sounds like a security audit, and auditors found the build system difficult to work with when attempting to audit code.

