> minimum standards to get called into interviews.
I am a bit curious: what is the minimum standard? It does feel to me (an outsider who took one ML class in college) that you need at least a Master's in ML to get intuition for the probability and linear algebra theory behind ML concepts.
> I am a bit curious: what is the minimum standard? It does feel to me (an outsider who took one ML class in college) that you need at least a Master's in ML to get intuition for the probability and linear algebra theory behind ML concepts.
On the computer science side it makes sense to understand how to build a tree, not just how to use one: in computer science you're learning how to write algorithms, not just use them. The same rule applies on the MLE (Machine Learning Engineer) side, where knowing linear algebra and probability theory is helpful because you may be writing your own ML.
On the data science side you're rarely inventing new ML. What you want to specialize in is: 1) data mining using ML, 2) data cleaning, by knowing what kind of input ML needs, 3) feature engineering, also by knowing what kind of input ML needs, and 4) which kind of ML is ideal to choose, e.g., due to the bias/variance trade-off.
A lot of these boot camps, classes, and books teach the underlying structure of ML needed to become an MLE, though for some reason they often advertise it as data science. I find this odd, because MLE pays better and is in higher demand.
On the data science side we rarely need to know these finer details. We just need to know the ML's characteristics so we know when it is the right tool for the job, similar to knowing how to use a data structure but not needing to know how to invent new data structures.
A data scientist needs to be a research specialist, which is more of a PhD skill than a Master's one, so knowing the underlying math also matters less: we know when it is necessary to research it. It's not that a data scientist can't know it, and many do know it from hobby or classes, but it's far from mandatory. Knowing probability theory and how to digest a problem into multiple paths forward, like how to collect data, is a far more valuable skill.
And finally, many data scientists barely know how to write code. They're a kind of analyst. I feel like the job title isn't sufficiently explained, so software engineers make a lot of assumptions, mixing up MLE with DS.
edit: Also, linear algebra isn't that bad and is an undergrad class. Probability theory is taught lightly in DS101 freshman year, but in the first year of a master's it is often taught again at a much more rigorous level. That can get a bit harder, but if you understand the basics it's not bad.
1. https://en.wikipedia.org/wiki/Singular_value_decomposition#A...
of which "low-rank matrix approximation" is the most important application (it's like looking inside the matrix, seeing its significant components, and zeroing out the remaining ones to save space). See also PCA in statistics.
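The "keep the significant components, zero out the rest" idea can be sketched in a few lines of numpy (the matrix here is random, purely for illustration):

```python
import numpy as np

# Build a 6x8 matrix that has rank at most 4 by construction.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4)) @ rng.normal(size=(4, 8))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def low_rank(k):
    # Keep only the k largest singular values/vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k]

# k = 4 recovers A essentially exactly (A has rank <= 4); smaller k
# trades accuracy for storage, and is optimal in the least-squares
# sense by the Eckart-Young theorem.
err_full = np.linalg.norm(A - low_rank(4))
err_1 = np.linalg.norm(A - low_rank(1))
```

Storing the truncated factors costs O(k(m + n)) numbers instead of O(mn) for the full matrix, which is where the space saving comes from.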
Latent semantic indexing[1] uses SVD to identify relationships between words in unstructured text.
You can use it to search for words and find related texts even though those texts do not contain the actual words you searched for. Or you can use it to find similar texts, even though important words may differ.
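A toy numpy sketch of how that works (the terms, documents, and counts here are entirely made up): project both documents and queries into a low-rank "concept" space, then compare them by cosine similarity.

```python
import numpy as np

# Rows are terms, columns are documents; entries are raw counts.
terms = ["car", "auto", "engine", "flower", "petal"]
docs = np.array([
    [2, 1, 0],   # car
    [0, 2, 0],   # auto
    [1, 1, 0],   # engine
    [0, 0, 2],   # flower
    [0, 0, 1],   # petal
], dtype=float)

U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2                            # keep the 2 strongest "concepts"
doc_vecs = s[:k] * Vt[:k].T      # documents in concept space

def query_vec(words):
    # Fold a query into the same concept space: q_k = U_k^T q
    q = np.array([1.0 if t in words else 0.0 for t in terms])
    return U[:, :k].T @ q

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

q = query_vec({"auto"})
sims = [cosine(q, d) for d in doc_vecs]
# Document 0 never contains the word "auto", yet it scores high
# because "car" and "auto" co-occur; document 2 (flowers) scores low.
```

This is the effect described above: the query word never appears in document 0, but the SVD has merged "car" and "auto" into one latent concept, so the match is found anyway.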
Not sure how relevant LSI is these days, not my field at all, but mapping words to vector spaces and using SVD like this kinda blew my mind a bit when I stumbled upon it many years ago.
I’ve used it on neural spike data before. If your responses are not too “messy”, you can estimate, pretty well, the part of the response that was due to a stimulus rather than just noise.
I work with scientific data but have to say I have a very shallow math background and have also forgotten most of what I learnt at school/uni. However, whenever I open the source for scikit-learn or high-level code published in papers, I see SVD. A lot of scenarios in biology have been abstracted into matrix manipulation, which is fascinating and something I really need to learn.
Yeah, for the most part all of engineering is equivalent to least squares, I'd say: there's always some non-linear optimization procedure that uses a norm^2 metric, since that problem is so well studied and solved already. It's disappointing to the mathematician in me, but the rest of me is fine with it :)
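For what it's worth, the norm^2 problem ties straight back to the SVD: the pseudoinverse built from the SVD gives the minimum-norm least-squares solution. A minimal numpy sketch (the design matrix and coefficients are made up):

```python
import numpy as np

# Overdetermined system: 50 noisy observations, 3 unknown coefficients.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + rng.normal(0, 0.01, 50)

# The SVD-based pseudoinverse minimizes ||A x - b||_2; this is also
# essentially what np.linalg.lstsq does under the hood.
x_hat = np.linalg.pinv(A) @ b
# x_hat recovers x_true up to the noise level.
```

The well-studied part is exactly this: once a problem is phrased as minimizing a squared norm, the solution is a closed-form linear-algebra operation rather than a bespoke optimizer.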
K3s on a few devices. Thinking of grabbing an HP MicroServer or something with a similar case for an ITX Ryzen (an embedded EPYC would probably be too expensive), some storage space, and maybe connecting a few extra bits of compute into a heterogeneous cluster. Put everything except maybe Pi-hole on it, with a ZFS pool exported over a bunch of protocols as the backing store for persistent volume claims.
Do you have a recommended tutorial for an engineer with a backend background to set up a simple k8s infrastructure on EC2? I am interested in understanding the devops role better.
That’d be a great first step if the purpose is to learn Kubernetes. If, however, you want to set up a cluster for real use, you will need much more than bare-bones Kubernetes (something that solves networking, monitoring, logging, security, backups, and more), so consider using a distribution or a managed cloud service instead.
Setting up your own k8s from scratch is kind of like writing your own string class in C++: it’s a good exercise (if it’s valuable for your learning path) but you probably don’t want to use it for actual work.
Maintaining a cluster set up like that is a ton of work. And if you don’t perform an upgrade perfectly, you’ll have downtime. Tools like kops help a lot but you’ll still spend far more time than the $70/month it costs for a managed cluster.
I find that K3s is great for getting started. It includes Traefik, and it's less of a learning curve to actually be productive, versus diving into full K8s and having to figure out many more pieces.
According to the Brave privacy policy, Together uses 8x8’s servers, so my guess is that this is using Jitsi Meet.
The last time I looked through the Jitsi Meet config files, by default it uses Google’s TURN servers to route the calls. Not sure how Brave Together is doing it.
What's your opinion on Postgres vs NoSQL stores like Cassandra and Aerospike? Fundamentally, is there any reason Postgres can't scale as well as NoSQL? If I store key-value data in Postgres, add read replicas to scale reads, and partition to scale writes, will I not be able to keep up with the NoSQL solutions? If so, what are the reasons?
Woah, I didn't know there is an entire ecosystem of contract jobs and auditors for big projects. How do companies usually hire contractors (outsourced HR, Upwork, Accenture)? And how do they hire auditors?