First of all, I don't think this is satire. I'll admit that the use of a Gmail account by a researcher at a Chinese uni is facially suspicious, but it's not that odd given that cursory googling shows that both authors appear to be faculty members at Shanghai Jiao Tong University, as claimed on the paper, though neither appears to have much, if any, background or expertise in machine learning.
I'm not much of a fan of a lot of the arguments made in Weapons of Math Destruction, but I do appreciate that in summarizing you draw the distinction between the biases of the engineer or (illogically, but oft-claimed nonetheless) the algorithm itself and the data which is used to train said model, and I think it's quite a valuable concept in regards to this particular paper.
For instance, the data set they're using here is fairly small, and while they did use 10-fold cross-validation, that's still less than ideal, generally speaking, for neural nets, especially CNN architectures, which are usually pretty deep. Furthermore, the dataset itself seems fairly questionable to me. I'm not sure how much I trust the Chinese criminal justice system to adequately adjudicate culpability in the first place, but even setting aside such admittedly conspiratorial notions, it seems rather odd indeed that nearly half of their positive samples are not in fact convicted criminals but merely suspects. I do not find their attempts at devil's advocacy persuasive, as it's not readily obvious exactly how they obtained or used the three different data sets in their testing.
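To make the small-sample concern concrete, here's a minimal sketch of 10-fold cross-validation against a majority-class baseline. The 730 positives are the figure quoted in this thread; the 1126 negatives are my recollection of the paper's non-criminal count, and the featureless labels are a toy stand-in, not the authors' data or code:

```python
import random

random.seed(0)

# Toy stand-in for the paper's data: 730 "criminal" and 1126 "non-criminal"
# labels (the 1126 is an assumption about the paper's negative set), with
# no informative features at all.
labels = [1] * 730 + [0] * 1126
random.shuffle(labels)

# Manual 10-fold cross-validation of a majority-class baseline.
# Any model on a dataset this size should first be compared to this number.
k = 10
fold_size = len(labels) // k
accuracies = []
for i in range(k):
    test = labels[i * fold_size:(i + 1) * fold_size]
    train = labels[:i * fold_size] + labels[(i + 1) * fold_size:]
    majority = 1 if sum(train) > len(train) / 2 else 0
    accuracies.append(sum(1 for y in test if y == majority) / len(test))

print(sum(accuracies) / len(accuracies))  # roughly 1126/1856, i.e. ~0.61
```

Any reported accuracy on a dataset this size only means something relative to that ~0.61 baseline, and ten folds of ~185 test images each leave a lot of variance per fold.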
As for the appropriateness of the broader topic, I'm more or less of the persuasion that all questions deserve to be examined, and that provided the work does not cause direct harm, it's hard for me to support a prohibition on examination of a given topic. That said, I do think that the more controversial the question, the higher quality of research required, and, good lord, does this mess fall well short of the mark. Perhaps if there existed a hypothetical criminal justice system free of systemic biases or, more realistically, a method by which to exactly define those prejudices and account for them in the composition of a data set, this could be a potentially useful question to investigate, but even then it seems to me quite unlikely that there's any particularly significant relationship between one's upper lip curvature and criminal disposition.
But to do it you need experts in criminology, physiology and machine learning, not just a couple of people who can follow the Keras instructions for how to use a neural net for classification.
For example, I think I remember reading a paper in the physiology field that showed a link between increased testosterone and different facial features, but from memory (and I don't have the paper to hand) there was no link between that and criminal offending.
In this case, the features they are finding don't seem to make any sense. A slight smile in the criminals seems more likely to be due to the way that set of photos was taken, and a number of the other features could possibly be explained by the fact that the criminal set came from a single police department (in a single geographical area), while the other dataset was collected online. Given the small size of the dataset, if it included a single "family"-gang of criminals, it is likely that would have been enough to taint the features.
Having dealt with somewhat similar datasets myself, there is a really, really good chance that the police department grabbed a few days of arrests from one or two cities. There are only 730 positive cases; it's pretty easy to imagine that many of them could be from a single gang, whether family- or ethnically based.
The link between increased testosterone and criminal offending has been established by research:
> Testosterone plays a significant role in the arousal of these behavioral manifestations in the brain centers involved in aggression and on the development of the muscular system that enables their realization. There is evidence that testosterone levels are higher in individuals with aggressive behavior, such as prisoners who have committed violent crimes.
> Inmates who had committed personal crimes of sex and violence had higher testosterone levels than inmates who had committed property crimes of burglary, theft, and drugs. Inmates with higher testosterone levels also violated more rules in prison, especially rules involving overt confrontation.
Though the connection can be said to be weak, and definitely not the only factor (high testosterone alone is not sufficient for criminal offending), it is there.
> In this case, the features they are finding don't seem to make any sense. A slight smile in the criminals seems more likely to be due to the way that set of photos are taken
From the paper: "We stress that the criminal face images in Sc are normal ID photos not police mugshots."
> by the fact the criminal set came from a single police department (in a single geographical area)
From the paper: "Subset Sc contains ID photos of 730 criminals, of which 330 are published as wanted suspects by the ministry of public security of China and by the departments of public security for the provinces of Guangdong, Jiangsu, Liaoning, etc.; the others are provided by a city police department in China under a confidentiality agreement."
> if it included a single "family"-gang of criminals it is likely that would have been enough to taint the features.
Family resemblance is an interesting one, but it's unlikely to significantly affect the accuracy difference between proper labeling and random labeling (they'd all need to be related).
Overfitting is sufficiently ruled out (to me), but leakage is not. Unfortunately it is not possible to replicate this study (even if the dataset were available, the implementation details are scarce). Differently sized raw ID pictures, or compression artifacts, could lead to leakage that is nearly undetectable to outsiders. I would probably not give this paper my stamp of approval even if it were on an uncontroversial subject, but it is not abysmally bad.
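On the leakage point: a standard sanity check is to see whether the two photo sources can be separated using non-facial properties alone. This toy simulation (the file sizes and the re-compression story are invented, purely for illustration) shows how a systematic difference between sources would be enough:

```python
import random

random.seed(1)

# Simulated "leakage": suppose the police-sourced ID photos were re-saved
# at a slightly lower JPEG quality, so their file sizes skew smaller.
# These distributions are invented for illustration, not from the paper.
criminal_sizes = [random.gauss(38_000, 4_000) for _ in range(730)]
other_sizes = [random.gauss(46_000, 4_000) for _ in range(1126)]

# A single threshold on file size -- using no facial information at all --
# already separates the classes well above the base rate.
threshold = 42_000
correct = sum(s < threshold for s in criminal_sizes) + \
          sum(s >= threshold for s in other_sizes)
accuracy = correct / (730 + 1126)
print(round(accuracy, 3))
```

If a trivial classifier on metadata like this does well, a CNN trained on the raw pixels may simply be learning the source of the photo, not the face.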
I do think one has to be careful to separate moral concerns from technical concerns. Sure, this all feels very wrong to me, and should be taken into account when creating new regulation for ML systems, but the research itself (apart from the small sample size, and vague data gathering methods) is sufficiently solid for debate. Maybe we don't want to admit that phrenology can have a measurable impact on behavior, but that is wishful thinking, not science. Like you said: 'a link between increased testosterone and different facial features' exists, and I just sourced you that a link between criminal behavior and testosterone exists. Logic would lead us to conclude that different facial features are indicative of different criminal behavior, no matter the bizarre, scary, immoral research that supports it.
I'd note that they claim 89.5% accuracy(!) using the CNN classifier. One paper they reference[1] uses a similar technique to attempt the (seemingly much easier) task of classifying people as Chinese, Korean, or Japanese. They get 75% accuracy.
89% accuracy would mean facial features alone almost determine criminality, leaving nearly no room for any other influencing factor.
That should set off all kinds of alarms. If there was some kind of relationship between facial features and criminality (and I don't discount that there could be) I'd expect it to be a very weak one, not one that is accurate 9/10 times.
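And even granting the 89.5% figure, it wouldn't survive contact with a realistic base rate. A quick back-of-the-envelope, assuming sensitivity and specificity of 0.9 (my reading of the quoted accuracy) and an invented 1% population base rate:

```python
# Back-of-the-envelope Bayes: even taking the 89.5% figure at face value,
# what would it mean outside a balanced test set? Sensitivity, specificity,
# and the 1% base rate below are all assumptions for illustration.
sens, spec, base = 0.9, 0.9, 0.01

# Probability a random person is flagged, and the precision of a flag.
p_flag = sens * base + (1 - spec) * (1 - base)
precision = sens * base / p_flag
print(round(precision, 3))  # ~0.083
```

In other words, about 11 of every 12 people such a classifier flags would be innocent, which is a strange property for a supposedly near-deterministic facial signal.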
I don't read a lot of techcrunch to be honest, so maybe this is their standard fare and not worth the time it'll take me to write this. On the other hand, I dislike that this man just painted my entire industry as xenophobes, without so much as a quote from the party in question. Is this acceptable now? I certainly reject the idea of a world in which pseudo-journalistic conjecture is taken at face value, without so much as an anonymous source.
can you explain more about how you want "coders" to do "missions", but you're charging 20k/yr for access to part of the data, and you have an obnoxiously low rate limit? are you bloomberg? I think we should all stick with quandl for now. But seriously though, it's disingenuous to brand yourself as "open data" but not offer bulk downloads and a realistic API limit.
deeplearning.net and the ufldl tutorial are excellent places to start. I've also perused this ebook recently and found it pretty solid in terms of giving mathematically rigorous but still intuitive explanations.
last but not least, absolutely do not, whatsoever, "do a small project on deep learning or try out [a] few kaggle competitions." instead, pick up a paper that interests you and implement the methods described therein.
edit: here's the ebook link neuralnetworksanddeeplearning.com
I found one particular phrase, "technically privileged", both hilarious and utterly horrifying. Hilarious because I suppose I never paused to consider the folks in my major who were both curious and motivated enough to be involved in the field outside of class to be particularly "privileged". I suppose I should have given greater thought to how unfair it was that they read API documentation and I didn't!
Like I said though, it wasn't just good times and passive-aggressive complaining about first-world problems while I read this. I can't even begin to fathom the miserable, pathetic, and generally small, unexamined existence that would lead one to believe that (s)he is somehow the victim of deep injustice at the hands of those people with their prejudiced technical abilities and natural curiosity! How dare they not level the playing field just because she never bothered to explore the use of code outside the classroom; clearly everyone missed that she's the subjugated one, with a comfortable liberal arts education and regular internet access.
But still, we should all take a moment to recognize the plight of the comfortable, generally satisfactory lives of those among us struggling silently with the burden of "technical un-privilege."
What an uncharitable response to someone who had a legitimate revelation and used it as a catalyst for personal growth! Software development is intimidating to outsiders - of course it's a privilege to be on the inside. (Or at least it appears that way.) This isn't a complaint about an unlevel playing field - it's an awakening on how to get better at doing something you love.
What does it even mean to be on the inside? Nobody in my family had any experience with computers. My first programming book was a gift I asked for when I was ten (it was a learn-C++-in-24-hours book). You can imagine how that turned out with nobody there to help me. I didn't understand what variables were, let alone how to add the compiler to the PATH on Windows. The only help I had was that my dad bought me the book and I had a family computer in the living room.
This article reminded me of an NPR piece I saw a few days ago about "When Women Stopped Coding"[1]. The theory goes that without prior knowledge of working with computers, people cannot compete in many introductory computer science classes. Computers had been marketed like toys, to one gender: boys. Because of this, it was rare for women to have prior computer experience, and so they struggled and dropped out of CS.
It made me reconsider what "technical privilege" really means and why I think it's a valid thing. Exposure to computers and technology outside of the classroom is definitely becoming more common, but not the kind of hobbyist interest that this article talks about.
I don't think the author meant "technically privileged" as "those unfairly given an advantage for whatever reason". To me, it read more as, "folks who grew up immersed in this stuff" vs "folks who come to it later in life". I started coding in high school, and I think I approach it very differently than people who never wrote a line of anything until sometime in college or later. It strikes me as a legit difference, not one to bemoan as unjust but worth pointing out to people just getting started who might be intimidated.
Having had a bit of empathy during my college years, I spent a lot of time complaining about how my instructors and my program leaned heavily on the fact that a large percentage of students would be hobbyists, and used that as an excuse to not teach.
I'm sure it would have been easier to lean on the fact that I'm the son of two computer programmers, always had a computer (and various manuals) in the house, and started learning C when I was 10, and to look at the plight of people who were less fortunate than me as a "first world problem", as education and employment are rarely described.
>But still, we should all take a moment to recognize the plight of the comfortable, generally satisfactory lives of those among us struggling silently with the burden of "technical un-privilege."
Because everybody who goes to college must come from comfortable, generally satisfactory lives - they all must have grown up with the internet, owned computers, went to good schools, and couldn't have come from crushing poverty or from the third world. They must have all been exactly like you, except lazy.
I must be misunderstanding your post, because it seems exceptionally cruel, in a cocky, stupid way.
Did we read the same thing? I got the impression that the author realized at the end that "technically privileged" was just an illusion, and the whole point is that they don't exist.
Yes, to me this piece flies way past the "technically privileged" assertion. And in this way, it is one of the best counter-arguments to the "technically privileged" accusation some people have against our field.
Well, they do in a sense. It's the difference between growing up in a family like Roald Dahl's 'Matilda' and in a family that encourages growth. I also think that having a family member in software development would make the transition into the world of work quite a bit easier. But in the end the right attitude can trump any of these things, which is what the author discovered. I am happy for him.