Hacker News | Componica's comments

Location: Iowa City, IA

Remote: Yes

Willing to relocate: Iowa, Minnesota, Michigan, Wisconsin, Chicago, Maine, Vermont, and other areas less affected by future climate change with a reasonable cost of living.

Technologies: C/C++, Python, Javascript, OpenCV, PyTorch

Resume/CV: Upon request, LinkedIn https://www.linkedin.com/in/stevencharlesmitchell/

Email: steve@componica.com

With more than a decade of experience creating innovative products and co-founding startups in the computer vision space, I have a strong background in 2D and 3D object tracking, facial feature interpretation, and eye gaze tracking. My current project involves eye-tracking technology for fighter pilots, designed for use in both simulator cockpits and helmet displays. Previous projects include systems to optically grade student assessments in real time via Chromebooks and WSAM, 3D multi-camera tracking of body poses, visual interpretation of emotional affect in both humans and animals, and technology for measuring pupil responses. I also have a background in medical computer vision applications, including automated computation of ejection fractions from cardiac MR and echocardiogram data, as well as interpretation of intravascular ultrasound (IVUS) images.


The Yann LeCun paper 'Gradient-Based Learning Applied to Document Recognition' specified the modern implementation of a convolutional neural network and was published in 1998. AlexNet, which woke up the world to CNNs, was published in 2012.

Between those two dates, in the early 2000s, I was selling implementations of really good object classifiers and OCR systems.


It's not like people had been ignoring Yann LeCun's work prior to AlexNet. It received quite a few citations and was famously used by the US Postal Service for reading handwritten digits.

AlexNet happened in 2012 because the conditions necessary to scale it up to more interesting problems didn't exist until then. In particular, you needed:

- A way to easily write general-purpose code for the GPU (CUDA, 2007).

- GPUs with enough memory to hold the weights and gradients (~2010 - and even then, AlexNet was split across 2 GPUs).

- A popular benchmark that could demonstrate the magnitude of the improvement (ImageNet, 2010).

Additionally, LeCun's early work in neural networks was done at Bell Labs in the late 80s and early 90s. It was patented by Bell Labs, and those patents expired in the late 2000s and early 2010s. I wonder if that had something to do with CNNs taking off commercially in the 2010s.


My take from that era is that neural nets were considered taboo after the second AI winter of the early 90s. For example, I once proposed to a start-up that they consider a CNN as an alternative to their handcrafted SVM for detecting retinal lesions. The CEO scoffed, telling me neural networks were dead, only to acknowledge a decade later that he had been wrong. Younger people today might not understand, but there was a lot of pushback if you even considered using a neural network during those years. At the time, people knew that multi-layered neural networks had potential, but we couldn't effectively train them: machines weren't fast enough, and key innovations like ReLU, better weight initializations, and optimizers like Adam didn't exist yet. I remember it taking 2-3 weeks to train a basic OCR model on a pre-GPU desktop. It wasn't until Hinton's 2006 work on Restricted Boltzmann Machines that interest in what we now call deep learning started to grow.
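To make one of those training obstacles concrete, here's a toy sketch (illustrative numbers of my own, not from any period paper) of why sigmoid activations made deep nets so hard to train before ReLU:

```python
# The sigmoid's derivative, sigma(x) * (1 - sigma(x)), peaks at 0.25,
# so a gradient backpropagated through n sigmoid layers shrinks by at
# least a factor of 0.25 per layer; a ReLU's derivative is 1 wherever
# the unit is active, so the signal can pass through undiminished.

SIGMOID_DERIV_MAX = 0.25   # attained at x = 0
RELU_DERIV_ACTIVE = 1.0

depth = 10
sigmoid_gradient_bound = SIGMOID_DERIV_MAX ** depth
relu_gradient = RELU_DERIV_ACTIVE ** depth

print(sigmoid_gradient_bound)  # ~9.5e-07: effectively vanished
print(relu_gradient)           # 1.0
```

Even a modest 10-layer net leaves the early layers with essentially no learning signal, which is why two or three hidden layers was the practical ceiling for so long.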


> My take during that era was neural nets were considered taboo after the second AI winter of the early 90s.

I'm sure there is more detail to unpack here (more than one paragraph, either yours or mine, can do). But as written this isn't accurate.

The key thing missing from "were considered taboo ..." is by whom.

My graduate studies in neural net learning rates (1990-1995) were supported by an NSF grant, part of a larger NSF push. The NeurIPS conferences, then held in Denver, were very well-attended by a pretty broad community during these years. (Nothing like now, of course - I think it maybe drew ~300 people.) A handful of major figures in the academic statistics community would be there -- Leo Breiman of course, but also Rob Tibshirani, Art Owen, Grace Wahba (e.g., https://papers.nips.cc/paper_files/paper/1998/hash/bffc98347...).

So, not taboo. And remember, many of the people in that original tight NeurIPS community (exhibit A, Leo Breiman; or Vladimir Vapnik) were visionaries with enough sophistication to be confident that there was something actually there.

But this was very research'y. The application of ANNs to real problems was not advanced, and a lot of the people trying were tinkerers who were not in touch with what little theory there was. Many of the very good reasons NNs weren't reliably performing well are (correctly) listed in your reply starting with "At the time".

If you can't reliably get decent performance out of a method that has such patchy theoretical guidance, you'll have to look elsewhere to solve your problem. But that's not taboo, that's just pragmatic engineering consensus.


You're probably right in terms of the NN research world, but I've been staring at a wall reminiscing for half an hour and concluded: neural networks weren't widely used in the late 90s and early 00s in the field of computer vision.

Face detection was dominated by Viola-Jones and Haar features, facial feature detection relied on active shape and active appearance models (AAMs), with those iconic Delaunay triangles becoming the emblem of facial recognition. SVMs, kNNs, and hand-tuned feature detectors were used to highlight tumors and lesions. Dynamic programming was used to outline hearts, airways, and other structures in CTs and MRIs; Hough transforms were used for pupil tracking; HOG features were popular for face, car, and body detectors; and Gaussian models and Hidden Markov Models were standard in speech recognition. I remember seeing a few papers attempting to stick a 3-layer NN on the outputs of AAMs, with limited success.
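As a flavor of how hand-crafted those pipelines were, here's a toy Hough line transform of my own (an illustrative sketch, not code from any of those products): edge points vote for the (rho, theta) parameters of every discretized line passing through them, and the bin with the most votes is the detected line.

```python
import math
from collections import Counter

def hough_lines(points, theta_steps=180):
    """Each edge point votes for every discretized line through it."""
    acc = Counter()
    for x, y in points:
        for i in range(theta_steps):
            theta = math.pi * i / theta_steps
            # Normal-form line equation: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(rho), i)] += 1
    return acc

# Seven edge points lying on the vertical line x = 3.
votes = hough_lines([(3, y) for y in range(7)])
# The bin (rho = 3, theta index 0) collects a vote from every point,
# correctly recovering the line x = 3.
print(votes[(3, 0)], max(votes.values()))  # 7 7
```

No learning anywhere: every stage is geometry and voting, tuned by hand, which is what most deployed vision systems looked like in that era.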

The Yann LeCun paper felt like a breakthrough to me. It seemed biologically plausible, given what I knew of the Neocognitron and the visual cortex, and the shared weights of the kernels provided a way to build deep models beyond one or two hidden layers.
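A back-of-the-envelope comparison (my own illustrative numbers, not figures from the paper) of why that weight sharing matters:

```python
# Parameter count for a convolutional layer (one shared k x k kernel per
# input/output channel pair, plus a bias per output channel) versus a
# fully connected layer producing the same number of outputs.

def conv_params(in_ch, out_ch, k):
    return out_ch * in_ch * k * k + out_ch

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

h = w = 32  # e.g. a 32x32 grayscale input
conv = conv_params(in_ch=1, out_ch=16, k=5)   # 16 shared 5x5 kernels
dense = dense_params(h * w, 16 * h * w)       # same output size, no sharing

print(conv)   # 416
print(dense)  # 16793600 -- tens of thousands of times more weights
```

With orders of magnitude fewer weights per layer, stacking several layers suddenly became feasible on the hardware of the day.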

At the time, I felt like Cassandra, going around to past colleagues and computer vision companies in the region, trying to convey to them just how much of a game changer that paper was.


My anecdote on the AI winter: I went for grad studies in ML (really, just to learn ANNs) in the early/mid 2000s and we had two tenured professors.

One taught all of the data mining/ML algorithms including SVMs, and was clearly on their way up.

The other was relegated to teaching a couple of ANN courses and was backwatered.

The agreement was that they wouldn't overlap in topics. Yet the first professor couldn't help but take one or two subtle swipes at ANNs when discussing SVMs.


In our area of the rural Midwest, a local AM station made a niche for themselves as the critical go-to source of information during tornado, storm, flooding, and derecho warnings. Sadly, as the storms keep showing up more often and more powerful, I've actually bought a couple more portable AM/FM radios to spread around the house because of it.

There have been several times, sitting in a car during those warnings, when that AM station was the only reliable source of information.


My three partners and I have been developing and selling multi-camera arrays for several years now, specifically for eye tracking as well as measuring other physiological features. Our main customers are a couple of university research groups, a human factors group at Lockheed, and, just recently, the US Air Force. In fact, we just returned from a trip to Wright-Patterson, installing an array in a hypobaric chamber to perform gaze tracking and pupil response measurement for pilots under hypoxic conditions. Phase two will be a custom gaze tracker for their centrifuge. Our main features are accurate eye and face tracking up to a meter from the array, minimal per-subject calibration (about 10 seconds staring at a dot), and pupil response for measuring fatigue and other things, plus we can adapt the array for the client, ranging from a cockpit to a large flat-screen TV. We've looked into medical usage such as ALS, but we're bootstrapped, based in Iowa, and found the military niche a more direct way to generate cash flow. It's a shame we can't apply this work toward people with medical needs, but we don't have the funds nor the clients to make such a pivot.


Have you thought about setting up a subsidiary that licenses your base tech for a reasonable royalty fee, and raising capital for the subsidiary to develop a medical product from it?

The risk, and part of the returns, would be for the investors, while it would generate additional revenue (and diversification) for your bootstrapped company, allowing you to keep building and mitigating some of the risk of having a narrow (military) client base.

And if it becomes a major success (sounds like pg thinks that's possible) you'll co-own it.


One of our partners is a renowned neuro-ophthalmologist at our local university, and he's always suggested medical applications for our array. However, the combination of low volume, slim margins, and regulatory challenges (acquiring patient training sets and navigating FDA approval) makes this direction potentially perilous for us. What helps us is that our human factors partner has decades-long connections with NASA, the FAA, and the DoD, which get us a foot in the door and allow us to adopt a low-volume/high-margin strategy. This approach is letting us refine our technology; then we can focus on standardizing it, reducing costs, and perhaps refocusing on medical applications.

I do think your suggestion is a good approach, but the key is finding smart investors who have experience in developing medical devices.


How expensive is your system?

Do you have software that converts internal tracking info into pixel coordinates? Multiple screens?


Luxury-car prices per array, due to development, customization, and the type of clientele. We have a mechanical engineer onsite to adapt the camera arrays to the environment, such as a cockpit or centrifuge, while taking into account the fields of view and mounts. Yes, it can handle multiple screens and curved surfaces like domes; the fun part is working out the geometry to convert gaze vectors into pixel coordinates. Pixel accuracy is a function of the number of cameras (6 to 12), the distance to the array, and the geometry of the subject and array, but we can typically discern which button / control / digit a pilot observed.
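For the curious, the flat-screen case of that geometry reduces to a ray-plane intersection. This is a minimal sketch of my own (all names and numbers are hypothetical illustrations, not our product code): intersect the gaze ray with the screen's plane, then project the hit point onto the screen's edge vectors to get pixel coordinates.

```python
# Map a 3D gaze ray to 2D screen pixel coordinates via ray-plane
# intersection. Vectors are plain (x, y, z) tuples; units are meters.

def gaze_to_pixel(eye, gaze_dir, screen_origin, screen_right, screen_down,
                  width_px, height_px):
    """Return (x, y) pixel coordinates, or None if the gaze misses the plane.

    eye, gaze_dir       -- 3D eye position and gaze direction
    screen_origin       -- 3D position of the screen's top-left corner
    screen_right/down   -- 3D edge vectors spanning the screen
    width_px, height_px -- screen resolution
    """
    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    normal = cross(screen_right, screen_down)
    denom = dot(gaze_dir, normal)
    if abs(denom) < 1e-9:        # gaze is parallel to the screen plane
        return None
    t = dot(sub(screen_origin, eye), normal) / denom
    if t < 0:                    # screen is behind the viewer
        return None
    hit = tuple(e + t * d for e, d in zip(eye, gaze_dir))
    rel = sub(hit, screen_origin)
    # Project the hit point onto the screen's edge vectors (u, v in [0, 1]).
    u = dot(rel, screen_right) / dot(screen_right, screen_right)
    v = dot(rel, screen_down) / dot(screen_down, screen_down)
    return (u * width_px, v * height_px)
```

Curved surfaces like domes replace the single plane with a mesh or parametric surface, but the final projection-into-screen-coordinates step is the same idea.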

Of course, cost could be substantially decreased if everything were standardized and the mechanical engineering were done once. The BOM (excluding the high-end desktop computer) is about $2000-$3000. My dream would be to reduce that cost by moving the computer vision onto compute modules, one per pair of cameras, reducing the BOM to < $1000 and avoiding the desktop.


Location: Iowa City, Iowa

Remote: Yes

Willing to relocate: Impossible, due to family.

Technologies: Computer Vision (PyTorch / OpenCV), C++, Python, Javascript

Résumé: https://www.linkedin.com/in/stevencharlesmitchell/

Email: componica [] gmail.com

Over 20 years of computer vision experience (it's true), and I've been involved in several startups in document scanning / grading, education, medical imaging, and face and eye tracking. My favorite things to do are smashing conventional and deep learning computer vision algorithms into embedded devices, tracking landmarks on video, and figuring out where a human is looking.



Yes, that's the one! The containers are probably full of electronic products for the EOY push.


Quoting Wikipedia: "As of 2020, the majority of hydrogen (∼95%) is produced from fossil fuels by steam reforming of natural gas, partial oxidation of methane, and coal gasification"

"As of 2020 most of hydrogen is produced from fossil fuels, resulting in carbon emissions."

https://en.wikipedia.org/wiki/Hydrogen_production


It's most likely a demand by China so that they can create infrastructure to locate political dissidents. Oh look, a Winnie the Pooh Xi meme ended up in your gallery/inbox. Why is there a knock at the door? I'm pretty sure that's the real reason.


It’s silly to think the US government hasn’t been twisting their arm to do this for years.


My ears perk up whenever I hear a "Just think of the children" argument, because after Sandy Hook, I'm pretty certain the US couldn't care less about children. There's a real reason behind this.


Whenever someone makes a "think of the children" argument it has absolutely nothing to do with whether they actually care about children. They just want to make it extremely difficult to counter-argue without being labelled as pedophile adjacent. It is a completely disingenuous argument 99% of the time it is used.


I feel like there's a fallacy for this, but I'm not sure. Either way, it doesn't matter; the logic here isn't that the US couldn't care less about children, it's that the US cares more about gun rights than it does about children. But that doesn't say anything about the minimum level of care they have, only the maximum.


My bet is it was the US government. The predictable fallout doesn't seem like something Apple would take on themselves with no monetary gain.


what's this take based on - have a link or anything?


I don't know about the suggestion that it's the government of China pushing for the feature itself, but the fact the feature now exists and WILL be used by authoritarian regimes to scan for political content is clearly understood by Apple employees. From the article:

> Apple employees have flooded an Apple internal Slack channel with more than 800 messages on the plan announced a week ago, workers who asked not to be identified told Reuters. Many expressed worries that the feature could be exploited by repressive governments looking to find other material for censorship or arrests, according to workers who saw the days-long thread.


Your quote contradicts your statement. It seems like the employees are worried that it could be exploited, not that it necessarily will be.


[flagged]


In fairness, this is literally happening on Chinese messenger services today so I wouldn't call it either of those


Apple isn't being pressured by China, but criticizing the CCP as the brutal dictatorship it is isn't racist (and that idea is verbatim CCP propaganda).


Randomly suggesting that every authoritarian decision taken unilaterally by an American company was made to please the evil Chinese government, while offering no substantial proof, is not "criticizing the CCP"; it is just shoehorning the US far-right's talking points into a thread that has nothing to do with them.


Why then release this feature in the US? Why not just release it in China and avoid all this negative press? Bad take is bad


Imagine taking a photo, or having in your gallery a photo, that a dear leader doesn't want to spread. Ten minutes later you hear a knock at your door. That's what I'm most worried about: how is this not creating the infrastructure to ensnare political dissidents?


I am profoundly disappointed that almost all of the discussion is about the minutiae of the implementation, and "Hmm.. Am I ok with the minutiae of Apple's specific implementation at rollout?" And almost nobody is discussing the basic general principle of whether they want their own device to scan itself for contraband, on society's behalf.


But Apple says, "You need several hash matches to trigger a review." See, that makes it OK!


Maybe people realize that’s not a winning strategy and thus keep going back to technical details…


> Imagine taking a photo, or having in your gallery a photo, that a dear leader doesn't want to spread. Ten minutes later you hear a knock at your door.

