Aggregating away the signal in your data (stackoverflow.blog)
113 points by yurivish on March 22, 2022 | 11 comments


One person's junk is another's treasure; one person's "normalized data" is another person's "you removed the one data point I cared most about!"

One thing this article is reinforcing to me is the value of domain knowledge to an analyst. I am deeply skeptical of "one size fits all" analysis tools, services, and consultancies for exactly this reason. Making insight actionable requires knowing what actions can be taken, and how.


I got hung up early on by the use of "aggregation"... these visualizations still aggregate data, by the necessity of mapping to a fixed number of pixels! However, the principle is strong: the author is proposing visualizations that make full use of the pattern matching over 2.5 dimensions that our eyes afford us, and by using that range they are able to make fewer assumptions about which summaries of data are sufficient.

Domain knowledge is still essential, both to pick meaningful projections of the data and to drill into patterns once observed. But since domain knowledge is always limited, it's nice to have techniques that allow you to notice patterns you didn't know well enough to summarize.


I agree, but I think the visualizations presented here can be useful in many domains and aren’t generally used. Furthermore, I think showing uncertainty in visualizations is hugely important and this is a step in the right direction there.


Excellent article. Aggregation can also obscure problems with sensors (for example, odd quantization or duplicated points). Whenever you have a high-frequency time series, it is useful to look at short segments of just a few data points at the highest resolution available.
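To make the sensor point concrete, here is a minimal sketch using only the Python standard library. Everything in it is made up for illustration: the synthetic signal, the 0.5 quantization step, the duplicated samples, and the 40-sample window are all assumptions, not anything from the article.

```python
import math

# Hypothetical sensor: a smooth signal that the device quantizes to
# steps of 0.5 and reports twice per reading (duplicated points).
STEP = 0.5
true_signal = [10 + 3 * math.sin(i / 20) for i in range(200)]

raw = []
for x in true_signal:
    q = round(x / STEP) * STEP   # quantize to the nearest 0.5
    raw.extend([q, q])           # every reading is emitted twice


def window_means(values, size):
    """Average non-overlapping windows of `size` samples."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


# Aggregated view: the window means look like an ordinary smooth series.
means = window_means(raw, 40)

# Raw view: checks on individual samples expose both artifacts.
distinct_levels = sorted(set(raw))
steps = {b - a for a, b in zip(distinct_levels, distinct_levels[1:])}
repeat_fraction = sum(a == b for a, b in zip(raw, raw[1:])) / (len(raw) - 1)

print("window means:", [round(m, 2) for m in means])
print("distinct raw levels:", len(distinct_levels))
print("gaps between levels:", steps)
print("fraction of samples equal to their predecessor:", round(repeat_fraction, 2))
```

The window means take on arbitrary-looking values, while the raw samples sit on a small grid of levels and repeat their predecessor at least half the time. Both artifacts only show up when you look below the aggregate.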


I cannot recommend enough the "John Lamping - The One Weird Trick for Analyzing Big Data ... Eyeball it Early and Often!" video:

https://www.youtube.com/watch?v=jYH8CQS6Ab0

One of the best tips, straight from a practitioner: a former Google search ranking engineer who worked in multiple other domains later in his career. Stop tuning knobs and watching metrics, look at the data!


An example that elucidates the point: https://en.wikipedia.org/wiki/Simpson's_paradox
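For anyone who wants to see the arithmetic, here is a small sketch using the kidney stone treatment counts that the Wikipedia article uses as its standard example (successes out of trials, split by stone size). The variable and function names are mine.

```python
# (successes, trials) per treatment and stone size,
# from the kidney stone example on the Simpson's paradox Wikipedia page.
data = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}


def pooled_rate(treatment):
    """Success rate for a treatment after summing over stone sizes."""
    successes = sum(s for (t, _), (s, _n) in data.items() if t == treatment)
    trials = sum(n for (t, _), (_s, n) in data.items() if t == treatment)
    return successes / trials


# Faceted view: success rate within each stone-size group.
per_group = {key: s / n for key, (s, n) in data.items()}

# Aggregated view: success rate with stone size summed away.
pooled = {t: pooled_rate(t) for t in ("A", "B")}

for size in ("small", "large"):
    a, b = per_group[("A", size)], per_group[("B", size)]
    print(f"{size} stones: A={a:.3f}  B={b:.3f}")
print(f"pooled:       A={pooled['A']:.3f}  B={pooled['B']:.3f}")
```

Treatment A has the higher success rate within each stone size, yet B looks better once the groups are pooled, because A was given mostly to the harder large-stone cases. The aggregate reverses the signal that the faceted view shows.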


The full title is "Stop aggregating away the signal in your data".


I'm only halfway through the article, but I must say that it's amazing so far, and the dataviz is much better and more careful than what I regularly see.


Excellent article. Faceted visualisation is an incredibly powerful technique.

Something the author hints at but doesn't quite make explicit: manually inspecting individual examples from your dataset can help you understand what questions to ask, what category to facet on, or where the bug in your aggregation is.


I think this is a piece written to promote observablehq and their visualization tools.

I think the graphs look great and I want to make similar stuff. Does anyone have experience with Observable? Does it beat ggplot, Tableau, and others?


When I saw the yellow of the graph, it made me think of Tufte's famous book. A few paragraphs in, I had to check that it wasn't actually authored by him.

The depth and knowledge the author displays is fantastic.



