'Whether you can observe a thing or not depends on the theory you use.'
...'It is the theory that decides what can be observed.' -- Albert Einstein
I love exploring statistics and the underlying computational math. That is not to say I understand it instantly, or in some cases at all, but it is a magical underpinning to thought and process. Mathematical modeling was a noble pursuit, but for a while I have been relying on it less and less. I now view a model as a glimpse into a black box of assumptions and selected variables that may or may not hold once we move beyond the training dataset.
'It is the theory which decides what can be observed.' Right? And so, it's not that the data lights up somehow and says, 'Hey, I'm relevant, you should look at me.' It's the theory that tells us what to look for and what's important.
And I think, even, like I said, in our daily lives, what we have rattling in our head--what we're thinking about--starts to structure what becomes salient and important. And so, I think that we are engaging in a quasi-scientific exercise in any interaction that we have with the world: but it's not one where it's world-to-mind. It's mind-to-world. And that, I think, is something that, I don't know how you would program a point of view into a computer. -- Teppo Felin, EconTalk podcast with Russ Roberts
Theories basically are telling us what data to look for. Maybe even worse — we are also told how to analyze it or interpret it.
I vividly recall this article by Chris Anderson from 2008. It was important then and even more relevant now.
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.— Chris Anderson, Wired June 2008
I like the analogy of learning to build a watch versus learning to tell the time. Think of it as an homage to the early days if you are a seasoned data scientist; or perhaps you are still deep into learning how to code and how to use the myriad resources and tools out there. After several decades as a data scientist and technical writer, I think this is my ‘telling time’ season. From time to time I still accept invitations to run workshops where we are in it: Python, SQL, QGIS, it's all up for grabs.
Quantitative storytelling has been the bridge between the years of doing and the new season of understanding. Artificial intelligence, and the energy required to meet the challenge of exponential growth with finite natural resource capital, have aligned geospatially with both my curiosity and my humanity.
Let’s use Sudoku as an introduction. After reading an article in Ecological Informatics by Mario Giampietro and Sandra G.F. Bukkens, "Analogy between Sudoku and the multi-scale integrated analysis of societal metabolism," I am convinced it is a brilliant metaphor.
Sudoku, if you are not familiar, imposes constraints across a multilevel grid system.
Each column, each row, and each of the nine 3 × 3 sub-grids contains all of the digits from 1 to 9.
Indeed, the numbers entered in individual cells of the grid (local scale) constrain the overall numerical pattern (bottom–up causation), whereas the partial overall pattern already present constrains the entry of new numbers into the grid (top–down causation).—Giampietro and Bukkens
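To make the two directions of causation concrete, here is a minimal Python sketch. The puzzle, the `candidates` helper, and the cells I poke at are my own illustrative choices, not anything from the Giampietro and Bukkens paper. The function reads the partial pattern to decide what a cell may hold (top-down), and the trial entry shows one local number narrowing the options elsewhere (bottom-up).

```python
# A partially filled Sudoku grid; 0 marks an empty cell.
# This particular puzzle is an illustrative assumption.
GRID = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def candidates(grid, row, col):
    """Digits still allowed in an empty cell, given its row, column, and 3x3 block."""
    if grid[row][col] != 0:
        return set()  # the cell is already filled
    used = set(grid[row])                                  # row constraint
    used |= {grid[r][col] for r in range(9)}               # column constraint
    top, left = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c]                                    # block constraint
             for r in range(top, top + 3)
             for c in range(left, left + 3)}
    return set(range(1, 10)) - used


# Top-down: the existing pattern limits what can go in row 0, column 2.
options_for_cell = candidates(GRID, 0, 2)

# Bottom-up: entering a 4 there changes what is possible at row 0, column 5.
before = candidates(GRID, 0, 5)
trial = [row[:] for row in GRID]   # copy so the original grid is untouched
trial[0][2] = 4
after = candidates(trial, 0, 5)

print(sorted(options_for_cell))
print(sorted(before), "->", sorted(after))
```

Nothing here solves the puzzle. It only shows that no cell can be reasoned about in isolation, which is the property the metaphor leans on.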
The multi-scale integrated analysis of societal metabolism (MuSIASEM) accounting method creates a complex information space, analogous to Sudoku, when characterizing the societal metabolic pattern across scales and dimensions. Organizing the data on flows and funds in a multi-level, multi dimensional grid reveals an integrated set of constraints over the quantitative characteristics of the metabolic pattern perceived and represented at different scales and referring to both the external (feasibility) and internal views (viability). The resulting complex information space, represented by the metabolic characteristics and the integrated set of constraints, shows features that are similar to those of Sudoku.
The first row in the table is Whole Society, which sits at level n of the hierarchical context. The flow and fund elements, as described by Georgescu-Roegen's Flow-Fund Theory of Production, are the column headings, and they show how the lower-level categories use the production factors (energy throughput, human activity, power capacity). These categories are the sweet spot for geospatial thought, analysis, and inquiry: land use, energy, water, agriculture, food, infrastructure…
I will continue to break down these concepts, but I think they are quite provocative even from a mile-high overview.
The way data are organized across hierarchical levels (rows) and dimensions (columns) reveals two internal constraints on the socio-economic functioning: competition for limited resources (vertical constraint) and limited substitutability of production factors (horizontal constraint). Together with the block (regional) constraint, imposed by the dynamic equilibrium between the hypercyclic and dissipative compartment, they generate the “Sudoku effect” in MuSIASEM. This effect is rooted in the impredicative relations among the metabolic characteristics of the parts (local and regional) and between the parts and the whole metabolic pattern (mutual information). Indeed, the characteristics of the parts do affect the characteristics of the whole and vice versa. -- Giampietro and Bukkens
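Here is a toy Python sketch of how I read the vertical and horizontal constraints. Everything in it is an assumption on my part: the two-level layout, the three compartment names, every number, and the simplification of the horizontal relation to energy per hour of human activity. It is not the MuSIASEM accounting method itself, just the Sudoku-like bookkeeping behind it.

```python
# Toy two-level grid. All names and numbers are made up for illustration.
whole = {"human_activity_h": 1000.0, "energy_mj": 20000.0}   # level n

# Level n-1 compartments; None marks an entry we have not measured.
parts = {
    "households": {"human_activity_h": 900.0, "energy_mj": 5400.0},
    "services":   {"human_activity_h": 60.0,  "energy_mj": None},
    "production": {"human_activity_h": None,  "energy_mj": 9000.0},
}


def fill_vertical(whole, parts, column):
    """Vertical constraint: the parts must sum to the whole,
    so a single missing entry in a column is fully determined."""
    missing = [name for name, row in parts.items() if row[column] is None]
    if len(missing) == 1:
        known = sum(row[column] for row in parts.values()
                    if row[column] is not None)
        parts[missing[0]][column] = whole[column] - known
    return parts


for column in whole:
    fill_vertical(whole, parts, column)

# Horizontal relation (simplified): energy used per hour of human activity.
rates = {name: row["energy_mj"] / row["human_activity_h"]
         for name, row in parts.items()}

for name, row in parts.items():
    print(name, row, round(rates[name], 1), "MJ/h")
```

Once the gaps are filled, the rates fall out on their own. Hours handed to one compartment are hours taken from another, and each compartment's energy use is tied to the hours it has, so one changed cell ripples through the rest of the table.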
For the time being, the sciences will continue to work from hypotheses, but the data is so massive that we won’t be able to pick the right variables or externalities. We won’t see them or know which are relevant.
Models, for example, might be really good at fitting the data that I have on hand. But because the data is so massive, the variables in the actual data keep evolving. We will continue to need our flawed and limited perspectives to frame the questions. Hopefully our Sudoku brains will be helpful.
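A small Python sketch of that worry, using entirely hypothetical, noise-free numbers. A straight line is fitted to the data on hand, where y depends only on x. Later, a variable z that nobody recorded starts to move y as well, and the same line quietly stops describing the world. The helper names `fit_line` and `mse` are mine.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on a dataset."""
    return sum((slope * x + intercept - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Data on hand: y depends only on x (made-up values).
train_x = list(range(10))
train_y = [2 * x + 1 for x in train_x]

# Later data: an unrecorded variable z now also drives y.
later_x = list(range(10, 20))
later_z = [0.5 * (x - 10) for x in later_x]
later_y = [2 * x + 1 + 3 * z for x, z in zip(later_x, later_z)]

slope, intercept = fit_line(train_x, train_y)
train_error = mse(train_x, train_y, slope, intercept)
later_error = mse(later_x, later_y, slope, intercept)

print("error on the data on hand:", train_error)
print("error once z starts to matter:", later_error)
```

The model was never wrong about the data it saw. It just had no way to know that z existed, which is why someone still has to frame the question.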
Now biology is heading in the same direction. The models we were taught in school about "dominant" and "recessive" genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton's laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.
In short, the more we learn about biology, the further we find ourselves from a model that can explain it.—Chris Anderson, Wired June 2008