How much data?
I’ve been thinking about this a lot. In our recent work designing predictive algorithms (linear regressions, neural networks, and similar approaches), we’ve discussed the use of EHR (electronic health record) data, and we’ve had some success using such algorithms to reduce deaths from sepsis (blog post from 10/6/2021).
One of many problems is “how much data?” It has been interesting to work with our data science colleagues to create a model and then carefully slim it down, so that it can run on smaller data sets, more efficiently, more quickly, and with less computing power. One informal way to put a number on “how much data?” is sketched below.
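One way to probe “how much is enough” is a learning curve: train on progressively larger slices of the data and watch where performance levels off. Here is a minimal sketch in Python with scikit-learn; the file name and the column name “sepsis” are hypothetical placeholders, not our actual schema.

```python
# Sketch: a learning curve to estimate how much training data is enough.
# File path and column names are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

ehr = pd.read_csv("ehr_extract.csv")           # hypothetical extract
X = ehr.drop(columns=["sepsis"])               # features
y = ehr["sepsis"]                              # binary outcome label

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),      # 10% up to 100% of the rows
    cv=5,
    scoring="roc_auc",
)

# Where the validation curve flattens, extra data is buying very little.
for n, auc in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>6} rows -> mean AUC {auc:.3f}")
```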
A related problem is “when do we need to forget?” EHR data ages: the way clinicians record findings changes, our understanding of diseases changes, and the diseases themselves change. (Delta variant, anyone?)
Will our models perform worse if we use data that is too old? Will they perform better because we gave them more history? Do our models have an “expiration date?”
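One informal way to test for an expiration date is to backtest across eras: train a model on older encounters, freeze it, and score it on each later year. A sketch under the same hypothetical schema as above (“encounter_date” and “sepsis” are assumed column names):

```python
# Sketch: probing a model's "expiration date" by backtesting across eras.
# Train on older encounters, then score the frozen model on each later year.
# Column names ("encounter_date", "sepsis") are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

ehr = pd.read_csv("ehr_extract.csv", parse_dates=["encounter_date"])
features = ehr.columns.difference(["sepsis", "encounter_date"])

train = ehr[ehr["encounter_date"] < "2019-01-01"]
model = LogisticRegression(max_iter=1000).fit(train[features], train["sepsis"])

for year in (2019, 2020, 2021):
    test = ehr[ehr["encounter_date"].dt.year == year]
    auc = roc_auc_score(
        test["sepsis"], model.predict_proba(test[features])[:, 1]
    )
    print(f"{year}: AUC {auc:.3f}")  # a steady slide suggests the model is aging
```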
The Wired.com article above talks about data that was perhaps illegally acquired, or that, after a lawsuit, MUST be removed from a database that powers an algorithm.
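The blunt but reliable way to make a model “forget” specific records is to delete them and retrain from scratch; the research field of “machine unlearning” looks for cheaper shortcuts, but retraining is the baseline. A toy sketch, again with hypothetical identifiers and column names:

```python
# Sketch: the baseline form of "machine unlearning" -- delete the disputed
# rows and retrain from scratch. IDs and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

ehr = pd.read_csv("ehr_extract.csv")
ids_to_forget = {"PT-1041", "PT-2213"}     # e.g., removal ordered by a court

kept = ehr[~ehr["patient_id"].isin(ids_to_forget)]
features = kept.columns.difference(["sepsis", "patient_id"])

model = LogisticRegression(max_iter=1000).fit(kept[features], kept["sepsis"])
# The retrained model carries no trace of the removed rows, though their
# influence on earlier feature and threshold choices is harder to undo.
```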
Humans need to forget. What about algorithms?
Isn’t human memory about selective attention and selective recall? Wouldn’t a perfect memory be the enemy of efficient and effective thinking? I’ve read that recalling a memory slightly changes that memory. Why do we work this way? Is it better for us?
Is there a lesson here for what we are building in silico?
CMIO’s take? As we build predictive analytics, working toward a “thinking machine”, consider: what DON’T we know about memory and forgetting? Are we missing something fundamental in how our minds work as we build silicon images of ourselves? What are you doing in this area? Let me know.