social, health, political imagery through the lens of George J Huba PhD © 2012-2019

Posts tagged data mining

Aaahhh… GiGo (garbage in/garbage out). The GiGo phenomenon haunts data analysts, statisticians, researchers, theorists, and anyone who has lost their identity.

So these huge [health] datasets we keep hearing about … who controls them? what is their validity? reliability? utility? who else gets to see them?

And the data mining algorithms… proprietary or public? based on which tests and algorithms? who developed? who validated? are the methods valid? reliable? have utility?

And the results coming out of big data and proprietary data mining algorithms… reliable? valid? useful? clearly interpreted? limitations stated? misinterpreted?

Are big data and data mining about using world-wide data to find solutions to some of the world’s problems or to sell more books, videos, and cola?

I don’t think anyone really understands the big data sets and their limitations. I doubt that more than a small percentage of the data mining algorithms are valid. I sure as hell do not want somebody blindly using these algorithms on data they do not understand and then helping the government limit healthcare visits for high-need, low-resource individuals (sound familiar to anyone?).

An experienced statistician-data analyst-methodologist knows that when analyzing a large data set you must spend 98% of your time looking at (and fixing if possible) bad data points. The final 2% of your work is then much more likely to show something that is reliable, valid, and useful.
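A minimal sketch of what that first pass over bad data points might look like, assuming records arrive as plain dictionaries. The field names and plausibility limits here are hypothetical and for illustration only; real limits have to come from someone who understands the data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plausibility limits (assumed for illustration, not from any real dataset).
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "age_years": (0, 120),
    "systolic_bp": (50, 300),
}


def flag_bad_points(
    records: List[Dict[str, Optional[float]]],
    valid_ranges: Dict[str, Tuple[float, float]] = VALID_RANGES,
) -> Dict[int, List[str]]:
    """Return {row_index: [reasons]} for rows with missing or implausible values."""
    problems: Dict[int, List[str]] = {}
    for i, row in enumerate(records):
        reasons = []
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is None:
                reasons.append(f"{field}: missing")
            elif not (low <= value <= high):
                reasons.append(f"{field}: {value} outside [{low}, {high}]")
        if reasons:
            problems[i] = reasons
    return problems


records = [
    {"age_years": 34, "systolic_bp": 118},
    {"age_years": 340, "systolic_bp": 121},  # likely a keying error
    {"age_years": 58, "systolic_bp": None},  # missing measurement
]
problems = flag_bad_points(records)
for index, reasons in problems.items():
    print(index, reasons)
```

A check like this only surfaces candidates. Deciding whether a flagged value is garbage or a real extreme case is still the analyst's job.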

Big Data may save us, or it might kill us first. Or it might make us Borg or batteries.

Right now the analysts are reticulating splines.

No mo …. GiGo. [Is Nicki Minaj available to record this mantra?]

In the USA, you do not have the right to yell “fire” in a crowded theater. And you shouldn’t. There are limits to free speech and to the invasion of privacy. Somebody needs to tell Emperor Trump that.

It is now possible to data mine public information, the Internet, photos of my home from outer space, credit card records, records of what I read (from book purchases and Internet clicks), records of what I watch (from movie ticket purchases and Internet clicks and cable/satellite clicks), my family tree (and what they read, watch, buy, etc.), my health insurers’ reimbursements, my pharmaceutical purchases, and once again what I charge to my credit card. Somebody knows that I regularly go to a Thai restaurant in Chapel Hill and the Panera in Durham (I never use a credit card at McDonald’s so no one will know I like steak and egg bagels) and get my prescriptions filled in Chapel Hill. In many real ways it is now possible for data miners to create models that are almost me by adding in routines that account for the fact that I am predictably unpredictable. And even scarier, they can use a computer model of me to infer things about my family (genetically linked conditions, personality proclivities, intelligence, potential problem behaviors, lifespan). Really sucks, doesn’t it?

WTF have we been thinking? If there was ever an industry in need of regulation it is the folks who are recreating ME as a computer model. Don’t feel left out… they are also recreating YOU.

I predict that a new industry will arise to “fool” the data miners and make the computer models less accurate by adding random noise and random data to the information the marketers use. This can be done by a variety of techniques probably well-known to the CIA, FBI, NSA, Amazon, Google, health insurers, and Walmart. I currently try to manually introduce random information into my various computer profiles.
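A minimal sketch of the idea of adding random noise to a profile, assuming activity is just a list of event labels. The function name `add_decoys`, the decoy catalog, and the one-decoy-per-real-event ratio are all hypothetical; this is not a description of any agency's or company's actual technique, and not of how I do it by hand.

```python
import random
from typing import List, Optional


def add_decoys(
    real_events: List[str],
    catalog: List[str],
    decoy_ratio: float = 1.0,
    seed: Optional[int] = None,
) -> List[str]:
    """Mix real events with randomly chosen decoys and shuffle the result.

    decoy_ratio is the number of decoys per real event (an assumed knob).
    """
    rng = random.Random(seed)
    n_decoys = round(len(real_events) * decoy_ratio)
    decoys = [rng.choice(catalog) for _ in range(n_decoys)]
    mixed = list(real_events) + decoys
    rng.shuffle(mixed)  # hide which entries came first
    return mixed


# Hypothetical example data.
real = ["thai restaurant", "pharmacy"]
catalog = ["golf shop", "opera tickets", "bait and tackle", "vegan bakery"]
mixed = add_decoys(real, catalog, decoy_ratio=1.0, seed=2019)
print(mixed)
```

The more decoys per real event, the harder it should be for a model to pick out the real pattern, at the cost of more junk in your own records.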

You do not have the right to be ME. In person or on the Internet. For any purpose.

To the data miners and marketers who are stealing ME and YOU and YOU and the next two generations of our families I say … “I wonder if there is a special corner in Hell for you.”

The penalty for yelling fire in a crowded theater might well be fire.

PS. For those of you who do not like Obamacare and universally covering pre-existing conditions, remember that current computer models are sophisticated enough to make a good estimate of the odds your own unborn great-grandchildren will have certain serious medical conditions and behavior problems. Hell, we better cover pre-existing conditions for the next six generations before somebody decides to pre-disqualify my unborn grandchildren.