FU, Big Data Pimp.
social, health, political imagery through the lens of G J Huba PhD © 2012-2021
Big Data (in service to the NSA) wants to be able to document what you do and when and where and with whom. All of the current databases that companies and public agencies maintain can now be tightly linked to get a pretty good profile of any individual.
But these models of what people will do when you ask them to buy a DVD of Thor 2 or a suit from Brooks Brothers are actually fairly dumb brute-force computer algorithms that break down when certain types of problematic data are fed into them.
Hhhhmmm. Some thoughts below in the mind map.
No, I haven’t lost “it” and this is not a science fiction story.
With the unleashing of big data, big computing, big temptations, and big greed, it is going to be real tempting to develop a George Huba (and also perhaps a Bill Smith or Mary Doe or, heaven forbid, a Donald Trump) computer model that can fairly accurately predict from one’s lifetime experiences whether one will buy a new car next year (and what type and in which cost range and maybe from which car dealer), purchase or sell a home, shift from Converse to adidas sneakers, become emotionally distressed if no chocolate is available, and purchase Apple stock to invest or trade. Or run a simulation of one as the CEO of a particular company to determine who gets the job. Or look at one’s medical history and determine whether it is likely that one’s grandchildren will have each of 20 expensive diseases that no insurer wants to touch.
Already the IRS runs programs to estimate the likelihood I cheated on my income taxes, Amazon runs programs to estimate the likelihood I will purchase certain books and socks before or after the December holidays, and my credit card company runs models to determine whether it is likely or not that I purchased shoes while on a business trip (yup, they once froze my credit card while I was in DC on business because their computer model says I only buy sneakers).
OK, so the accuracy of the big data scientists is only something like 20-50% now. What do you think it will be when your book purchasing history is fully integrated with your job history, income, ice cream purchases, pharmaceutical purchases, video watching history, total hamburger purchases, and BMI? And then fine tuned with the grade you got in college English, chemistry, or psychology; whether you had a hiking or a beach vacation and if you purchased (used) sunscreen and had a history of purchasing sun hats; the diseases that all four of your grandparents and parents had at different times in their lifetimes. And whether your car is more than 3 years old. And what do you think it will be when we create a generation of data scientists willing to capitalize on huge data to build such models for salaries that will approach those of professional athletes and rock stars? Most of the world’s richest people are already individuals who are promoting and benefitting from big data to predict what you will do.
Ten years from now, the computer models produced of selected individuals will make Mark Zuckerberg, the Google guys, Apple, and Jeff Bezos look like rank amateurs in profiling. The Russians and Chinese, of course, are already well advanced and a threat every time we vote.
[Oh, and by the way while writing this post Google knows that I looked up Mark Zuckerberg’s name and the spelling of adidas.]
I want to tell anyone that wants to develop a mathematical, computer model of me (or my behavior, beliefs, attitudes, skills, history, and future intentions) to cease and desist. Or [more indelicately] fuck off.
Which raises the questions … Do I own the copyright (patent, trademark) to my own life? [And if I do, what are the limits and will violations of those laws by a number of countries and companies be ignored?]
This is not so far-fetched. I spent my whole life becoming the person I am. Does anybody have the right to take all of the big data about me and distill my life down to formulae and algorithms that will explain my past and current behavior and predict what I will do in the future? Should people be allowed to model individuals, I fear that the suicide rate will go up dramatically as people find out how much these models can be used to control them and when the models predict they will die.
As a psychologist, I spent my career studying people so that we might better understand their fears and concerns, help them better use their full potential, become happier, control their own aggressive or violent tendencies, and generally become the people THEY WANTED to be. Neither I nor any other ethical psychologist set out with the intent to model the behaviors of others so well that the resulting models could be sold to governments and corporations.
Big and huge data, data scientists, companies, and governments need to be prohibited from violating the rights of individuals to “own” their individual lives. If we ever let others “own” our individual identities, we will have crossed into new territory from which there is no return. The technology is almost there to create such individual mathematical models. Russia, for instance, seems to have a pretty good Trump model they are licensing to the Saudis and others.
I was endowed by my creator to own the copyright, patents, and trademarks of my own life… and to answer for what I chose to do with that intellectual property (free will). I choose not to sell my soul to the devil, its data scientists, psychologists, and hackers.
A few more thoughts are in the mind map below.
BIG Data is coming (or has already come) to healthcare. [It is supposed to usher in new eras of research, economic responsibility, quality and access to healthcare, and better patient outcomes, but that is a subject for another post because it is putting the carriage before the horse to discuss it here.]
What is a data scientist? A new form of bug, a content expert who also knows data issues, an active researcher, someone trained in data analysis and statistics, someone who is acutely aware of relevant laws and ethical concerns in mining health data, a blind empiricist?
This is a tough one because it also touches on how many $$$$$ (€€€€€, ¥¥¥¥¥, £££££, ﷼﷼﷼﷼﷼, ₩₩₩₩₩, ₱₱₱₱₱) individuals and corporations can make off the carcass of a dying healthcare system.
Never one to back away from a big issue and in search of those who value good healthcare for all over the almighty $ € ¥ £ ₨ ﷼ ₩ ₱, here are some of my thoughts on this issue.
Content knowledge by a well-trained, ethical individual who respects privacy concerns is Queen. Now and forever.
topics and subtopics: who is a “health” data scientist?
trained in healthcare? methodology research databases management information systems psychology? psychometrics other public health? epidemiology other medicine? nursing? social work? education? biostatistics? medical informatics? applied mathematics? engineering? theoretical mathematics? theoretical-academic statistics? information technology? computer science? other?
conclusions: must know content 70% methods 30%; must honor ethics 100% laws practice privacy criminal civil federal state other
greatest concerns: correctness of results conclusions ethical standards meaningfulness validity reliability privacy utility
expert in: content field; data analysis; data systems; ethics and privacy; other
member? association with ethics standards
licensed? physician nurse psychologist social worker other
regulated? federal hipaa state other
insured? professional liability; errors and omissions
continuing education requirements? ethics renewal of licensure regulatory standards insurer commonsense laws
go away if: not well trained content field data analysis not statistics; committed clean data meaningfulness subject privacy peer review openness ethics ethics ethics; are arrogant narrow-minded purely commercial primarily motivated $$$$$ blind number cruncher atheoretical
© 2013 g j huba
I wouldn’t go on a bus trip with a driver who is unlicensed. Would you?
Who is driving the Big Data bus? Data scientists? Mindless algorithms? Content experts and their teams of data scientist support staff? Marketing? Security firms (including those run by governments)? Terrorists?
I say this once, I will say this a million times … Content is Queen.
Algorithms that are primarily empirical without an understanding of the validity of the data being analyzed and the theoretical issues are dangerous.
An algorithm can predict, and I have no doubt several are doing so at this minute, how happy I will be on a global question (how happy are you?) or a behavioral index (at a sporting event, at the bank cashing a check, four days after the death of a parent) or the perceptions of others (just got tagged in somebody’s photo, got mentioned in a tweet, had a happy blog entry, had a birthday, just had a child born, got back a favorable medical test result, used a smiley face).
I have observed and analyzed and proposed new ways of measuring “happiness” and “anxiety” and “grieving” and “intelligence” for 40 years. I don’t really know what “happiness” or “anxiety” or “grieving” or “intelligence” is although I do know a lot about how experts have tried to define these constructs. I do know that a blind algorithm is not going to answer the question of what “happiness” is.
Do you want an algorithm driving the bus or someone who knows the limits of current data? I don’t want a blind algorithm predicting whether I am “happy” (and happy enough to buy something). I don’t want a blind algorithm predicting the economy. I don’t want a blind algorithm predicting how many healthcare visits I should receive under health insurance.
Content is Queen. The algorithms that drive the organization of Big Data need to be guided by content specialists (psychologists, sociologists, physicians, nurses, economists, physicists, chemists, bioelectrical engineers, etc.), not by data scientists without expertise in one or more of the relevant content fields.
If the Queen rules, all will probably be well in the kingdom. If blind algorithms rule we probably will end up as batteries in The Matrix.
I vote (before it is too late) for the monarchy of content. I am not a battery.
In the USA, you do not have the right to yell fire in crowded theaters. And you shouldn’t. There are limits to free speech and the invasion of privacy. Somebody needs to tell Emperor Trump that.
It is now possible to data mine public information, the Internet, photos of my home from outer space, credit card records, records of what I read (from book purchases and Internet clicks), records of what I watch (from movie ticket purchases and Internet clicks and cable/satellite clicks), my family tree (and what they read, watch, buy, etc), my health insurers’ reimbursements, my pharmaceutical purchases, and once again what I charge to my credit card. Somebody knows that I regularly go to a Thai restaurant in Chapel Hill and the Panera in Durham (I never use a credit card at McDonald’s so no one will know I like steak and egg bagels) and get my prescriptions filled in Chapel Hill. In many real ways it is now possible for data miners to create models that are almost me by adding in routines that account for the fact that I am predictably unpredictable. And even more scary, they can use a computer model of me to infer things about my family (genetically linked conditions, personality proclivities, intelligence, potential problem behaviors, lifespan). Really sucks, doesn’t it.
WTF have we been thinking? If there was ever an industry in need of regulation it is the folks who are recreating ME as a computer model. Don’t feel left-out… they are also recreating YOU.
I predict that a new industry will arise to “fool” the data miners and make the computer models less accurate by adding random noise and random data to the information the marketers use. This can be done by a variety of techniques probably well-known to the CIA, FBI, NSA, Amazon, Google, health insurers, and Walmart. I currently try to manually introduce random information into my various computer profiles.
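For the curious, here is a minimal sketch of what "adding random noise" to a profile could look like. It is only an illustration under my own assumptions, not a description of what any agency or company actually does: the category names, the made-up purchase history, and the `noise_ratio` parameter are all hypothetical, and the "model" is nothing more than a count of how often each category shows up.

```python
import random
from collections import Counter


def add_decoys(history, categories, noise_ratio, rng):
    """Return a shuffled copy of history with random decoy entries mixed in.

    noise_ratio (hypothetical parameter) is the number of decoys added
    per real entry; decoys are drawn uniformly from categories.
    """
    n_decoys = round(len(history) * noise_ratio)
    decoys = [rng.choice(categories) for _ in range(n_decoys)]
    mixed = history + decoys
    rng.shuffle(mixed)
    return mixed


def category_shares(history):
    """Fraction of entries per category: a crude stand-in for a marketer's profile."""
    counts = Counter(history)
    total = len(history)
    return {category: n / total for category, n in counts.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
categories = ["sneakers", "books", "chocolate", "dress shoes", "sun hats"]

# Made-up history: someone who "only buys sneakers", plus a couple of books.
real_history = ["sneakers"] * 8 + ["books"] * 2

# Two decoy entries for every real one.
noisy_history = add_decoys(real_history, categories, noise_ratio=2.0, rng=rng)

real_share = category_shares(real_history)["sneakers"]
noisy_share = category_shares(noisy_history).get("sneakers", 0.0)
print(f"sneaker share before noise: {real_share:.2f}")
print(f"sneaker share after noise:  {noisy_share:.2f}")
```

The real entries are all still in there, but the dominant pattern is diluted, so anything estimated from raw frequencies becomes less sure of itself. A serious profiler would presumably try to filter out uniform noise like this, which is why I expect the fooling to become an industry rather than a hobby.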
You do not have the right to be ME. In person or on the Internet. For any purpose.
To the data miners and marketers who are stealing ME and YOU and YOU and the next two generations of our families I say … “I wonder if there is a special corner in Hell for you.”
The penalty for yelling fire in a crowded theater might well be fire.
PS. For those of you who do not like Obamacare and universally covering pre-existing conditions, remember that current computer models are sophisticated enough to make a good estimate of the odds your own unborn great grandchildren will have certain serious medical conditions and behavior problems. Hell, we better cover pre-existing conditions for the next six generations before somebody decides to pre-disqualify my unborn grandchildren.