social, health, political imagery through the lens of George J Huba PhD © 2012-2019

Posts tagged Big Data

Big Data (in service to the NSA) wants to be able to document what you do and when and where and with whom. All of the current databases that companies and public agencies maintain can now be tightly linked to get a pretty good profile of any individual.

But these models of what people will do when you ask them to buy a DVD of Thor 2 or a suit from Brooks Brothers are actually fairly dumb brute-force computer algorithms that break down when certain types of problematic data are fed into them.

Hhhhmmm. Some thoughts below in the mind map.


The only way I see to develop effective medical treatments and care models for many of the thousands of rare diseases is to pool the RESEARCH resources that individual countries are spending and the data countries are collecting about individual rare diseases, and to put those research resources under international control for prioritizing the research agenda and ensuring public access to ALL results and research data.

Yes, I know the US Congress (the USA is probably the largest resource contributor) will go in front of the television cameras and say that the failure of the United Nations and the disproportionate contributions to a pooled resource fund will ensure failure. They will point to the failure of the world to effectively coordinate collaborative research on HIV/AIDS and point to politics, homophobia, disrespect, and the hatred of American politics by certain national and fundamentalist groups and say we would be wasting our money by letting Africans and Arabs and the Russians and Chinese and Indians and Asians and South Americans collaborate with the USA on research and on ensuring that research leads to effective treatments for at least some rare diseases.

Enough already. Let’s rise to the occasion of solving resource limitations in studying rare diseases and get an effective mechanism in place for expanding the impact of admittedly small research efforts by individual countries through international cooperation. I trust the governments of the world to collaborate, contribute as they can, and help us start to get some of these diseases treatable. Disease knows no boundaries.

In the last century we collectively developed very advanced medical research techniques. In this century we need to use these methods to solve all of the medical problems possible by putting aside the nonsense politics and nationalism and individual egos and predatory profits and by focusing on solving many medical issues and ensuring access to effective treatment worldwide.

Here’s a way to start. And yes, this is a test of our humanity and commitment to universal human rights, of which medical treatment is but one. But let’s start somewhere that should be relatively easy to agree on (and let a few hundred angry politicians in the USA know that the world considers them bratty children and cannot tolerate their obstructionist and oppositional behavior).

And let’s start the process of collaboration.

rare diseases time for effective international cooperation


Remember the “gold standard” research paradigm for determining if a medical treatment works: the DOUBLE BLIND, RANDOM ASSIGNMENT EXPERIMENT?

The design has historically been considered the best way to “prove” that new medical interventions work, especially if the experiment is replicated a number of times by different research teams. By using the double blind design (neither the treating medical team nor the patient knows whether the patient is taking a placebo or active medication), investigators expect to negate the placebo effects caused by patient or medical staff beliefs that the “blue pill” is working.

A key part of virtually all double-blind research designs is the assumption that all patient expectations and reports are independent. This assumption is made because of the statistical requirements necessary to determine whether a drug has had a “significantly larger effect” as compared to a placebo. Making this assumption has been a “standard research design” feature since long before I was born more than 60 years ago.


Google the name of a new drug in clinical trials. You will find many posts (hundreds, thousands) on blogs, on bulletin boards for people with the conditions being treated with the experimental drug, and on social media, especially Twitter and Facebook. Early in most clinical trials participants start to post and question one another about their presumed active treatment or placebo status and whether those who guess they are in the experimental condition think the drug is working or not. Since the treatments are of interest to many people worldwide who are not being treated with effective pharmaceuticals, the interest extends well beyond those in the study.

Google the name of a new drug being suggested for the treatment of a rare or orphan disease that has had no effective treatments to date and you will find this phenomenon particularly prevalent for both patients and caregivers. Hope springs eternal (which it SHOULD) but it also can affect the research design. Obviously data that are “self reported” on patient or caregiver questionnaires can be affected by Internet reports of what “the guy in Wyoming says” or what the caregiver of “the woman in Florida” says.

OK, you say, but medical laboratory tests and clinical observations will not be affected because these indices cannot be changed by a patient’s belief that they are in the experimental or placebo condition. Hhmmm, Sam in Seattle just posted that he thinks that he is in the experimental condition and that his “saved my life” treatment works especially well if you walk 90 minutes a day or take a specific diet supplement or have a berry-and-cream diet. Mary in Maine blogs the observation that her treatment is not working so she must be in the placebo condition, becomes very depressed, and subsequently makes a lot of changes in her lifestyle, often forgetting to take the other medications she reported using daily before the placebo or experimental assignment was made.

Do we have research designs for the amount of research participant visible (blogs, tweets, bulletin boards) and invisible (email, phone) communication going on during a clinical trial? No. Does this communication make a difference in what the statistical tests of efficacy will report? Probably. And can we ever track the invisible communications going on by email? Note that patients who do not wish to disclose their medical status will be more likely to use “private” email than the public blog and bulletin board methods.

Want an example? Google davunetide. This was supposed to be a miracle drug for the very rare neurodegenerative condition PSP. The company (Allon) that developed the drug received huge tax incentives in the USA to potentially market an effective drug for a neglected condition. The company, of course, was well aware that after getting huge tax incentives to develop the pharmaceutical, if the drug were to prove effective in reducing cognitive problems (as was thought), it would then be used with the much more common (and lucrative from the standpoint of Big Pharma) neurodegenerative disorders (Alzheimer’s, Parkinson’s) and schizophrenia.

Patients scrambled to get into the trial because an experimental medication was better than no medication (as was assumed, although not necessarily true) and the odds were 50/50 of getting the active pills.

Patients and caregivers communicated for more than a year, with the conversations involving patients from around the world. In my opinion, the communications probably increased the placebo effect, although I have neither data nor statistical tests to “prove” this and it is pure conjecture on my part.

The trial failed miserably. Interestingly, within a few weeks after announcing the results, the senior investigators who developed and tested the treatment had left the employ of Allon. Immediately after the release of the results, clinical trial participants (the caregivers more than the patients) started trading stories on the Internet.

Time for getting our thinking hats on. I worked on methodological problems like this for 30+ years, and I have no solution, nor do I think this problem is going to be solved by any individual. Teams of #medical, #behavioral, #communication, and #statistical professionals need to be formed if we want to be able to accurately assess the effects of a new medication.


Clinical Trial Double-Blind Treatment Evaluation in the Era of the Internet

Banks and online merchants use fairly sophisticated algorithms to identify probable cases of financial fraud and then protect themselves from the consequences of lost or stolen credit cards, etc. One of the most prevalent forms of elder abuse is financial. Aging adults are attacked by predators trying to get them to refinance their homes with reverse mortgages at exorbitant rates, make huge gifts to strangers for their “kindness,” and fall for one scheme after another. Sadly, much of the financial abuse is perpetrated by family members. And predatory financial scams are often targeted at aging immigrants to the US. Instead of just checking credit card records for fraud so as to protect themselves from liability, banks could use the same types of algorithms to scan withdrawals from savings and brokerage accounts as well as charges to credit cards to determine if they are atypically large for someone in their 80s. At least in California, banks are mandated reporters (to law enforcement) of suspected financial abuse of elders. Wouldn’t it be nice if banks used the algorithms they already use to protect themselves (at the expense of your privacy) to at least protect older individuals (at a loss of the privacy they already gave up when they opened accounts) from the scum who try to separate cognitively impaired or depressed seniors from their lifetime savings? Wouldn’t that be nice …..
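As a rough illustration of what "scanning withdrawals for atypically large amounts" could look like, here is a minimal Python sketch. It is not any bank's actual method: the function name, the threshold of 3.5, and the fallback multiple are all assumptions made for the example. It compares each new withdrawal to the account holder's own history using a robust z-score (median and median absolute deviation), so that one earlier large withdrawal does not distort the baseline.

```python
from statistics import median


def flag_atypical_withdrawals(history, new_amounts, threshold=3.5, fallback_multiple=3.0):
    """Return the new withdrawals that look atypically large for this account.

    history      -- past withdrawal amounts for one account (hypothetical data)
    new_amounts  -- recent withdrawals to screen
    threshold    -- robust z-score cutoff (3.5 is a common rule of thumb, assumed here)
    """
    if len(history) < 5:
        raise ValueError("need at least 5 past withdrawals to form a baseline")

    center = median(history)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = median([abs(x - center) for x in history])

    if mad == 0:
        # All past withdrawals are essentially identical; fall back to a
        # simple multiple of the typical amount (an assumption of this sketch).
        return [a for a in new_amounts if a > fallback_multiple * center]

    flagged = []
    for amount in new_amounts:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * (amount - center) / mad
        if score > threshold:
            flagged.append(amount)
    return flagged


if __name__ == "__main__":
    past = [200, 250, 180, 220, 300, 210, 240]
    print(flag_atypical_withdrawals(past, [230, 260, 15000]))
```

A flag from a screen like this would only be a prompt for a human to look at the account, not proof of abuse. A real system would also need the account holder's age, the payee, and the content knowledge to tell a scam from a down payment on a grandchild's house.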



Big Data/Data Science 1 through 14

A few thoughts about the importance of knowing the theories and prior studies in the content area of the modeling and data collection and data analysis and generation of conclusions.

You can’t model data without knowing what the data mean.


Data Scientist

We have had many data science fields in the past 50 years. They include applied statistics, biostatistics, psychometrics, quantitative psychology, econometrics, sociometrics, epidemiology, and many others. The new emphasis on data science ignores content knowledge about the data and their limitations and the permissible conclusions.

We do not need to replace a round wheel with a square one.

See also previous post on Big Data/Data Science adopting the mistakes of Big Pharma.

a HubaMap™ by g j huba phd

Dec 13 2013: I have been experimenting with some formatting. This is the same map content as above, but using iMindMap 7 which was recently released.

Data Scientist sketch

Can Big Data/Data Science avoid the train wreck of Big Pharma? I believe that the Big Data disaster will make the Big Pharma issues seem small in comparison.

But the issues will be about the same. A lot of the Big Pharma execs have become quite skilled at “beating the system” using “undocumented science” and many will move to Big Data and employ all of their very “best” moves and tricks. Big Data/Data Science has the potential to hurt the average individual even more than the greediness of Big Pharma.

Big Pharma

Big Pharma Train Wreck

Big Data

Big Data Train Wreck



Divvy is a wonderful free data visualizer program for the Mac. The program permits a number of data reductions using highly informative transformations, cluster analysis, and plots.

Indispensable for exploring data. Free. VERY fast.



Divvy Cluster

This afternoon I went to the local Panera and paid by credit card. My bank declined my charge of $4.82. I figured that the magnetic strip on the card had failed or that the new trainee using the cash register had made a mistake. She ran the card three more times and it was rejected. Then I got four text messages from the bank saying that they had rejected my charges. To text me, they used my phone number.

I called. They had put a hold on my card because they had some questions about my charges from the prior few days. The red flag event was that I had made an earlier charge of $9.65 at Panera about eight hours before. Their computer program was not smart enough to figure out that it was not unreasonable for someone to have breakfast at 6:30am at a Panera in Durham and then, later in the day with 30 minutes to kill, walk into a Panera in Chapel Hill and have a coffee (and a Danish I probably should not have had) while playing with an iPad on the free wireless connection. The computer also questioned the $1 charge at a gas station this afternoon (which the human representative immediately recognized as the established practice of gas stations opening a charge line of $1 through their automated payment systems when you swipe your card and then the next day putting a $92 charge on the card for filling the tank). I was also asked about a $71 charge to a software company outside the US and whether the payment made on the account was one I had made (I asked the customer service rep if she thought that, had someone paid a bill for me, I would tell her it was an erroneous transaction, and she laughed for a long time).

They had freaked out because they could not reach me by phone at three old numbers that are no longer active (I know they have my current number because they sent me texts at it, and the same bank sometimes calls about my other accounts at the cell phone I never turn off and which has a voice mailbox). Of course, if they had not used a no-reply text address, I could have responded to the four texts they sent.

Predictive models have been around for a decade or more in banks as they attempt to identify fraud and protect themselves. The episodes I have with my bank about every 2-3 months illustrate what happens when somebody blindly runs predictive analytic programs through big datasets without using some common sense to guide the modeling process. Just because anyone can buy a $100,000 program from IBM or others for developing predictive analytics does not mean that the model that comes out of the Big Data and expensive program makes any sense at all.

Or that the NSA or FBI or CIA or Google or Amazon models make much sense as they probe your private information.

If a computer predictive system is going to think that somebody is committing credit card fraud because they purchase two cups of coffee at the same national restaurant chain in a day, we are in big trouble.
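To make the point concrete, here is a small Python sketch contrasting two rules. Neither is the bank's actual model, which I have no knowledge of. The `Charge` class, both rule functions, the 12-mile distance between Durham and Chapel Hill, the 70 mph limit, and the date are all assumptions for illustration. The first rule treats any second charge at the same chain on the same day as suspicious. The second asks the commonsense question instead: could the cardholder have physically gotten from one place to the other in the time between charges?

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Charge:
    merchant: str
    city: str
    when: datetime
    amount: float


# Approximate road distance in miles; an illustrative assumption, not survey data.
DISTANCE_MILES = {frozenset({"Durham", "Chapel Hill"}): 12.0}


def naive_rule(earlier: Charge, later: Charge) -> bool:
    """Flag any second charge at the same chain on the same calendar day."""
    return (earlier.merchant == later.merchant
            and earlier.when.date() == later.when.date())


def plausibility_rule(earlier: Charge, later: Charge, max_mph: float = 70.0) -> bool:
    """Flag only when travel between the two charges would be implausibly fast."""
    if earlier.city == later.city:
        return False  # same city: nothing odd about a second purchase
    miles = DISTANCE_MILES.get(frozenset({earlier.city, later.city}))
    if miles is None:
        return True  # unknown distance: leave it for a human to review
    hours = (later.when - earlier.when).total_seconds() / 3600
    if hours <= 0:
        return True  # two cities at the same instant cannot be one person
    return miles / hours > max_mph


if __name__ == "__main__":
    # The date is hypothetical; times and amounts follow the story above.
    breakfast = Charge("Panera", "Durham", datetime(2013, 1, 1, 6, 30), 9.65)
    coffee = Charge("Panera", "Chapel Hill", datetime(2013, 1, 1, 14, 30), 4.82)
    print("naive rule flags it:", naive_rule(breakfast, coffee))
    print("plausibility rule flags it:", plausibility_rule(breakfast, coffee))
```

With eight hours between the two charges, the second rule sees nothing remarkable, while the first would freeze the card over a cup of coffee.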

The bottom line is that Big Data models are going to have to be regulated before some idiot accidentally turns on Skynet.

Or maybe the problem is that the NSA or FBI or CIA or Google has done it already.


Irv Oii is known to many international news organizations and researchers as a star data journalist. Being a home worker (although home may be the UK, Ohio, the Middle East, Central Africa, Hong Kong, or Antarctica) and a fairly reclusive person, Irv seems never to have been met by anybody. Some speculate that he might be a Jewish Asian-American. Others believe Irv is short for Irvelina, a Russian immigrant physician who went to Ohio (or was it Ojai, California?) when the Soviet science programs collapsed and turned into the lower funded Russian collaborative efforts with the EU and USA. The collapse of the Soviet Union resulted in the closing of her laboratory in Minsk. Some even think Irv Oii is an acronym.

Irv is thus an enigma and no pictures of her/him seem to exist. An artist’s conception (mine) based on the writings and consultations of Irv Oii on healthcare breakthroughs is shown below. My belief is that a portrait of Irv should hang over the desk of every data journalist and researcher.


Irv Oii


academia and healthcare big data