Without accurate data it is all just mushed together by the analyst (or computer program of the week).
Posts tagged Big Data
FU, Big Data Pimp.
Big Data (in service to the NSA) wants to be able to document what you do and when and where and with whom. All of the current databases that companies and public agencies maintain can now be tightly linked to get a pretty good profile of any individual.
But, these models of what people will do when you ask them to buy a DVD of Thor 2 or a suit from Brooks Brothers, are actually fairly dumb brute force computer algorithms that break down when certain types of problematic data are fed into them.
Hhhhmmm. Some thoughts below in the mind map.
A Way to Kickstart the Development of Effective Treatments for Rare Diseases without Taking Needed Resources from Research on Diseases that Affect Many
The only way I see to develop effective medical treatments and care models for many of the thousands of rare diseases is to pool the RESEARCH resources that individual countries are spending and the data countries are collecting about individual rare diseases and put those research resources under international control for prioritizing research agenda and ensuring public access to ALL results and research data.
Yes, I know the US Congress (the USA is probably the largest resource contributor) will go in front of the television cameras and say that the failure of the United Nations and the disproportionate contributions to a pooled resource fund will ensure failure. They will point to the failure of the world to effectively coordinate collaborative research on HIV/AIDS and point to politics, homophobia, disrespect, and the hatred of American politics by certain national and fundamentalist groups and say we would be wasting our money by letting Africans and Arabs and the Russians and Chinese and Indians and Asians and South Americans collaborate with the USA on research and on ensuring that research leads to effective treatments for at least some rare diseases.
Enough already. Let’s rise to the occasion of solving resource limitations in studying rare diseases and get an effective mechanism in place for expanding the impact of admittedly small research efforts by individual countries through international cooperation. I trust the governments of the world to collaborate, contribute as they can, and help us start to get some of these diseases treatable. Disease knows no boundaries.
In the last century we collectively developed very advanced medical research techniques. In this century we need to use these methods to solve all of the medical problems possible by putting aside the nonsense politics and nationalism and individual egos and predatory profits and focus on solving many medical issues and ensuring access to effective treatment world wide.
Here’s a way to start. And yes, this is a test of our humanity and commitment to universal human rights, of which medical treatment is but one. But let’s start somewhere that should be relatively easy to agree on (and let a few hundred angry politicians in the USA know that the world considers them bratty children and cannot tolerate their obstructionist and oppositional behavior).
And let’s start the process of collaboration.
- Rare Disease Treatments On The Rise: Will Big Pharma’s Focus On Orphan Drugs Benefit Us All? (medicaldaily.com)
- More Than 450 Innovative Medicines in Development for Rare Diseases (hispanicbusiness.com)
The design has historically been considered the best way to “prove” that new medical interventions work, especially if the experiment is replicated a number of times by different research teams. By using the double-blind design (neither the treating medical team nor the patient knows whether the patient is taking a placebo or active medication), investigators expect to negate the placebo effects caused by patient or medical staff beliefs that the “blue pill” is working.
A key part of virtually all double-blind research designs is the assumption that all patient expectations and reports are independent. This assumption is made because of the statistical requirements necessary to determine whether a drug has had a “significantly larger effect” as compared to a placebo. Making this assumption has been a “standard research design” feature since long before I was born more than 60 years ago.
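To make the cost of a violated independence assumption concrete, here is a minimal sketch using the standard design-effect formula for equally correlated observations, 1 + (m - 1) * rho, where m is the number of participants whose reports influence one another and rho is the correlation among their reports. The trial size, group size, and correlation values below are purely illustrative assumptions and are not taken from any actual trial.

```python
def design_effect(group_size: float, correlation: float) -> float:
    """Design effect for groups of equally correlated reports: 1 + (m - 1) * rho."""
    return 1.0 + (group_size - 1.0) * correlation


def effective_sample_size(n: int, group_size: float, correlation: float) -> float:
    """Number of truly independent observations that n correlated reports are worth."""
    return n / design_effect(group_size, correlation)


if __name__ == "__main__":
    n_patients = 300      # hypothetical trial size
    forum_group = 30      # hypothetical number of patients reading the same posts
    for rho in (0.0, 0.05, 0.20):
        ess = effective_sample_size(n_patients, forum_group, rho)
        print(f"correlation={rho:.2f}  effective sample size={ess:.1f}")
```

Under these assumed numbers, even a small correlation among reports shrinks the information in the sample considerably, so a significance test that treats all reports as independent would overstate its own precision.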
Google the name of a new drug in clinical trials. You will find many (hundreds, thousands) of posts on blogs, bulletin boards for people with the conditions being treated with the experimental drug, and social media, especially Twitter and Facebook. Early in most clinical trials participants start to post and question one another about their presumed active treatment or placebo status and whether those who guess they are in the experimental condition think the drug is working or not. Since the treatments are of interest to many people world-wide who are not being treated with effective pharmaceuticals, the interest is much greater than just among those in the study.
Google the name of a new drug being suggested for the treatment of a rare or orphan disease that has had no effective treatments to date and you will find this phenomenon particularly prevalent for both patients and caregivers. Hope springs eternal (which it SHOULD) but it also can affect the research design. Obviously data that are “self reported” on patient or caregiver questionnaires can be affected by Internet reports from “the guy in Wyoming” or the caregiver of “the woman in Florida.”
OK you say, but medical laboratory tests and clinical observations will not be affected because these indices cannot be changed by a patient’s belief about being in the experimental or placebo condition. Hhmmm, Sam in Seattle just posted that he thinks that he is in the experimental condition and that his “saved my life” treatment works especially well if you walk 90 minutes a day or take a specific diet supplement or have a berry-and-cream diet. Mary in Maine blogs the observation that her treatment is not working so she must be in the placebo condition, becomes very depressed, and subsequently makes a lot of changes in her lifestyle, often forgetting to take the other medications she reported using daily before the placebo or experimental assignment was made.
Do we have research designs for the amount of research participant visible (blogs, tweets, bulletin boards) and invisible (email, phone) communication going on during a clinical trial? No. Does this communication make a difference in what the statistical tests of efficacy will report? Probably. And can we ever track the invisible communications going on by email? Note that patients who do not wish to disclose their medical status will be more likely to use “private” email than the public blog and bulletin board methods.
Want an example? Google davunetide. This was supposed to be a miracle drug for the very rare neurodegenerative condition PSP. The company (Allon) that developed the drug received huge tax incentives in the USA to potentially market an effective drug for a neglected condition. The company, of course, was well aware that after getting huge tax incentives to develop the pharmaceutical, if the drug were to prove effective in reducing cognitive problems (as was thought), it would then be used with the much more common (and lucrative from the standpoint of Big Pharma) neurodegenerative disorders (Alzheimer’s, Parkinson’s) and schizophrenia.
Patients scrambled to get into the trial because an experimental medication was better than no medication (as was assumed, although not necessarily true) and the odds were 50/50 of getting the active pills.
Patients and caregivers communicated for more than a year, with the conversations involving patients from around the world. In my opinion, the communications probably increased the placebo effect, although I have no data or statistical tests to “prove” this and it is pure conjecture on my part.
The trial failed miserably. Interestingly, within a few weeks after announcing the results, the senior investigators who developed and tested the treatment had left the employ of Allon. Immediately after the release of the results, clinical trial participants (the caregivers more than the patients) started trading stories on the Internet.
Time for getting our thinking hats on. I worked on methodological problems like this for 30+ years, and I have no solution, nor do I think this problem is going to be solved by any individual. Teams of #medical, #behavioral, #communication, and #statistical professionals need to be formed if we want to be able to accurately assess the effects of a new medication.
an easy way for Big Data to do BIG Good (applying financial fraud algorithms to predicting elder financial abuse)
Banks and online merchants use fairly sophisticated algorithms to identify probable cases of financial fraud and then protect themselves from the consequences of lost or stolen credit cards, etc. One of the most prevalent forms of elder abuse is financial. Aging adults are attacked by predators trying to get them to refinance their homes with reverse mortgages at exorbitant rates; make huge gifts for “kindness” from strangers; and one scheme after another. Sadly, much of the financial abuse is perpetrated by family members. And predatory financial scams are often targeted at aging immigrants to the US.
Instead of just checking credit card records for fraud so as to protect themselves from liability, banks could use the same types of algorithms to scan withdrawals from savings and brokerage accounts as well as charges to credit cards to determine if they are atypically large for someone in their 80s. (At least in California) banks are mandated reporters (to law enforcement) of suspected financial abuse of elders. Wouldn’t it be nice if banks used the algorithms they already use to protect themselves (at the expense of your privacy) to at least protect older individuals (at a loss of the privacy they already gave up when they opened accounts) from the scum who try to separate cognitively impaired or depressed seniors from their lifetime savings? Wouldn’t that be nice …..
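As a rough illustration of what “atypically large” could mean in practice, here is a minimal sketch that compares new withdrawals against an account’s own history using a robust (median-based) modified z-score. This is an assumption-laden toy and not how any bank’s proprietary fraud system actually works; the function name, the sample amounts, and the 3.5 cutoff (a common rule of thumb for modified z-scores) are all illustrative. A real system would also need peer-group comparisons by age and human review of every flag.

```python
from statistics import median


def flag_atypical_withdrawals(history, new_amounts, threshold=3.5):
    """Return the new amounts that are unusually large relative to past withdrawals.

    Uses the modified z-score 0.6745 * (x - median) / MAD, which is far less
    distorted by a few past extreme values than a mean-based z-score.
    """
    center = median(history)
    mad = median([abs(x - center) for x in history])  # median absolute deviation
    if mad == 0:
        # No spread in the history: fall back to flagging anything above the past maximum.
        return [a for a in new_amounts if a > max(history)]
    return [a for a in new_amounts if 0.6745 * (a - center) / mad > threshold]


if __name__ == "__main__":
    past = [120, 80, 200, 150, 95, 110, 175, 130]   # hypothetical monthly withdrawals
    incoming = [140, 25000, 210]                    # hypothetical new withdrawals
    print(flag_atypical_withdrawals(past, incoming))
```

The point of the sketch is only that the building blocks are simple; deciding what to do with a flagged withdrawal (a phone call, a hold, a report) is the part that needs judgment.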
A few thoughts about the importance of knowing the theories and prior studies in the content area of the modeling and data collection and data analysis and generation of conclusions.
You can’t model data without knowing what the data mean.
We have had many data science fields in the past 50 years. The fields include applied statistics, biostatistics, psychometrics, quantitative psychology, econometrics, sociometrics, epidemiology, and many others. The new emphasis on data science ignores content knowledge about the data and their limitations and the permissible conclusions.
We do not need to replace a round wheel with a square one.
a HubaMap™ by g j huba phd
Dec 13 2013: I have been experimenting with some formatting. This is the same map content as above, but using iMindMap 7 which was recently released.
Can Big Data/Data Science avoid the train wreck of Big Pharma? I believe that the Big Data disaster will make the Big Pharma issues seem small in comparison.
But the issues will be about the same. A lot of the Big Pharma execs have become quite skilled at “beating the system” using “undocumented science” and many will move to Big Data and employ all of their very “best” moves and tricks. Big Data/Data Science has the potential to hurt the average individual even more than the greediness of Big Pharma.
HubaMap™ by g j huba phd
Created by the Gapminder Foundation: http://www.gapminder.org
Creative Commons [public domain] subject to crediting the Gapminder Foundation.
Divvy is a wonderful free data visualizer program for the Mac. The program permits a number of data reductions using highly informative transformations, cluster analysis, and plots.
Indispensable for exploring data. Free. VERY fast.
This afternoon I went to the local Panera and paid by credit card. My bank declined my charge of $4.82. I figured it was the magnetic strip on the card which had failed or that the new trainee using the cash register may have made a mistake. She ran the card three more times and it was rejected. Then I got four text messages from the bank saying that they had rejected my charges. To text me, they used my phone number.
I called. They had put a hold on my card because they had some questions about my charges from the prior few days. The red flag event was that I had made an earlier charge of $9.65 at Panera about eight hours before. Their computer program was not smart enough to figure out that it was not unreasonable for someone to have breakfast at 6:30am at a Panera in Durham and then walk into a Panera in Chapel Hill later in the day with 30 minutes to kill and have a coffee (and a Danish I probably should not have had) while playing with an iPad on the free wireless connection. The computer also questioned the $1 charge at a gas station this afternoon (which the human representative immediately recognized as the established practice of gas stations opening charge lines of $1 with their automated payment systems when you swipe your card and then putting a $92 charge on the card the next day for filling the tank). I was also asked if the payment made on the account was one I had made (I asked the customer service rep if she thought that, had someone paid a bill for me, I would tell her it was an erroneous transaction, and she laughed for a long time) as well as about a $71 charge to a software company outside the US.
They had freaked out because they could not reach me by phone at three old numbers that are no longer active (I know they have my current number because they sent me texts at it, and the same bank sometimes calls about my other accounts on the cell phone I never turn off and which has a voice mailbox). Of course, if they had not used a no-reply text address, I could have responded to the four texts they sent.
Predictive models have been around for a decade or more in banks as they attempt to identify fraud and protect themselves. The episodes I have with my bank about every 2-3 months illustrate what happens when somebody blindly runs predictive analytic programs through big datasets without using some common sense to guide the modeling process. Just because anyone can buy a $100,000 program from IBM or others for developing predictive analytics does not mean that the model that comes out of the Big Data and expensive program makes any sense at all.
Or that the NSA or FBI or CIA or Google or Amazon models make much sense as they probe your private information.
If a computer predictive system is going to think that somebody is committing credit card fraud because they purchase two cups of coffee at the same national restaurant chain in a day, we are in big trouble.
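To show how little common sense it would take to avoid this particular false alarm, here is a minimal sketch contrasting a naive rule (“same chain twice in one day”) with a plausibility check on whether a person could physically travel between the two locations in the time available. Everything here is hypothetical: the `Charge` type, both rule functions, the 900 km/h speed cap, and the approximate coordinates for Durham and Chapel Hill are illustrative assumptions, and nothing is known about the bank’s actual model.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class Charge:
    merchant: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def naive_rule(a: Charge, b: Charge) -> bool:
    """Flag two charges at the same chain on the same calendar day."""
    return a.merchant == b.merchant and a.when.date() == b.when.date()


def travel_rule(a: Charge, b: Charge, max_kmh: float = 900.0) -> bool:
    """Flag only if getting from one charge to the other would require implausible speed."""
    hours = abs((b.when - a.when).total_seconds()) / 3600.0
    distance = haversine_km(a.lat, a.lon, b.lat, b.lon)
    if hours == 0:
        return distance > 1.0  # simultaneous charges in clearly different places
    return distance / hours > max_kmh


if __name__ == "__main__":
    breakfast = Charge("Panera", datetime(2013, 12, 1, 6, 30), 35.994, -78.899)  # approx. Durham
    coffee = Charge("Panera", datetime(2013, 12, 1, 14, 30), 35.913, -79.056)    # approx. Chapel Hill
    print("naive rule flags:", naive_rule(breakfast, coffee))
    print("travel rule flags:", travel_rule(breakfast, coffee))
```

The naive rule fires on the two coffees while the travel check does not, which is the kind of sanity test a human analyst would apply before letting a model freeze anyone’s card.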
The bottom line is that Big Data models are going to have to be regulated before some idiot accidentally turns on Skynet.
Or maybe the problem is that the NSA or FBI or CIA or Google has done it already.
No, I haven’t lost “it” and this is not a science fiction movie.
With the unleashing of big data, big computing, big temptations, and big greed, it is going to be real tempting to develop a George Huba (or Bill Smith or Mary Doe or heaven forbid, a George Bush) computer model that can fairly accurately predict from my lifetime experiences whether I will buy a new car next year (and what type and in which cost range and maybe from which car dealer), purchase or sell a home, shift from converse to adidas sneakers, become emotionally distressed if I do not have chocolate, and purchase Apple stock. Or run a simulation of me as the CEO of a particular company to determine if I get the job. Or look at my medical history and determine whether it is likely that my grandchildren will have each of 20 expensive diseases that no insurer wants to touch.
Already the IRS runs programs to estimate the likelihood I cheated on my income taxes, Amazon runs programs to estimate the likelihood I will purchase certain books and socks before or after the December holidays, and my credit card company runs models to determine whether it is likely or not that I purchased shoes while on a business trip (yup, they once froze my credit card because their computer model says I only buy sneakers).
OK, so the accuracy of the big data scientists is only something like 20-50% now. What do you think it will be when your book purchasing history is integrated with your job history, income, ice cream purchases, pharmaceutical purchases, and BMI? And then fine tuned with the grade you got in college English, chemistry, or psychology; whether you had a hiking or a beach vacation; if you purchased (used) sunscreen and had a history of purchasing sun hats; the diseases that all four of your grandparents and parents had at different times in their lifetimes. And whether your car is more than 3 years old. And what do you think it will be when we create a generation of data scientists willing to capitalize on huge data to build such models for salaries that will approach those of professional athletes and rock stars?
Ten years from now, the computer models produced of selected individuals will make Mark Zuckerberg, the Google guys, and Jeff Bezos look like rank amateurs in profiling.
[Oh, and by the way while writing this post Google knows that I looked up Mark Zuckerberg’s name and the spelling of adidas.]
I want to tell anyone that wants to develop a mathematical, computer model of me (or my behavior, beliefs, attitudes, skills, history, and future intentions) to cease and desist. Or fuck off.
Which raises the questions … Do I own the copyright (patent, trademark) to my own life? [And if I do, what are the limits and will violations of those laws by a number of countries be ignored?]
This is not so far-fetched. I spent my whole life becoming the person I am. Does anybody have the right to take all of the big data about me and distill my life down to formulae and algorithms that will explain my past and current behavior and predict what I will do in the future? Should people be allowed to model individuals, I fear that the suicide rate will go up dramatically as people find out how much these models can be used to control them.
As a psychologist, I spent my career studying people so that we might better understand their fears and concerns, help them better use their full potential, become happier, control their own aggressive or violent tendencies, and generally become the people THEY WANTED to be. Neither I nor any other ethical psychologist set out with the intent to model the behaviors of others so well that the resulting models could be sold to governments and corporations.
Big and huge data, data scientists, companies, and governments need to be prohibited from violating the rights of individuals to “own” their individual lives. If we ever let others “own” our individual identities, we will have crossed into new territory from which there is no return. The technology is almost there to create such individual mathematical models.
I was endowed by my creator to own the copyright, patents, and trademarks of my own life… and to answer for what I chose to do with that intellectual property (free will). I choose not to sell my soul to the devil.
A few more thoughts are in the mind map below.
Irv Oii is known to many international news organizations and researchers as a star data journalist. Being a home worker (although home may be the UK, Ohio, the Middle East, Central Africa, Hong Kong, or Antarctica) and a fairly reclusive person, nobody seems to have met Irv. Some speculate that he might be a Jewish Asian-American. Others believe Irv is short for Irvelina, a Russian immigrant physician who went to Ohio (or was it Ojai, California?) when the Soviet science programs collapsed and turned into the lower funded Russian collaborative efforts with the EU and USA. The collapse of the Soviet Union resulted in the closing of her laboratory in Minsk. Some even think Irv Oii is an acronym.
Irv is thus an enigma and no pictures of her/him seem to exist. An artist’s conception (mine) based on the writings and consultations of Irv Oii on healthcare breakthroughs is shown below. My belief is that a portrait of Irv should hang over the desk of every data journalist and researcher.