On my first day at the Erice School of Science Journalism this past week I attended a lecture by Alessio Cimarelli titled "When Data Journalism meets Science: a Hackathon". The speaker (who owns the site called "dataninja") showed several examples of how to mine the web to build databases and display results on various topics. It was quite interesting to see the techniques he used, but at some point I felt compelled to interrupt him, in the interest of the school participants.
The fact is, he was showing his results as if they were accurate measurements of the feature under study, whereas a journalist must be able to keep a critical attitude toward whatever "result" is discussed - especially results one extracts oneself by automatic means.
I went to his web page showing data on migrants killed while trying to reach Europe, and in a few clicks I got to see some of the original "data" entries on which the map produced by his software was based.
The third one I browsed was an article in the French newspaper "Liberation", which described how a migrant had slipped on a stone while washing himself on the beach, and died.
So I could point out to the speaker - and to the audience - that as a scientist I have a reverence for the data, and I take extreme care to keep spurious entries out of any dataset I use for an analysis. The Liberation piece, picked up by the automated search for "migrant killed", was an example of how such a search is liable to produce a biased result, and of how easily spurious data can make it into the analysis.
In the end, it boils down to the fact that a scientist values (or should value) the error bar around an estimate more than the estimate itself. An estimate without an error bar is more useless (and potentially more deceiving) than an error bar without a central value: the latter at least tells something precise about the accuracy of the measurement, while the former says nothing at all.
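To make the point concrete, here is a minimal sketch of how one might attach an error bar to a count obtained from an automated search. It assumes that a random sample of the collected entries is checked by hand, and it uses a simple binomial (normal-approximation) uncertainty on the fraction of genuine entries. The function name and all the numbers are hypothetical and chosen only for illustration; they are not taken from the dataset discussed at the lecture.

```python
import math

def corrected_count(total_entries, checked, spurious):
    """Estimate the number of genuine entries and its uncertainty.

    Assumption: `checked` entries were drawn at random from the dataset
    and `spurious` of them turned out not to belong there.
    """
    purity = 1.0 - spurious / checked
    # Binomial standard error on the purity (normal approximation,
    # which is crude when the hand-checked sample is small).
    purity_err = math.sqrt(purity * (1.0 - purity) / checked)
    return total_entries * purity, total_entries * purity_err

# Hypothetical numbers: 1000 entries collected, 30 checked by hand, 3 spurious.
estimate, error = corrected_count(1000, 30, 3)
print(f"genuine entries: {estimate:.0f} +/- {error:.0f}")
```

Even a crude error bar like this one tells the reader how much the central value can be trusted, which the bare count does not.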
I do not know whether my point was understood by the audience - I played the arrogant scientist, and I know I do not win much sympathy when I do that. But it was on purpose: if they took home the message that they should be more skeptical of what is erroneously or deceivingly called "raw data" by their peers (or even by scientists who should move on to some more suitable occupation in their lives), then I did not waste my listeners' time.
A couple of days later, it was the turn of Ayelet Baram Tsabari, who gave a talk titled "Using the web to analyse and increase people's interest in science". She discussed in detail how Google Trends can be used to extract information on the interest of internet users in scientific topics, based on their search terms and the graphs that the site provides. It was again my turn to play the hard-nosed professor of statistics: I interrupted her as she was showing one of those trend graphs, which had a large peak coinciding with an important news event, and then a small secondary peak very close to it.
Although the proposed explanation for the secondary peak looked quite plausible, I felt compelled to explain that since Google Trends only provides relative frequency graphs - the original absolute numbers are hidden - one can hardly associate an error bar with the graph. Thus a "feature" one observes in the graph cannot, in general, be taken as proof that something particular has occurred that caused people to search for a particular term.
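A small sketch may help show why the hidden absolute numbers matter. It assumes that search counts fluctuate like Poisson counts, and it introduces a hypothetical parameter, `searches_per_unit`, standing for the unknown number of actual searches behind one unit of the relative scale. The relative heights used below (10 for the baseline, 14 for the secondary peak) are made up for illustration.

```python
import math

def excess_z_score(relative_baseline, relative_peak, searches_per_unit):
    """Rough significance of a peak over a baseline, in standard deviations.

    Assumption: both values are Poisson counts once multiplied by the
    unknown `searches_per_unit`, so the variance of their difference
    is the sum of the two counts.
    """
    baseline = relative_baseline * searches_per_unit
    peak = relative_peak * searches_per_unit
    return (peak - baseline) / math.sqrt(peak + baseline)

# The same relative graph, read under three hypothetical absolute scales.
for scale in (1, 100, 10000):
    z = excess_z_score(10, 14, scale)
    print(f"searches per unit = {scale:>5}: z = {z:.2f}")
```

The same bump in the relative graph is an irrelevant fluctuation if each unit stands for a handful of searches, and a glaring effect if it stands for thousands: the significance grows with the square root of the unknown scale, so without the scale the graph alone cannot settle the matter.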
(By the way, can anybody guess which search was used in Google Trends to produce the graph above?)
My comment was not understood by the speaker, who referred to the graph as "the data" while I tried to explain to her that it was not raw data but a statistic (a statistic is a function of the data). She insisted that it was not a statistic, and I decided that her punishment would be to have her live with her ignorance... But I believe the discussion did allow the listeners to take home the point - which was again the same as before: data without an error estimate can only be taken as a qualitative indication, and in general proves nothing.