Comments on BishopBlog: Data sharing: Exciting but scary

I was experimenting with LabArchives but am waitin...

2014-07-05T18:34:34.039+01:00

I was experimenting with LabArchives but am waiting until the paper is accepted before turning it live. Since then I have also been experimenting with OpenScienceFramework.

Out of curiosity, where did you post the data set ...

2014-07-02T13:10:15.502+01:00

Out of curiosity, where did you post the data set and R code? GitHub?

An example of data sharing at its best: https://ww...

2014-06-14T17:22:06.101+01:00

An example of data sharing at its best: https://www.youtube.com/watch?v=N2zK3sAtr-4

Re : Piketty A rather good (devastating?) respons...

2014-06-01T15:28:53.050+01:00

Re: Piketty

A rather good (devastating?) response by Thomas Piketty to Chris Giles's criticisms in the Financial Times.

ineteconomics.org/sites/inet.civicactions.net/files/Piketty2014TechnicalAppendixResponsetoFT.pdf

One very impressive thing is that he appears to have made every scrap of data available online.

Thanks for directing me to Nate's website. He...

2014-05-28T19:49:57.537+01:00

Thanks for directing me to Nate's website. He makes some good points, and I have not yet had a chance to read Chris Giles' article as my local university library does not have an electronic version of the FT. I am going to have to track it down in hard copy. It's got to be in the university library somewhere--I'm not likely to be able to buy a copy here in a small city in Canada.

Re Excel. I gave up on it some time ago simply because I don't like the new interface (and I was a beta tester for the Mac version back in the 80's). When I really need a spreadsheet I use Apache Open Office.

I find it's usually a lot faster and easier to just go directly to R for any analyses. Matter of taste, I guess, although I really don't trust the Excel stats routines. I know of one instance, some years ago, where someone ran a linear regression and ended up with a negative R-squared.

thanks for your comment.Re Excel : I think it has ...

2014-05-28T09:14:25.940+01:00

Thanks for your comment. Re Excel: I think it has its uses. I will often do a preliminary quick and dirty look at a dataset in Excel, not least because it is so easy to see data and plots alongside one another. I then do the serious analysis in R or SPSS, but having the Excel version provides a good double check, and I have trapped errors when different approaches give discrepant results.
I am fascinated by the current debate on Piketty. I had only been very vaguely aware of this until someone on Twitter asked if my post was inspired by the Piketty case. I can now see why: there are very interesting parallels in terms of error detection. I liked this account of the story, which I think is very balanced:
http://fivethirtyeight.com/features/be-skeptical-of-both-piketty-and-his-skeptics/

Speaking of making errors I just wiped out all my ...

2014-05-27T19:06:00.238+01:00

Speaking of making errors, I just wiped out all my comments. If the preceding seems a bit out of context, that's because this was supposed to precede it:

Well done Dr. Bishop.

Yes, it is all too easy to make a mistake. I remember as a graduate student, some (cough) years ago, running a correlation with the result of r = 1. I was quite excited until common sense took hold. One should not correlate line numbers with sequential ID numbers.
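
The r = 1 result is easy to reproduce. The following is only an illustrative sketch in Python, not the original analysis: the variable names and the made-up ID values (starting at 1001) are assumptions. Because sequential IDs are just the line numbers plus a constant, the Pearson correlation comes out as 1, apart from floating-point rounding.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 20 rows, with IDs assigned in the order of entry
line_numbers = list(range(1, 21))
participant_ids = [1000 + n for n in line_numbers]

r = pearson_r(line_numbers, participant_ids)
print(f"r = {r:.6f}")
```

Any pair of columns where one is a linear function of the other with positive slope behaves the same way, which is why a perfect correlation is usually a cue to check what was actually correlated.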

Recently in the economics field there seem to have been some rather dismaying data entry and analysis errors. None of them look deliberate, but they are very distressing, particularly in the Reinhart-Rogoff paper, which has had a very significant influence on government policy in many countries. It was several years later (4-5?) before they released the data to a grad student, who proceeded to point out a multitude of errors.

Having the data released immediately would probably have allowed some immediate corrections and damage control, rather than having it help set monetary policy for a country like the USA.

I doubt that economists are naturally more prone to these mistakes than any other researchers, but they have been having a bit of a rough time at the moment (see below). I will point out that Piketty, for his book Capital in the Twenty-First Century, did publish his data at the same time as the book.

As a pet peeve of mine, it looks like all three examples used Excel as their main analysis tool. I personally feel that a spreadsheet has no place in serious, or even frivolous, data analysis.

The Reinhart-Rogoff error – or how not to Excel at...

2014-05-27T19:03:59.387+01:00

The Reinhart-Rogoff error – or how not to Excel at economics
http://theconversation.com/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646

Richard Tol
Errors in estimates of the aggregate economic impacts of climate change
http://www.lse.ac.uk/GranthamInstitute/Media/Commentary/2014/April/Errors-in-estimates-of-the-aggregate-economic-impacts-of-climate-change-%E2%80%93-Part-II.aspx

Financial Times Finds “Many” Errors in Piketty Analysis, Argues They Undermine His Thesis
http://www.nakedcapitalism.com/2014/05/financial-times-finds-many-errors-piketty-analysis-argues-undermine-thesis.html

Thanks, Mark. My experience exactly. Also think th...

2014-05-27T09:32:40.797+01:00

Thanks, Mark. My experience exactly.
Also think this post by Betsy Levy Paluck is worth a read as a riposte to those who say they don't have time to prepare data for sharing. She agrees that it slows you down, but points out that this is no bad thing:
http://www.betsylevypaluck.com/blog/2014/5/25/what-i-stand-for-in-this-discussion-about-scientific-rigor

Hi Dorothy, Great post! I completely agree that p...

2014-05-27T08:28:50.581+01:00

Hi Dorothy,

Great post! I completely agree that posting data online also improves one's own approach to the analysis. The simple act of preparing the data for someone else to understand is a useful debugging tool, gives you a different perspective and forces you to be a bit more organised/systematic than you might otherwise be. Russ Poldrack made a similar point recently, in a nice post dissecting a coding error he only detected when he shared his analysis scripts: http://www.russpoldrack.org/2013/02/anatomy-of-coding-error.html. Basically, error is inevitable, especially for bespoke script-based analyses, so we really do need another pair of eyes. I am always struck by how much scrutiny we give to the text of a manuscript, sending it around to colleagues and co-authors for endless proof-reads, edits and corrections, while we rarely show anyone else the original working out of analyses. This must be the wrong way around. Moreover, I worry that error is not random. We are far more likely to double check anomalies that contradict our hypotheses than nice publishable results (biased debugging: http://the-brain-box.blogspot.co.uk/2013/02/biased-debugging.html). Preparing data (and analysis scripts) for public scrutiny is a great way to improve the reliability of research findings.

Mark