We are Christian Collberg and Todd Proebsting from the Department of Computer Science at the University of Arizona.
In order to repeat, reproduce, or extend published computer science research, we need, at the very least, access to the code and data (the research artifacts) that were used in collecting the results of the original paper. Experience has shown that gaining access to research artifacts can be difficult, particularly when a few years have passed since publication. A related problem is being able to contact the authors of a paper to enquire about some issue, given that email addresses change over time. We want to encourage sharing in the computer science research community. Please read our CACM paper and technical report, which describe our previous experience with the sharing of research artifacts.
If you are interested in your favorite conference being included on FindResearch.org and you are willing to help, please get in touch with us at email@example.com. The major part of the manual work we have to do is verifying email addresses found in papers, and chasing down missing addresses. We have a system set up that makes this easy, albeit tedious, work, and we would love to get help doing this so that we can expand our coverage.
When an author clicks "Verify" in the survey (see sample), the status of their paper changes from "not verified" to "verified" and this is displayed on our website. Thus, "Authors have verified information" means that (at least one of) the authors visited the survey, possibly made changes, and clicked "Verify". The status "Authors have not verified information" could mean that
We use DBLP to get lists of published papers. We get email addresses of authors from a variety of sources, mainly through semi-automatic extraction from the paper PDFs themselves, but also by scanning NSF's database of funded grant abstracts, and, when all else fails, by searching for the authors online.
We have received a 5-year grant for $357,568 from a private foundation. If you like what we're doing, consider helping us out with additional funding.
Yes, we have University of Arizona Institutional Review Board (IRB) approval.
No, we will not do that.
We will remove comments only if they are libelous.
We ask all authors to verify the information we have about their papers, in particular the location of any supporting artifacts. We send out emails to authors asking them to update or verify this information incrementally, one conference at a time. For conferences where these emails have yet to be sent out we present the information in gray text.
A research artifact is any code or data produced during the research behind a published paper that did not make it into the paper itself, but that may be useful when assessing or extending the work.
Your colleagues may need access to an artifact for many different reasons. They may want to read the source code to better understand your paper. They may want to rerun the experiments you present in your paper to ensure you got things right. They may want to build upon and extend your work. They may want to compare your results to their own. They may want to run different experiments on your code, or run the same experiments you ran but on different data sets. Thus, it is important to share all the code, external libraries, makefiles, instructions for proof assistants, scripts to run experiments, installation instructions—anything your colleagues may need to better understand or build upon your work.
Putting all your code on a public repository is certainly a good thing, but unless you are careful, it may not be enough to support repeatable work. Artifacts include not only source code, but also the experiments you ran for the paper, the exact versions of the external libraries your code relied on, etc. If you put all these in your repository, and add proper tags corresponding to every paper you publish, you may be OK! Even so, sharing the link to your GitHub repository on FindResearch.org will make it easier for your colleagues to find the exact version of your codebase that corresponds to a particular published work.
To help your colleagues assess, repeat, and reproduce your published research, we recommend that you provide a link to a permanent package (zip-file, virtual machine, container, etc.) comprising all the sources, makefiles, external libraries, data sets, and experimental setups that went into producing the results reported in the final published paper. Alternatively, you can provide a link to a "tagged" version of a public repository that corresponds to the work reported in your paper, as long as the repository contains all the code and data needed to repeat the experiments.
Regardless, keep in mind that code changes all the time, not only yours, but the external libraries your code depends on. To support repeatable work, it is essential that you make available all the code necessary to build your system. Concretely, instead of assuming that, years from now, a colleague will be able to apt-get a particular version of a particular library, it is much more helpful to include that library version with your packaged artifact.
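As one possible way to build such a package, the following minimal Python sketch zips a directory (including any vendored libraries placed in it) and adds a checksum listing so colleagues can check that the download is intact. The function names, the directory layout, and the file name MANIFEST.sha256 are illustrative assumptions, not part of FindResearch.org or any required format.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_artifact(source_dir: str, archive_path: str) -> List[str]:
    """Zip every file under source_dir and add a checksum manifest.

    Returns the manifest lines, each "<digest>  <relative path>".
    MANIFEST.sha256 is a hypothetical name chosen for this sketch.
    """
    root = Path(source_dir)
    # Collect the file list before the archive is created.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    manifest = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files
    ]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, p.relative_to(root).as_posix())
        zf.writestr("MANIFEST.sha256", "\n".join(manifest) + "\n")
    return manifest
```

A call such as package_artifact("artifact", "paper-artifact.zip") would produce one archive containing the sources, the bundled library versions, and the manifest. The sketch assumes the source directory does not already contain a file named MANIFEST.sha256.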
Additionally, for any software you rely on but do not include, make sure you precisely document which version you used and where it can be found: "We used gcc version 4.2.1 and library abclib version 1.2.3 which can be downloaded from http://abclib.org." Computer scientists are notorious for overloading names, and several distinct packages may share the same name, so it is important to be precise. (For entertainment, see how many computing projects you can find that have been named "Occam".)
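For projects written in Python, part of this documentation can be generated rather than typed by hand. The sketch below is one hedged illustration: it records the interpreter version and the installed version of each named distribution using the standard library's importlib.metadata. The function names and the "name==version" output format are assumptions made for this example, and non-Python tools such as gcc would still need to be documented separately.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def record_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version.

    Names that are not installed are recorded as "NOT INSTALLED"
    rather than silently skipped.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "NOT INSTALLED"
    return versions


def format_manifest(versions: Dict[str, str]) -> str:
    """Render the versions as sorted "name==version" lines."""
    return "\n".join(
        f"{name}=={version}" for name, version in sorted(versions.items())
    )
```

Writing the result of format_manifest(record_versions([...])) to a text file inside the packaged artifact keeps the version information next to the code it describes. It is still worth adding by hand where each dependency can be downloaded.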
In addition to a link to a package of the artifact versions that correspond to a particular published paper, you may also want to include a link to an evolving repository, such as GitHub. Such a link ensures that your colleagues have access to the latest developments, bug fixes, etc. that may have been made subsequent to the publication of your paper.
Yes! Be sure to hold on to the email (see sample) we sent you. You can revisit the link in this email as often as you want and update your information as it becomes available.
It is important not to let the perfect become the enemy of the good. We have seen many situations where researchers had legitimate reasons not to share all their artifacts, including licensing issues, privacy issues, disk crashes, etc. (See Section 4.3 of our technical report for a complete list.) That, however, should not stop you from sharing what you do have. The ultimate goal is to help your colleagues assess and build upon your research, and anything you can provide towards that effort may be useful.