Frequently Asked Questions

Who are you?
We are Christian Collberg and Todd Proebsting from the Department of Computer Science at the University of Arizona.
Why are you doing this?
To repeat, reproduce, or extend published computer science research, we need, at the very least, access to the code and data (the research artifacts) that were used to produce the results in the original paper. Experience has shown that gaining access to research artifacts can be difficult, particularly when a few years have passed since publication. A related problem is contacting the authors of a paper to inquire about some issue, since email addresses change over time. We want to encourage sharing in the computer science research community. Please read our CACM paper and technical report, which describe our previous experience with sharing research artifacts.
What are the goals of this work?
1. To be the go-to point where the public can locate the artifacts related to a paper they are reading;
2. To encourage researchers to share their artifacts by showing how common sharing is in their communities;
3. To collect longitudinal data on artifact-sharing trends for the benefit of those with a vested interest in sharing, repeatability, and reproducibility, such as funding agencies and the tax-paying public.
How can I help?
If you would like your favorite conference to be included on FindResearch.org and are willing to help, please get in touch with us at info@findresearch.org. Most of our manual work consists of verifying email addresses found in papers and chasing down missing ones. We have a system set up that makes this work easy, albeit tedious, and we would love help with it so that we can expand our coverage.
How does your system work?
1. When the proceedings of a new conference are published, we semi-automatically extract bibliographic, funding, and artifact information for each paper;
2. We contact each author by email (see sample), asking them to correct and complete the information we have collected;
3. The authors fill out a survey (see sample); the author can fill out the survey multiple times, in case their information changes over time;
4. The information is added to our website;
5. We compute and display statistical sharing trends as they evolve over time.
What information do you collect about an author?
Please see our Privacy Policy.
What does it mean for information to be verified?
When an author clicks "Verify" in the survey (see sample), the status of their paper changes from "not verified" to "verified", and this status is displayed on our website. Thus, "Authors have verified information" means that at least one of the authors visited the survey, possibly made changes, and clicked "Verify". The status "Authors have not verified information" could mean that
- None of the authors received our emails and thus were unable to verify the information;
- The authors received our emails but declined to visit the survey;
- The authors visited the survey, but none of them clicked the "Verify" button.
How do you obtain your data?
We use DBLP to get lists of published papers. We get email addresses of authors from a variety of sources, mainly through semi-automatic extraction from the paper PDFs themselves, but also by scanning NSF's database of funded grant abstracts, and, when all else fails, by searching for the authors online.
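Extracting addresses from papers is only partly automatable, partly because many papers write several addresses in a compressed form such as {alice, bob}@cs.example.edu. Purely as an illustration, and not the tool we actually use, here is a minimal Python sketch that pulls candidate addresses out of text that has already been extracted from a PDF by some separate tool. The function name extract_emails, the regular expression, and the example addresses are hypothetical; any real extractor needs manual checking of its output, which is exactly the tedious part described above.

```python
import re

# Hypothetical pattern, not FindResearch.org's actual extractor.
# Alternative 1: brace-grouped addresses, e.g. {alice, bob}@cs.example.edu
# Alternative 2: ordinary addresses, e.g. carol@example.org
EMAIL = re.compile(
    r"\{(?P<users>[^{}]+)\}@(?P<domain>[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<plain>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def extract_emails(text: str) -> list[str]:
    """Return candidate addresses in the order they appear, without duplicates."""
    found = []
    for match in EMAIL.finditer(text):
        if match.group("plain"):
            found.append(match.group("plain"))
        else:
            # Expand the grouped form into one address per user name.
            domain = match.group("domain")
            for user in match.group("users").split(","):
                if user.strip():
                    found.append(f"{user.strip()}@{domain}")
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(found))


text = "Alice Smith, Bob Jones {alice, bob}@cs.example.edu; Carol Wu carol@example.org."
print(extract_emails(text))
# ['alice@cs.example.edu', 'bob@cs.example.edu', 'carol@example.org']
```

Note that the sketch only finds candidates; deciding which address belongs to which author, and whether it still works, remains a manual step.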
Who is funding you?
We have received a 5-year grant for $357,568 from a private foundation. If you like what we're doing, consider helping us out with additional funding.
Do you have IRB approval?
Yes, we have University of Arizona Institutional Review Board (IRB) approval.
Do you share your code and data?
We provide a snapshot of both our code and our publicly available data for download.
Can you please remove my paper from your website?
No, we will not do that.
I disagree with a particular comment in the discussion about my paper; can you please remove it?
We will remove comments only if they are libelous.
Why are some conferences and papers in gray text?
We ask all authors to verify the information we have about their papers, in particular the location of any supporting artifacts. We send out emails to authors asking them to update or verify this information incrementally, one conference at a time. For conferences where these emails have yet to be sent out we present the information in gray text.
What, exactly, do you mean by a research artifact?
A research artifact is any code or data produced during the research leading to a published paper that did not make it into the paper itself, but that may be useful when assessing or extending the work.
Why should I share my research artifacts?
Your colleagues may need access to an artifact for many different reasons. They may want to read the source code to better understand your paper. They may want to rerun the experiments you present in your paper to ensure you got things right. They may want to build upon and extend your work. They may want to compare your results to their own. They may want to run different experiments on your code, or run the same experiments you ran but on different data sets. Thus, it is important to share all the code, external libraries, makefiles, instructions for proof assistants, scripts to run experiments, installation instructions—anything your colleagues may need to better understand or build upon your work.
Why should I share links to my artifacts on FindResearch.org? Isn't it enough that I have the code in my public GitHub repository?
Putting all your code on a public repository is certainly a good thing, but unless you are careful, it may not be enough to support repeatable work. Artifacts include not only source code, but also the experiments you ran for the paper, the exact versions of the external libraries your code relied on, etc. If you put all these in your repository, and add proper tags corresponding to every paper you publish, you may be OK! Even so, sharing the link to your GitHub repository on FindResearch.org will make it easier for your colleagues to find the exact version of your codebase that corresponds to a particular published work.
To facilitate repeatability, what, exactly, should I share in the link I publish on FindResearch.org?
To help your colleagues assess, repeat, and reproduce your published research, we recommend that you provide a link to a permanent package (zip file, virtual machine, container, etc.) comprising all the sources, makefiles, external libraries, data sets, and experimental setups that went into producing the results reported in the final published paper. Alternatively, you can provide a link to a "tagged" version of a public repository that corresponds to the work reported in your paper, as long as the repository contains all the code and data needed to repeat the experiments.
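If you build such a package yourself, a short script keeps the step repeatable. The sketch below is one possible approach, not a FindResearch.org requirement. It uses Python's standard shutil.make_archive to zip a directory and hashlib to compute a SHA-256 checksum. You could publish the checksum alongside the link so that colleagues can confirm they downloaded exactly the package you meant. The function name package_artifact and the directory layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def package_artifact(src_dir: str, archive_base: str) -> tuple[Path, str]:
    """Zip src_dir into archive_base + '.zip' and return (path, SHA-256 hex digest).

    Hypothetical helper: src_dir is assumed to hold all sources, makefiles,
    bundled libraries, data sets, and experiment scripts for one paper.
    """
    archive = Path(shutil.make_archive(archive_base, "zip", root_dir=src_dir))
    digest = hashlib.sha256()
    with archive.open("rb") as f:
        # Read in 1 MiB chunks so large data sets need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()
```

For example, package_artifact("artifact", "paper-artifact") would create paper-artifact.zip from a local artifact/ directory and return its checksum. A zip file is only one choice. A virtual machine image or container can capture more of the environment, at the cost of a much larger download.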

Regardless, keep in mind that code changes all the time—not only yours, but the external libraries your code depends on. To support repeatable work, it is essential that you make available all the code necessary to build your system. Concretely, instead of assuming that, years from now, a colleague will be able to apt-get a particular version of a particular library, it is much more helpful to include that library version with your packaged artifact.

Additionally, for any software you rely on but do not include, make sure you precisely document which version you used and where it can be found: "We used gcc version 4.2.1 and library abclib version 1.2.3 which can be downloaded from http://abclib.org." Computer scientists are notorious for overloading names, and there may be multiple distinct packages with the same name, so it is important to be precise. (For entertainment, see how many computing projects you can find that have been named "Occam".)
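Recording versions by hand is error-prone. For a Python-based project, a sketch like the following could help. It uses only the standard library's platform and importlib.metadata modules to list the interpreter version, the operating system, and the installed version of each named dependency. The result can be saved as a plain-text manifest that ships with the artifact. The function name, the package list, and the manifest format are assumptions for illustration. Projects in other languages would need their own equivalent, such as saving the output of gcc --version.

```python
import platform
from importlib import metadata


def version_manifest(packages: list[str]) -> str:
    """Return interpreter and OS info plus one 'name version' line per dependency."""
    lines = [
        f"python {platform.python_version()}",
        f"os {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag missing packages explicitly rather than silently skipping them.
            lines.append(f"{name} NOT INSTALLED")
    return "\n".join(lines) + "\n"


# Hypothetical dependency list; replace with the packages your code imports.
print(version_manifest(["numpy", "scipy"]))
```

A manifest like this records which versions you used, but it does not preserve them. Where licenses allow, bundling the libraries themselves, as recommended above, is still the more robust option.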
To help my colleagues build upon my work, what should I share in the link I publish on FindResearch.org?
In addition to a link to a package of the artifact versions that correspond to a particular published paper, you may also want to include a link to an evolving repository, such as GitHub. Such a link ensures that your colleagues have access to the latest developments, bug fixes, etc. that may have been made subsequent to the publication of your paper.
Can I update the information about my artifacts on FindResearch.org at a later time?
Yes! Be sure to hold on to the email (see sample) we sent you. You can revisit the link in this email as often as you want and update your information as it becomes available.
I'm not able to share all my code. Should I just say "code not available" on FindResearch.org?
It is important not to let the perfect become the enemy of the good. We have seen many situations where researchers had legitimate reasons not to share all their artifacts, including licensing issues, privacy issues, disk crashes, etc. (See Section 4.3 of our technical report for a complete list.) That, however, should not stop you from sharing what you do have. The ultimate goal is to help your colleagues assess and build upon your research, and anything you can provide towards that effort may be useful.