RunBugRun

[!WARNING]
RunBugRun is currently being revised. The split will change and should not be relied on.

What is RunBugRun

RunBugRun is an APR dataset of ~450'000 executable buggy/fixed pairs of short programs taken from IBM Project CodeNet written in 8 languages (C++, C, Python, Java, Ruby, JavaScript, Go, PHP).

It can be used to evaluate APR tools, that is, tools that automatically find and repair bugs in source code.

RunBugRun comes with tests, bug labels and infrastructure to execute programs. To ensure safe execution, it uses Bubblewrap as a sandbox.
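To give a rough idea of what sandboxing with Bubblewrap can look like, the sketch below builds a `bwrap` command line for running an untrusted program. The helper name `bwrap_command`, the chosen flags and the `/srv/run` directory are assumptions made for illustration; this is not the invocation rbugr uses internally.

```python
import shlex


def bwrap_command(program_argv, workdir):
    """Build a bwrap argv that runs program_argv in a restricted sandbox.

    Hypothetical helper for illustration only; rbugr's own Bubblewrap
    invocation may use different options.
    """
    return [
        "bwrap",
        "--ro-bind", "/", "/",        # host filesystem visible, but read-only
        "--dev", "/dev",              # minimal /dev
        "--proc", "/proc",            # fresh /proc
        "--tmpfs", "/tmp",            # private, writable /tmp
        "--bind", workdir, workdir,   # only the working directory is writable
        "--chdir", workdir,           # start the program inside workdir
        "--unshare-all",              # new namespaces, including network
        "--die-with-parent",          # stop the sandbox if the parent exits
        *program_argv,
    ]


# Hypothetical working directory and program, used only to show the result.
cmd = bwrap_command(["python3", "solution.py"], "/srv/run")
print(shlex.join(cmd))
```

The function only assembles the argument list, so it can be inspected or logged before being passed to something like `subprocess.run`.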

RunBugRun has pre-defined training, validation and test sets. APR tools can use the training set as they please. For evaluation, they are given a test set of buggy programs that do not pass all tests. A tool's performance is measured as the percentage of these programs that the tool can fix in such a way that they pass all tests.
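The metric can be illustrated with a small sketch. It assumes that a tool may propose several fix candidates per bug and that a bug counts as fixed if at least one candidate passes every test; that counting rule and the `results` structure are assumptions for this example, not a description of rbugr's implementation.

```python
def fix_rate(results):
    """Return the percentage of bugs with at least one fully passing candidate.

    `results` maps a bug id to a list of candidates; each candidate is a
    list of booleans, one per test (True = test passed). This layout is
    hypothetical and chosen only for the example.
    """
    if not results:
        return 0.0
    fixed = sum(
        1
        for candidates in results.values()
        # A candidate with no test results is never counted as a fix.
        if any(tests and all(tests) for tests in candidates)
    )
    return 100.0 * fixed / len(results)


example = {
    1: [[True, True, True]],           # fixed by its only candidate
    2: [[True, False], [True, True]],  # fixed by the second candidate
    3: [[False, True]],                # one test still fails
    4: [],                             # no candidates produced
}
print(f"{fix_rate(example):.1f}%")  # prints "50.0%"
```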

Installation

Data

RunBugRun's data can be downloaded in the form of gzipped JSONL files from here or downloaded directly with the rbugr utility.
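If you download the files manually, gzipped JSONL can be read with the Python standard library alone. The sketch below is a generic reader; the file name and the record fields in the demo are placeholders, and the actual RunBugRun files may contain different fields.

```python
import gzip
import json
import os
import tempfile


def read_jsonl_gz(path):
    """Yield one decoded JSON object per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo on a throwaway file with made-up records (not real RunBugRun data).
rows = [{"id": 1, "language": "ruby"}, {"id": 2, "language": "go"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    loaded = list(read_jsonl_gz(path))

print(loaded)
```

Because the reader is a generator, large files can be processed record by record without loading everything into memory.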

rbugr

Currently, only Ubuntu 22.04 is supported. For other distributions, please open an issue. The rbugr utility is written in Ruby, so you'll need a recent version of Ruby (3.1) on your system (installed, e.g., through rbenv). In addition to the Ruby that runs the utility, you'll need a Ruby to run Ruby submission programs. For this, version 3.0, the version packaged by Ubuntu, is sufficient.

Prerequisites

Use the following to install the compilers/interpreters needed to run submission programs:

$ apt-get install php-cli nodejs gcc g++ default-jdk ruby python3 golang-go bubblewrap

Installation

To install the utility itself, run:

$ git clone https://github.com/giganticode/run_bug_run.git
$ cd run_bug_run
$ gem install bundler
$ bundle install

Usage

Download data

The rbugr helper utility can be used to manage dataset versions, obtain information on bugs, run bugs or evaluate the entire test set. To download the RunBugRun data at a particular version use:

$ bundle exec rbugr download 0.0.1

Sanity Check

It is advisable to sanity-check your setup by evaluating the fixed program versions:

$ bundle exec rbugr eval --fixed --output-filename=sanity_check.json.gz

Showing Bug Information

To show information on a particular bug, use:

$ bundle exec rbugr bugs show BUG_ID

For instance:

$ bundle exec rbugr bugs show 4229

will give:

{
  "id": 4229,
  "language": "ruby",
  "problem_id": "p00000",
  "change_count": 1,
  "labels": [
    "call.function.change",
    "io.output.change"
  ]
}

Printing a Diff

$ bundle exec rbugr bugs diff 42290

 #include <bits/stdc++.h>
 
 using namespace std;
 
 #define int long long
 #define N 100005
 
 int n, m, a, b, c, cnt = 0, from[N], to[N], f, t, s = 1, e = 1;
 int ans = 0;
 
 signed main() {
   ios_base::sync_with_stdio(0);
   cin >> n >> m;
   cin >> t;
   for (int i = 1; i < m; ++i) {
     f = t;
     cin >> t;
     from[i] = min(f, t);
     to[i] = max(f, t);
   }
-  sort(from + 1, from + m - 1);
-  sort(to + 1, to + m - 1);
+  sort(from + 1, from + m);
+  sort(to + 1, to + m);
   for (int i = 1; i < n; ++i) {
     cin >> a >> b >> c;
     while (i == from[s]) {
       cnt++;
       s++;
     }
     while (i == to[e]) {
       cnt--;
       e++;
     }
     ans += min(a * cnt, c + b * cnt);
   }
+
   cout << ans << "\n";
 }

Evaluation

To evaluate your tool's output, use the following:

$ bundle exec rbugr eval PATH_TO_OUTPUT --output-filename=PATH_TO_EVAL_FILE

where PATH_TO_OUTPUT should point to your tool's output file. This file should be a JSONL file in the following format:

{"id": BUG_ID, "preds": [FIX_CANDIDATE_CODE1, FIX_CANDIDATE_CODE2, ...]}
...
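A file in this format can be produced with a few lines of Python. The sketch below writes one JSON object per line with the `id` and `preds` keys described above; the helper name `write_predictions`, the bug IDs and the candidate strings are made up for illustration.

```python
import json
import os
import tempfile


def write_predictions(path, predictions):
    """Write {bug_id: [candidate_code, ...]} as JSONL, one bug per line."""
    with open(path, "w", encoding="utf-8") as f:
        for bug_id, candidates in predictions.items():
            record = {"id": bug_id, "preds": list(candidates)}
            f.write(json.dumps(record) + "\n")


# Made-up bug IDs and fix candidates, used only to show the file layout.
predictions = {
    101: ["puts gets.to_i * 2\n"],
    102: ["print(int(input()) + 1)\n", "print(int(input()) - 1)\n"],
}
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "output.jsonl")
    write_predictions(out_path, predictions)
    with open(out_path, encoding="utf-8") as f:
        written = [json.loads(line) for line in f]

print(written[0])
```

Using `json.dumps` for each record takes care of escaping newlines and quotes inside the candidate source code, so every record stays on a single line.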

Analysis of Evaluation

Once evaluated, you can use rbugr analyze to calculate various evaluation metrics. For instance:

$ bundle exec rbugr analyze PATH_TO_EVAL_FILE

You can use --by-language to get a per-language breakdown of performance.
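Conceptually, a per-language breakdown groups bugs by language and computes the fix rate within each group. The sketch below shows that grouping on made-up `(language, fixed)` pairs; it is not rbugr's implementation and does not read the evaluation file format.

```python
from collections import defaultdict


def fix_rate_by_language(records):
    """Return {language: percentage of fixed bugs} for (language, fixed) pairs."""
    totals = defaultdict(int)
    fixed = defaultdict(int)
    for language, is_fixed in records:
        totals[language] += 1
        fixed[language] += int(is_fixed)
    return {lang: 100.0 * fixed[lang] / totals[lang] for lang in totals}


# Made-up outcomes: two Ruby bugs (one fixed) and one Go bug (fixed).
records = [("ruby", True), ("ruby", False), ("go", True)]
breakdown = fix_rate_by_language(records)
print(breakdown)  # prints {'ruby': 50.0, 'go': 100.0}
```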