Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed loading data issues & Access restriction #4

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 26 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,38 @@
# join-order-benchmark
# Join-Order-Benchmark

This package contains the Join Order Benchmark (JOB) queries from:
"How Good Are Query Optimizers, Really?"
"[How Good Are Query Optimizers, Really?](http://www.vldb.org/pvldb/vol9/p204-leis.pdf)"
by Viktor Leis, Andrey Gubichev, Atans Mirchev, Peter Boncz, Alfons Kemper, Thomas Neumann
PVLDB Volume 9, No. 3, 2015
http://www.vldb.org/pvldb/vol9/p204-leis.pdf

the csv_files/schematext.sql and queries/*.sql is modified to MySQL syntax.

IMDB Data Set
The `csv_files/schematext.sql` and `queries/*.sql` are modified to MySQL syntax.

The CSV files used in the paper, which are from May 2013, can be found at http://homepages.cwi.nl/~boncz/job/imdb.tgz

### load
## Quick Start

use `load_data.sh` to load data into mysql or TiDB.
1. Get the `imdb` dataset:
```bash
cd csv_files/
wget https://event.cwi.nl/da/job/imdb.tgz
tar -xvzf imdb.tgz
```
2. Create the database `imdbload`:
```bash
csv_files/schema-tidb.sql
```
3. Run the script:
```
./load_data.sh
```
If you encounter any access restrictions, please modify the configuration as follows:
```
mysql --local-infile=1 -h 127.0.0.1 -P 4000 -u root -D imdbload < $sql_file
```

### order problem
## Order Problem

`queries/17b.sql` and `queries/8d.sql` have order problem, because we use diffrent order rule from MySQL, so it is not a real bug.
Please note that `queries/17b.sql` and `queries/8d.sql` may exhibit order issues due to the use of different order rules from MySQL. This is not a real bug.

### use analyze table to make TiDB faster
exec `analyze_table.sql`
## Improving TiDB Performance using Analyze Table
Please execute analyze_table.sql to optimize the TiDB performance.
170 changes: 170 additions & 0 deletions csv_files/schema-tidb.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
CREATE TABLE aka_name (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
person_id INT NOT NULL,
name VARCHAR(255),
imdb_index VARCHAR(3),
name_pcode_cf VARCHAR(11),
name_pcode_nf VARCHAR(11),
surname_pcode VARCHAR(11),
md5sum VARCHAR(65)
);

CREATE TABLE aka_title (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
title VARCHAR(255),
imdb_index VARCHAR(4),
kind_id INT NOT NULL,
production_year INT,
phonetic_code VARCHAR(5),
episode_of_id INT,
season_nr INT,
episode_nr INT,
note VARCHAR(72),
md5sum VARCHAR(32)
);

CREATE TABLE cast_info (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
person_id INT NOT NULL,
movie_id INT NOT NULL,
person_role_id INT,
note VARCHAR(255),
nr_order INT,
role_id INT NOT NULL
);

CREATE TABLE char_name (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
imdb_index VARCHAR(2),
imdb_id INT,
name_pcode_nf VARCHAR(5),
surname_pcode VARCHAR(5),
md5sum VARCHAR(32)
);

CREATE TABLE comp_cast_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
kind VARCHAR(32) NOT NULL
);

CREATE TABLE company_name (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
country_code VARCHAR(6),
imdb_id INT,
name_pcode_nf VARCHAR(5),
name_pcode_sf VARCHAR(5),
md5sum VARCHAR(32)
);

CREATE TABLE company_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
kind VARCHAR(32)
);

CREATE TABLE complete_cast (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT,
subject_id INT NOT NULL,
status_id INT NOT NULL
);

CREATE TABLE info_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
info VARCHAR(32) NOT NULL
);

CREATE TABLE keyword (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
keyword VARCHAR(255) NOT NULL,
phonetic_code VARCHAR(5)
);

CREATE TABLE kind_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
kind VARCHAR(15)
);

CREATE TABLE link_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
link VARCHAR(32) NOT NULL
);

CREATE TABLE movie_companies (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
company_id INT NOT NULL,
company_type_id INT NOT NULL,
note VARCHAR(255)
);

CREATE TABLE movie_info_idx (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
info_type_id INT NOT NULL,
info VARCHAR(255) NOT NULL,
note VARCHAR(1)
);

CREATE TABLE movie_keyword (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
keyword_id INT NOT NULL
);

CREATE TABLE movie_link (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
linked_movie_id INT NOT NULL,
link_type_id INT NOT NULL
);

CREATE TABLE name (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
imdb_index VARCHAR(9),
imdb_id INT,
gender VARCHAR(1),
name_pcode_cf VARCHAR(5),
name_pcode_nf VARCHAR(5),
surname_pcode VARCHAR(5),
md5sum VARCHAR(32)
);

CREATE TABLE role_type (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
role VARCHAR(32) NOT NULL
);

CREATE TABLE title (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255) NOT NULL,
imdb_index VARCHAR(5),
kind_id INT NOT NULL,
production_year INT,
imdb_id INT,
phonetic_code VARCHAR(5),
episode_of_id INT,
season_nr INT,
episode_nr INT,
series_years VARCHAR(49),
md5sum VARCHAR(32)
);

CREATE TABLE movie_info (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
movie_id INT NOT NULL,
info_type_id INT NOT NULL,
info VARCHAR(255) NOT NULL,
note VARCHAR(255)
);

CREATE TABLE person_info (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
person_id INT NOT NULL,
info_type_id INT NOT NULL,
info VARCHAR(255) NOT NULL,
note VARCHAR(255)
);
4 changes: 2 additions & 2 deletions load_data.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,11 @@ load_data() {
bname=${csv_file%.*}
sql_file="$bname.sql"
table=${bname#$PREFIX}
sql="LOAD DATA LOCAL INFILE '$csv_file' INTO TABLE $table FIELDS TERMINATED BY ',';"
sql="LOAD DATA LOCAL INFILE '$csv_file' INTO TABLE $table FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"';"
echo $sql > $sql_file
echo $sql_file
{
mysql -h 127.0.0.1 -P 4000 -u root -D imdbload < $sql_file
mysql --local-infile=1 -h 127.0.0.1 -P 4000 -u root -D imdbload < $sql_file
echo >&1000
}&
done
Expand Down
4 changes: 2 additions & 2 deletions load_data_mysql.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,11 @@ load_data() {
bname=${csv_file%.*}
sql_file="$bname.sql"
table=${bname#$PREFIX}
sql="LOAD DATA LOCAL INFILE '$csv_file' INTO TABLE $table FIELDS TERMINATED BY ',';"
sql="LOAD DATA LOCAL INFILE '$csv_file' INTO TABLE $table FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"';"
echo $sql > $sql_file
echo $sql_file
{
mysql -h 127.0.0.1 -u root -p123456 -D imdbload < $sql_file
mysql --local-infile=1 -h 127.0.0.1 -u root -p123456 -D imdbload < $sql_file
echo >&1000
}&
done
Expand Down