
API usage manual


EasyMinerCenter API - How to use it?

EasyMiner provides a comprehensive API covering almost all functionality implemented in the modules of the EasyMiner project.

Work done through the API is fully combinable with work done through the UI.

API Location

The API is available at the URL {APPLICATION_URL}/api, where {APPLICATION_URL} is the location of your EasyMinerCenter installation.

The API is documented using Swagger (see the documentation on our development server).

API KEY

An API KEY is required to use the API. This key is unique for each user. You can find it in the graphical UI:

  1. Go to the {APPLICATION_URL}
  2. Log in using your user account
  3. Click on the "user image" in the top right corner of the screen and use the link Show my profile
  • alternatively, you can go to the page {APPLICATION_URL}/em/user/details
  4. Copy your API KEY

The API key has to be included in all requests sent to the API endpoint. It can be attached as the GET parameter ?apiKey={YOUR_API_KEY} or sent in a header: Authorization: ApiKey {YOUR_API_KEY}
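
As a minimal sketch, both variants look like this in Python using the requests library (the /datasources endpoint is used here only as an example request):

import requests

API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'
API_KEY = ''  # your API key

# variant 1: API key attached as a GET parameter
r = requests.get(API_URL + '/datasources', params={'apiKey': API_KEY})

# variant 2: API key sent in the Authorization header
r = requests.get(API_URL + '/datasources', headers={'Authorization': 'ApiKey ' + API_KEY})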

Example usage of the API

The following sections present a sample use of the API in Python. The code examples use these libraries and constants:

import requests, json
import time

API_KEY = ''  # insert your API KEY here
API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'
CSV_FILE = ''  # path to the CSV file used in the upload example below

The included response data are based on the test file:

Upload data

EasyMiner supports data saved in CSV files. Small files can simply be sent in a POST request using this API. Large files can be uploaded using the UI or using the uploader in the EasyMiner-Data module.

Once the data file is uploaded, it can be reused for solving multiple data mining tasks.

# for upload, you have to send the CSV file and the params "type" (with the value "limited"), "separator" and "encoding"; optionally, you can also use the params "enclosure", "escape" or "nullValue"
headers = {"Accept": "application/json"}
files = {"file": open(CSV_FILE, 'rb')}
r = requests.post(API_URL + '/datasources?separator=%2C&encoding=utf8&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]

The response of this request contains details about the uploaded datasource. For further usage, it is necessary to parse at least the "id".

To work with an existing datasource, you can identify it by its id. The list of available datasources can be loaded using the request:

headers = {"Accept":"application/json"}
r = requests.get(API_URL+'/datasources?apiKey='+API_KEY, headers=headers)
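
A sketch of iterating over the returned list may look as follows; it assumes the response is a JSON array in which each item carries at least the "id" field (the "name" field used for printing is an assumption for illustration only):

datasources = r.json()
for datasource in datasources:
    # "id" is documented above; "name" is assumed here for illustration only
    print(str(datasource.get("id")) + ": " + str(datasource.get("name")))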

Create miner

"Miner" is an instance of EasyMiner data workspace, based on a selected datasource.

To create a miner, you should send a request like:

headers = {'Content-Type': 'application/json',"Accept":"application/json"}
json_data = json.dumps({"name": "TEST MINER","type": "cloud","datasourceId":datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]

Preprocess data (create attributes from data fields)

Currently, the cloud version of EasyMiner supports the preprocessing methods each value - one bin, nominal enumeration, interval enumeration, equidistant intervals, equifrequent intervals and equisized intervals. You have to send a preprocessing request for each required data field.

The following code reads the list of available data fields and sends the preprocessing requests for them.

headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_datafields_map = {}

for col in datasource_columns:
  column_name = col["name"]
  json_data = json.dumps({"miner": miner_id, "name": column_name, "columnName": column_name, "specialPreprocessing": "eachOne"})
  r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
  if r.status_code != 201:
    break  # an error occurred
  attributes_datafields_map[column_name] = r.json()['name']  # map of created attributes (based on the existing data fields)

Create (define) data mining task

For a new task definition, it is necessary to input three types of parameters:

  • attributes list for antecedent
  • attributes list for consequent
  • threshold values of requested interest measures

For an attribute in the antecedent or consequent, it is also possible to fix its value.

In the following example, we define a task pattern in the form

city('Prague') AND amount(*) -> rating(*)

with interest measures confidence and support.

json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {
                                "name": "CONF",
                                "value": 0.5
                            },
                            {
                                "name": "SUPP",
                                "value": 0.01
                            }
                        ],
                        "antecedent": [
                            {
                                "attribute": "city",
                                "fixedValue": "Prague"
                            },
                            {
                                "attribute": "amount"
                            }
                        ],
                        "consequent": [
                            {
                                "attribute": "rating"
                            }
                        ]
                        })

r = requests.post(API_URL+"/tasks/simple?apiKey="+API_KEY,headers=headers,data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = str(r.json()["id"])

If you want to solve a task with only one attribute in the consequent, it is possible to generate the task settings automatically. Try the script:

# uses the attributes_datafields_map generated in the previous code example

CONSEQUENT_ATTRIBUTE_NAME = ""  # set the name of the attribute used in the consequent

antecedent=[]
for attribute_name in attributes_datafields_map.values():
  if attribute_name!=CONSEQUENT_ATTRIBUTE_NAME:
     antecedent.append({"attribute":attribute_name})  

json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {
                                "name": "CONF",
                                "value": 0.5
                            },
                            {
                                "name": "SUPP",
                                "value": 0.01
                            }
                        ],
                        "antecedent": antecedent,
                        "consequent": [
                            {
                                "attribute": CONSEQUENT_ATTRIBUTE_NAME
                            }
                        ]
                        })

r = requests.post(API_URL+"/tasks/simple?apiKey="+API_KEY,headers=headers,data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = r.json()["id"]

Solve data mining task

To solve a defined task, you have to start it and then periodically check its state until it is solved (and the results are imported).

# start task
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break
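
The loop above polls indefinitely; in practice you may want to limit the waiting time. A minimal sketch of the same loop with a timeout (the endpoints and state values are taken from the example above, the 5-minute limit is an arbitrary choice):

# start task and poll its state with a simple timeout (sketch)
MAX_WAIT_SECONDS = 300  # arbitrary example limit
start_time = time.time()
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    if task_state in ("solved", "failed"):
        break
    if time.time() - start_time > MAX_WAIT_SECONDS:
        print("task did not finish within the time limit")
        break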

Read results of data mining task

PMML AssociationModel / GUHA PMML

EasyMiner supports export of data mining results in the form of GUHA PMML or in the form of the standard PMML AssociationModel.

# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text

# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text

A visualisation is also available as an export in HTML form:

r = requests.get(API_URL + '/tasks/' + task_id + '/html?apiKey=' + API_KEY)
html = r.text
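
If you want to keep the exports, you can write them to local files; a plain Python sketch (the file names are arbitrary):

# save the exported documents to local files (illustrative file names)
with open('task_' + task_id + '.pmml', 'w', encoding='utf8') as f:
    f.write(guha_pmml)
with open('task_' + task_id + '.html', 'w', encoding='utf8') as f:
    f.write(html)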

The transformations are available in the Git repository EasyMiner-XML; there are transformations for both of the supported PMML formats.

Task results in JSON

The found rules can also be read (and further processed) through the API. For each rule, a simple text form is available together with the values of the four-field (contingency) table.

You can try the request:

headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
rules = r.json()['rules']

for rule in rules:
  a = int(rule['a'])
  b = int(rule['b'])
  c = int(rule['c'])
  d = int(rule['d'])
  confidence = a / (a + b)
  support = a / (a + b + c + d)
  print(rule['text'] + ' | confidence=' + str(confidence) + ', support=' + str(support))

Example found association rule

How should the found rules be interpreted? We can explain it on a single rule from the example output (rule with ID 1668888):

amount(30000.0) & district(Uherske Hradiste) → rating(C)

  confidence = 0.889
  support =  0.012
  a=72, b=9, c=3555, d=2545 

Antecedent and consequent

Rule part    Content
antecedent   amount(30000.0) & district(Uherske Hradiste)
consequent   rating(C)

Antecedent and consequent are conjunctions of attributes with concrete values characterizing records from the analyzed dataset. The relation between antecedent and consequent is characterized using interest measures.

For classification purposes, the antecedent can be seen as the condition of the rule, and the consequent as the result of the classification.
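
As an illustration only (this helper is not part of the EasyMiner API), applying a single rule to a record could look like this: the record gets the consequent value when all antecedent items match.

# illustrative sketch only - not part of the EasyMiner API
def classify(record, antecedent, consequent):
    # antecedent and consequent are dicts {attribute: value}, record is a dict of attribute values
    for attribute, value in antecedent.items():
        if record.get(attribute) != value:
            return None  # the rule does not apply to this record
    return consequent

# example based on the rule above
rule_antecedent = {"amount": "30000.0", "district": "Uherske Hradiste"}
rule_consequent = {"rating": "C"}
record = {"amount": "30000.0", "district": "Uherske Hradiste"}
print(classify(record, rule_antecedent, rule_consequent))  # {'rating': 'C'}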

Interest measures

Interest measures are calculated from the four-field (contingency) table:

                 consequent   not consequent
antecedent       a            b
not antecedent   c            d

With the concrete values of the example rule, the table is:

                 consequent   not consequent
antecedent       72           9
not antecedent   3555         2545

The usable interest measures are defined by the following formulas:

Interest measure   Formula
confidence         a / (a+b)
support            a / (a+b+c+d)
lift               a * (a+b+c+d) / ((a+b)*(a+c))
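
For the four-field table of the example rule above (a=72, b=9, c=3555, d=2545), the formulas can be evaluated directly:

# interest measures for the example four-field table
a, b, c, d = 72, 9, 3555, 2545
confidence = a / (a + b)                          # ~0.889
support = a / (a + b + c + d)                     # ~0.012
lift = a * (a + b + c + d) / ((a + b) * (a + c))  # ~1.51 (confidence divided by the overall share of the consequent)
print(confidence, support, lift)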

Interpretation of the rule

The dataset used for data mining contains records relating to the rating of loans. The translation into natural language is:

When the given person lives in the district "Uherske Hradiste" and the loan amount is "30000", then the loan rating is "C".

This rule is valid with:

  • confidence = 0.889 - i.e. when the person lives in the district "Uherske Hradiste" and the amount is "30000", the rating is "C" with a probability of 88.9 %
  • support = 0.012 - i.e. 1.2 % of the records in the analyzed dataset concern persons with the district "Uherske Hradiste" and the amount "30000"

Complex demo code

The following Python code represents a complete, end-to-end example of usage of the EasyMiner API.

For this demo, we use a preprocessed version of the dataset ESIF Finance details from the project OpenBudgets.eu. The original version of this dataset is available there.

import requests, json
import urllib
import time

API_KEY = ''  # you have to input your API KEY
API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'  # URL of the requested EasyMiner API endpoint

ANTECEDENT_COLUMNS = []  # list of data fields (columns) from the input CSV; if this list is empty, all data fields not included in the consequent will be added to the antecedent
CONSEQUENT_COLUMNS = ["Technical_Assistance_5"]  # data fields for the consequent
MIN_CONFIDENCE = 0.7  # requested minimal value of confidence
MIN_SUPPORT = 0.1  # requested minimal value of support

CSV_FILE = "esif.csv"  # path to the CSV file
CSV_SEPARATOR = ";"
CSV_ENCODING = "utf8"

# upload data set - create datasource
headers = {"Accept": "application/json"}
files = {("file", open(CSV_FILE, 'rb'))}
r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(
    CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]

# create miner
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]

# preprocess data fields to attributes
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_columns_map = {}
for col in datasource_columns:
    column = col["name"]
    json_data = json.dumps(
        {"miner": miner_id, "name": column, "columnName": column, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_columns_map[column] = r.json()['name']  # map of created attributes (based on the existing data fields)

# define data mining task
antecedent = []
consequent = []

# prepare antecedent pattern
if len(ANTECEDENT_COLUMNS):
    # add to antecedent only fields defined in the constant
    for column in ANTECEDENT_COLUMNS:
        antecedent.append({"attribute": attributes_columns_map[column]})
else:
    # add to antecedent all fields not used in consequent
    for (column, attribute_name) in attributes_columns_map.items():
        if not(column in CONSEQUENT_COLUMNS):
            antecedent.append({"attribute": attribute_name})

# prepare consequent pattern
for column in CONSEQUENT_COLUMNS:
    consequent.append({"attribute": attributes_columns_map[column]})

json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {
                                "name": "CONF",
                                "value": MIN_CONFIDENCE
                            },
                            {
                                "name": "SUPP",
                                "value": MIN_SUPPORT
                            }
                        ],
                        "antecedent": antecedent,
                        "consequent": consequent
                        })
# define new data mining task
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = str(r.json()["id"])

# start task
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break

# export rules in JSON format
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
task_rules = r.json()

# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text

# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text
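
As a final step of the demo, you may want to report the number of found rules and persist the exports; a small sketch (the "rules" key is the same as in the JSON export described above, the file names are arbitrary):

# print the number of found rules and save the exports to local files (sketch)
print("found rules: " + str(len(task_rules["rules"])))
with open("rules_" + task_id + ".json", "w", encoding="utf8") as f:
    json.dump(task_rules, f)
with open("task_" + task_id + ".pmml", "w", encoding="utf8") as f:
    f.write(guha_pmml)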