API usage manual
EasyMiner provides a comprehensive API covering almost all functionality implemented in the modules of the EasyMiner project.
Work via the API is fully combinable with work via the UI.
The API is available at the URL {APPLICATION_URL}/api, where {APPLICATION_URL} is the location of EasyMinerCenter.
The API is documented using Swagger (see the documentation on our development server).
To use the API, an API key is required. This key is unique for each user. You can find it in the graphical UI:
- Go to the {APPLICATION_URL}
- Log in using your user account
- Click on the "user image" in the top right corner of the screen and use the link Show my profile
- Alternatively, you can go directly to {APPLICATION_URL}/em/user/details
- Copy your API KEY
The API key has to be included in every request sent to the API endpoint. It can be attached as the GET parameter ?apiKey={YOUR_API_KEY}
or sent in a header: Authorization: ApiKey {YOUR_API_KEY}
The following sections present a sample use of the API in Python. The code examples use the following libraries and constants:
import requests, json
import time

API_KEY = ''  # insert your API key here
API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'  # URL of the EasyMiner API endpoint
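All following examples attach the key as the ?apiKey= GET parameter. With the constants above, the header-based variant of an authenticated request (here listing the datasources described below) would look like this:

headers = {"Accept": "application/json", "Authorization": "ApiKey " + API_KEY}  # header-based authentication
r = requests.get(API_URL + '/datasources', headers=headers)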
The included response data are based on the test file.
EasyMiner supports data saved in a CSV file. Small files can simply be sent in the POST request using this API. Big files can be uploaded using the UI or using the uploader in the EasyMiner-Data module.
Once the data file is uploaded, it can be used for solving several data mining tasks.
# for upload, you have to send the CSV file and the params "type" (with the value "limited"), "separator" and "encoding"; optionally, you can also use the params "enclosure", "escape" or "nullValue"
headers = {"Accept": "application/json"}
files = {"file": open(CSV_FILE, 'rb')}  # CSV_FILE is the path to the uploaded CSV file
r = requests.post(API_URL + '/datasources?separator=%2C&encoding=utf8&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]
The response of this request contains details about the uploaded datasource. For further use, it is necessary to parse at least the "id".
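It is also a good idea to check the HTTP status code before working with the parsed id. A minimal sketch, assuming any 2xx response means the upload succeeded:

# minimal error handling sketch; requests marks any 4xx/5xx response as not ok
if not r.ok:
    raise RuntimeError('datasource upload failed: ' + r.text)
print('created datasource with id ' + str(datasource_id))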
To work with an existing datasource, you can identify it using its id. The list of available datasources can be loaded using the request:
headers = {"Accept":"application/json"}
r = requests.get(API_URL+'/datasources?apiKey='+API_KEY, headers=headers)
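Each item of the returned list should describe one datasource. The sketch below assumes the endpoint returns a plain JSON list and that each item carries at least "id" and "name" keys; check the Swagger documentation for the exact response structure.

# print the available datasources; the "id" and "name" keys are an assumption, see the Swagger docs
for datasource in r.json():
    print(str(datasource.get("id")) + ': ' + str(datasource.get("name")))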
"Miner" is an instance of EasyMiner data workspace, based on a selected datasource.
To create a miner, send a request like:
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]
Currently, the cloud version of EasyMiner supports the preprocessing methods each value - one bin, nominal enumeration, interval enumeration, equidistant intervals, equifrequent intervals and equisized intervals. You have to send a preprocessing request for each required data field.
The following code reads the list of available data fields and sends a preprocessing request for each of them.
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_datafields_map = {}
for col in datasource_columns:
    column_name = col["name"]
    json_data = json.dumps({"miner": miner_id, "name": column_name, "columnName": column_name, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_datafields_map[column_name] = r.json()['name']  # map of created attributes (based on the existing data fields)
To define a new task, it is necessary to provide three types of parameters:
- the list of attributes for the antecedent
- the list of attributes for the consequent
- the threshold values of the requested interest measures
For each attribute, it is also possible to fix its value.
In the following example, we define a task pattern in the form
city('Prague') AND amount(*) -> rating(*)
with the interest measures confidence and support.
json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {"name": "CONF", "value": 0.5},
                            {"name": "SUPP", "value": 0.01}
                        ],
                        "antecedent": [
                            {"attribute": "city", "fixedValue": "Prague"},
                            {"attribute": "amount"}
                        ],
                        "consequent": [
                            {"attribute": "rating"}
                        ]})
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code: " + str(r.status_code))
task_id = str(r.json()["id"])
If you want to solve a task with only one attribute in the consequent, it is possible to generate the task settings automatically. Try the script:
# uses the attributes_datafields_map generated in the previous code example
CONSEQUENT_ATTRIBUTE_NAME = ""  # set the name of the attribute used in the consequent
antecedent = []
for attribute_name in attributes_datafields_map.values():
    if attribute_name != CONSEQUENT_ATTRIBUTE_NAME:
        antecedent.append({"attribute": attribute_name})
json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {"name": "CONF", "value": 0.5},
                            {"name": "SUPP", "value": 0.01}
                        ],
                        "antecedent": antecedent,
                        "consequent": [
                            {"attribute": CONSEQUENT_ATTRIBUTE_NAME}
                        ]})
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code: " + str(r.status_code))
task_id = str(r.json()["id"])  # convert to string for use in the following URLs
To solve a defined task, you have to start it and then repeatedly check its state until it is solved (and the results are imported):
# start task
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state: " + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task " + task_id + " failed")
        break
EasyMiner supports export of data mining results in the form of GUHA PMML or in the form of a standard PMML AssociationModel.
# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text
# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text
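Both exports are plain XML strings, so they can be simply saved to files for later processing (the file names below are only illustrative):

# save the exported models to files for further processing
with open('task_' + task_id + '_association_model.pmml', 'w', encoding='utf-8') as f:
    f.write(pmml)
with open('task_' + task_id + '_guha.pmml', 'w', encoding='utf-8') as f:
    f.write(guha_pmml)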
A visualisation is also available - an export in HTML form:
r = requests.get(API_URL + '/tasks/' + task_id + '/html?apiKey=' + API_KEY)
html = r.text
The transformations are available in the Git repository EasyMiner-XML. There are transformations for both of the supported PMML formats.
The found rules can also be read (and collected, etc.) using the API. For each rule, a simple text form is available together with the values from the four-field table.
You can try the request:
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
rules = r.json()['rules']
for rule in rules:
    # values of the four-field table
    a = int(rule['a'])
    b = int(rule['b'])
    c = int(rule['c'])
    d = int(rule['d'])
    confidence = a / (a + b)
    support = a / (a + b + c + d)
    print(rule['text'] + ' | confidence=' + str(confidence) + ', support=' + str(support))
How should the found rules be interpreted? We can explain it on a single rule from the example output (the rule with ID 1668888):
amount(30000.0) & district(Uherske Hradiste) → rating(C)
confidence = 0.889
support = 0.012
a=72, b=9, c=3555, d=2545
Rule part | Content |
---|---|
antecedent | amount(30000.0) & district(Uherske Hradiste) |
consequent | rating(C) |
The antecedent and consequent are conjunctions of attributes with concrete values characterizing records from the analyzed dataset. The relation between the antecedent and the consequent is characterized using interest measures.
For classification purposes, the antecedent can be understood as the condition of the rule, and the consequent as the result of the classification.
Interest measures are calculated from the four-field table:
 | consequent | not consequent |
---|---|---|
antecedent | a | b |
not antecedent | c | d |
With the concrete values of the example rule, the table is:
 | consequent | not consequent |
---|---|---|
antecedent | 72 | 9 |
not antecedent | 3555 | 2545 |
The usable interest measures are defined by the following formulas:
Interest measure | Formula |
---|---|
confidence | a / (a+b) |
support | a / (a+b+c+d) |
lift | a * (a+b+c+d) / ((a+b)*(a+c)) |
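Applying these formulas to the concrete four-field table above reproduces the values reported for the example rule (lift is not listed in the rule output above, so the figure below is simply the result of the formula):

# interest measures computed from the concrete four-field table of the example rule
a, b, c, d = 72, 9, 3555, 2545
n = a + b + c + d
confidence = a / (a + b)            # 72 / 81 = 0.889
support = a / n                     # 72 / 6181 = 0.012
lift = a * n / ((a + b) * (a + c))  # approximately 1.51
print(confidence, support, lift)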
The dataset used for data mining contains records relating to the rating of loans. The translation of the rule into natural language is:
When the given person lives in the district "Uherske Hradiste" and the amount is "30000", then the loan rating is "C".
This rule is valid with
- confidence = 0.889 - when the person lives in the district "Uherske Hradiste" and the amount is "30000", the rating is "C" with a probability of 88.9%
- support = 0.012 - 1.2% of the records in the analyzed dataset are about persons with the district "Uherske Hradiste" and the amount "30000"
The following Python code represents a complete example of usage of the EasyMiner API.
For this demo, we use a preprocessed version of the dataset ESIF Finance details from the project OpenBudgets.eu. The original version of this dataset is available there.
import requests, json
import urllib.parse
import time

API_KEY = ''  # you have to input your API key here
API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'  # URL of the requested EasyMiner API endpoint
ANTECEDENT_COLUMNS = []  # list of data fields (columns) from the input CSV for the antecedent - if this list is empty, all data fields not included in the consequent will be added to the antecedent
CONSEQUENT_COLUMNS = ["Technical_Assistance_5"]  # data fields for the consequent
MIN_CONFIDENCE = 0.7  # requested minimal value of confidence
MIN_SUPPORT = 0.1  # requested minimal value of support
CSV_FILE = "esif.csv"  # path to the CSV file
CSV_SEPARATOR = ";"
CSV_ENCODING = "utf8"
# upload data set - create datasource
headers = {"Accept": "application/json"}
files = {"file": open(CSV_FILE, 'rb')}
r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files=files, headers=headers)
datasource_id = r.json()["id"]
# create miner
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
json_data = json.dumps({"name": "TEST MINER", "type": "cloud", "datasourceId": datasource_id})
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
miner_id = r.json()["id"]
# preprocess data fields to attributes
headers = {'Content-Type': 'application/json', "Accept": "application/json"}
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers=headers)
datasource_columns = r.json()['column']
attributes_columns_map = {}
for col in datasource_columns:
    column = col["name"]
    json_data = json.dumps(
        {"miner": miner_id, "name": column, "columnName": column, "specialPreprocessing": "eachOne"})
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
    if r.status_code != 201:
        break  # an error occurred
    attributes_columns_map[column] = r.json()['name']  # map of created attributes (based on the existing data fields)
# define data mining task
antecedent = []
consequent = []
# prepare antecedent pattern
if len(ANTECEDENT_COLUMNS):
    # add to the antecedent only the fields defined in the constant
    for column in ANTECEDENT_COLUMNS:
        antecedent.append({"attribute": attributes_columns_map[column]})
else:
    # add to the antecedent all fields not used in the consequent
    for (column, attribute_name) in attributes_columns_map.items():
        if not (column in CONSEQUENT_COLUMNS):
            antecedent.append({"attribute": attribute_name})
# prepare consequent pattern
for column in CONSEQUENT_COLUMNS:
    consequent.append({"attribute": attributes_columns_map[column]})
json_data = json.dumps({"miner": miner_id,
                        "name": "Test task",
                        "limitHits": 1000,
                        "IMs": [
                            {"name": "CONF", "value": MIN_CONFIDENCE},
                            {"name": "SUPP", "value": MIN_SUPPORT}
                        ],
                        "antecedent": antecedent,
                        "consequent": consequent})
# define new data mining task
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers=headers, data=json_data.encode())
print("create task response code:" + str(r.status_code))
task_id = str(r.json()["id"])
# start task
r = requests.get(API_URL + "/tasks/" + task_id + "/start?apiKey=" + API_KEY, headers=headers)
while True:
    time.sleep(1)
    # check state
    r = requests.get(API_URL + "/tasks/" + task_id + "/state?apiKey=" + API_KEY, headers=headers)
    task_state = r.json()["state"]
    print("task_state: " + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task " + task_id + " failed")
        break
# export rules in JSON format
headers = {"Accept": "application/json"}
r = requests.get(API_URL + '/tasks/' + task_id + '/rules?apiKey=' + API_KEY, headers=headers)
task_rules = r.json()
# export of standardized PMML AssociationModel
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=associationmodel&apiKey=' + API_KEY)
pmml = r.text
# export of GUHA PMML
r = requests.get(API_URL + '/tasks/' + task_id + '/pmml?model=guha&apiKey=' + API_KEY)
guha_pmml = r.text