Hewlett Packard has developed a set of JSON-based REST APIs that give developers “Big Data”-style processing capabilities, letting them extract information embedded in unstructured text and images that was previously inaccessible. The platform is called IDOL OnDemand, and the APIs are published at https://www.idolondemand.com/developer/apis
In this post, I will use Node.js to call three HP IDOL OnDemand APIs for demonstration purposes: OCR Document, Find Similar, and Analyze Sentiment.
These APIs are all REST-based and require authorization through an API key. We start by setting the request headers:
// Set the request headers
var headers = {
    'User-Agent': 'Super Agent/0.0.1',
    'Content-Type': 'application/x-www-form-urlencoded'
};
Next, we construct the request data. Every request requires the apiKey parameter, which you get by signing up on the IDOL OnDemand developer page (https://www.idolondemand.com/developer/apis). The OCR Document API call only needs the 'url' and 'apiKey' parameters (see https://www.idolondemand.com/developer/apis/ocrdocument/#request), so the full request URL is:
https://api.idolondemand.com/1/api/sync/ocrdocument/v1?url={url_value}&apikey={apikey_value}
Note that the value of the url parameter must be URL-encoded, which can be done with the encodeURIComponent method in JavaScript.
/**
 * Get the OCR document options.
 * @param url the image url to extract the text.
 */
var get_ocr_options = function (url) {
    return {
        host: 'api.idolondemand.com',
        port: 443,
        path: '/1/api/sync/ocrdocument/v1?url=' + encodeURIComponent(url).replace(/%20/g, '+') + '&apikey=' + apikey,
        headers: headers
    };
};
Now we send the request to the server and handle the response data in the callback function. The response is in JSON format (its structure is described at https://www.idolondemand.com/developer/apis/ocrdocument/#response). Once the response has ended, we use the built-in JSON object to parse the accumulated data and read the 'text_block' field, which holds the extracted text:
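// https is Node's built-in HTTPS module
var https = require('https');
var str = ''; // accumulates the response body chunk by chunk
var req = https.get(get_ocr_options(image_url), function (response) {
    response.on('data', function (chunk) {
        str += chunk;
    });
    response.on('end', function () {
        var json = JSON.parse(str);
        // print the text of the first extracted block
        console.log(json.text_block[0].text);
    });
});
// https.get ends the request automatically, so no req.end() call is needed;
// report network errors instead of leaving the 'error' event unhandled
req.on('error', function (err) {
    console.error(err);
});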
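The handler above reads only the first text block and assumes the body is valid JSON. A slightly more defensive sketch follows. Here extract_ocr_text is a hypothetical helper, not part of the original code, and the response shape is assumed from the text_block usage above.

```javascript
// Hypothetical helper: joins the text of every block in an OCR Document response.
// Assumes the body has the shape { "text_block": [ { "text": "..." }, ... ] }.
var extract_ocr_text = function (body) {
    var json;
    try {
        json = JSON.parse(body);
    } catch (e) {
        return null; // not valid JSON (for example, a truncated response)
    }
    if (!json || !Array.isArray(json.text_block)) {
        return null; // unexpected shape, such as an error payload
    }
    // Collect the text of each block, one block per line
    return json.text_block.map(function (block) {
        return block.text;
    }).join('\n');
};
```

Inside the 'end' handler, calling extract_ocr_text(str) and checking for null would replace the direct JSON.parse call.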
Calling the Find Similar API is similar to calling the OCR Document API; the only differences are the API path and the request parameters. According to https://www.idolondemand.com/developer/apis/findsimilar/#request, besides the 'text' and 'apiKey' parameters, we need to set 'print=all' if we want the text content returned. So the full request URL is:
https://api.idolondemand.com/1/api/sync/findsimilar/v1?text={text_value}&print=all&apikey={apikey_value}
Note that text_value should also be encoded with the encodeURIComponent method.
/**
 * Get the Find Similar options.
 * @param text The text content to process.
 */
var get_findsimilar_options = function (text) {
    return {
        host: 'api.idolondemand.com',
        port: 443,
        path: '/1/api/sync/findsimilar/v1?text=' + encodeURIComponent(text).replace(/%20/g, '+') + '&print=all&apikey=' + apikey,
        headers: headers
    };
};
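The two option builders differ only in the API name and the query parameters, so the path construction could be shared. The following is a minimal sketch under that assumption. The name build_api_path is hypothetical and not part of the original code; it applies the same encoding as the builders above.

```javascript
// Hypothetical helper: builds the request path for a synchronous
// IDOL OnDemand API call from a plain object of query parameters.
var build_api_path = function (api, params) {
    var query = Object.keys(params).map(function (key) {
        // Encode each value the same way the option builders do:
        // percent-encode, then turn encoded spaces into '+'
        return key + '=' + encodeURIComponent(params[key]).replace(/%20/g, '+');
    }).join('&');
    return '/1/api/sync/' + api + '/v1?' + query;
};
```

For example, build_api_path('findsimilar', { text: text, print: 'all', apikey: apikey }) would produce the same path as get_findsimilar_options does.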
The Analyze Sentiment API call needs the text output of the Find Similar call, which is the content of a wiki article and contains many words. I save that content to a local file and then post the file to the Analyze Sentiment API. The Analyze Sentiment POST URL is:
https://api.idolondemand.com/1/api/sync/analyzesentiment/v1
The following code block shows how to post a local file to that URL:
// request is the third-party 'request' module; fs and path are built into Node
var r = request.post(analyzesentiment_post_url, function optionalCallback (err, httpResponse, body) {
    if (err) {
        // the request itself failed, so there is no body to parse
        return console.error(err);
    }
    var json = JSON.parse(body);
    // output the score and rating
    console.log("Score:" + json.aggregate.score + " Rating:" + json.aggregate.sentiment);
});
// create form to post data
var form = r.form();
form.append('apiKey', apikey);
form.append('file', fs.createReadStream(path.join(__dirname, file)));
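The step of saving the Find Similar text to a local file is not shown above. One possible sketch, using only Node's built-in modules, is given here. The helper name save_text and its parameters are hypothetical, and the synchronous write is chosen only to keep the example short.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: writes the text into dir/file as UTF-8
// and returns the full path of the file that was written.
var save_text = function (dir, file, text) {
    var full_path = path.join(dir, file);
    fs.writeFileSync(full_path, text, 'utf8');
    return full_path;
};
```

Calling save_text(__dirname, file, text) before the post would create the file that fs.createReadStream reads. In a real script, the asynchronous fs.writeFile would fit better inside the callback chain.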
To keep the API calls in order (Find Similar ---> OCR Document ---> Analyze Sentiment), I use async.waterfall (https://github.com/caolan/async#waterfall), which is widely used by Node.js developers. The calling sequence is controlled by the following code block:
async.waterfall([
    function (callback) {
        // send request to Find Similar API
        callback(null, response);
    },
    function (response, callback) {
        // parse the response and output data
        callback(null);
    },
    function (callback) {
        // send request to OCR Document API
        callback(null, response);
    },
    function (response, callback) {
        // parse the response and output data
        callback(null);
    },
    function (callback) {
        // send request to Analyze Sentiment
        callback(null, response);
    },
    function (response, callback) {
        // parse the response and output data
        callback(null);
    }
], function (err, result) {
    // runs once, after the last step or as soon as any step passes an error
});
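To make the control flow concrete, here is a small stand-in that mimics how waterfall passes each step's results to the next. It is an illustrative sketch, not the async library's implementation. The names mini_waterfall and run_demo are hypothetical, and the steps are stubs rather than real API calls.

```javascript
// Illustrative stand-in for async.waterfall (NOT the library's code):
// runs the tasks in order and hands each task's results to the next one.
var mini_waterfall = function (tasks, done) {
    var index = 0;
    var next = function (err) {
        // everything after the error argument is a result to pass along
        var args = Array.prototype.slice.call(arguments, 1);
        if (err || index === tasks.length) {
            // stop on the first error, or when every task has run
            return done.apply(null, [err || null].concat(args));
        }
        var task = tasks[index++];
        task.apply(null, args.concat([next]));
    };
    next(null);
};

// Stubbed steps: each one stands in for an API call or a parsing step.
var run_demo = function (done) {
    mini_waterfall([
        function (callback) {
            // pretend this body came back from the OCR Document API
            callback(null, '{"text_block":[{"text":"hello"}]}');
        },
        function (body, callback) {
            // parse the response and pass the extracted text along
            callback(null, JSON.parse(body).text_block[0].text);
        },
        function (text, callback) {
            // pretend this is the next API call, using the previous output
            callback(null, text.toUpperCase());
        }
    ], done);
};
```

If a step passes an error as the first argument to its callback, the remaining steps are skipped and the final callback receives that error.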