Monday, June 16, 2014

Calling HP IDOL OnDemand from Node.js

Hewlett Packard has developed a set of JSON-based REST APIs that offer “Big Data”-style processing, letting developers extract information embedded in unstructured text and images, formats that were previously hard to work with. The platform is called IDOL OnDemand, and the APIs are published at https://www.idolondemand.com/developer/apis


In this post, I will use Node.js to call the HP IDOL OnDemand APIs. Three APIs are used for demonstration: OCR Document, Find Similar, and Analyze Sentiment.
Since these APIs are all REST-based and require authorization, https.get (http://nodejs.org/api/https.html#https_https_get_options_callback) is used to send GET requests to the HP IDOL OnDemand server and retrieve the JSON result.

First, we set the request headers, which are common to all requests:
// Set the request headers
var headers = {
    'User-Agent':       'Super Agent/0.0.1',
    'Content-Type':     'application/x-www-form-urlencoded'
};

Next, we construct the request data. Every request requires the apiKey parameter; you need to sign up on the IDOL OnDemand developer page (https://www.idolondemand.com/developer/apis) to get one. To call the OCR Document API, the request only needs the 'url' and 'apiKey' parameters (see https://www.idolondemand.com/developer/apis/ocrdocument/#request), so the full request URL is:
https://api.idolondemand.com/1/api/sync/ocrdocument/v1?url={url_value}&apikey={apikey_value}
Note that the value of the url parameter must be encoded, which can be done with JavaScript's encodeURIComponent function.
/**
*  Get the OCR document options.
*  @param url  the image url to extract the text.
*/
var get_ocr_options = function (url){
    return {
        host: 'api.idolondemand.com',
        port: 443,
        path: '/1/api/sync/ocrdocument/v1?url=' + encodeURIComponent(url).replace(/%20/g,'+') + '&apikey=' + apikey,
        headers: headers
    };
};
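
Since each API call differs only in its path and parameters, the path string could also be assembled by a small generic helper. The following is just a sketch of that idea: buildApiPath is a hypothetical name of my own, and 'YOUR_API_KEY' and the image URL are placeholders, not real values.

```javascript
// Hypothetical helper (not part of the original code): builds the request
// path for a synchronous IDOL OnDemand call from a plain object of parameters.
function buildApiPath(apiName, params) {
    var pairs = Object.keys(params).map(function (key) {
        // Encode key and value; spaces become '+', as in get_ocr_options above.
        return encodeURIComponent(key) + '=' +
            encodeURIComponent(params[key]).replace(/%20/g, '+');
    });
    return '/1/api/sync/' + apiName + '/v1?' + pairs.join('&');
}

// Placeholder values, for illustration only.
var ocrPath = buildApiPath('ocrdocument', {
    url: 'http://example.com/my scan.png',
    apikey: 'YOUR_API_KEY'
});
console.log(ocrPath);
```

The returned string can be used as the path value of the options object, next to host, port and headers.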

Now we send the request to the server and handle the response data in the callback. The response is in JSON format (see https://www.idolondemand.com/developer/apis/ocrdocument/#response for its structure). Once the response has ended, we parse it with the built-in JSON object and read the 'text_block' field, which holds the extracted text:
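var https = require('https');

var str = '';
var req = https.get(get_ocr_options(image_url), function (response) {
    // Accumulate the response body as it arrives
    response.on('data', function (chunk) {
        str += chunk;
    });
    // Parse the complete body once the response has ended
    response.on('end', function () {
        var json = JSON.parse(str);
        console.log(json.text_block[0].text);
    });
});
// https.get() calls req.end() automatically, so we only need to handle errors
req.on('error', function (err) {
    console.error(err);
});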
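
The buffer-then-parse step is the same for every GET call, so it may be worth factoring out. Below is a minimal sketch under a few assumptions: collectJson is a hypothetical helper name, the response is simulated with an EventEmitter so the snippet runs without a network, and the JSON payload is invented sample data that only mimics the 'text_block' field mentioned above.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: buffers a response stream and parses the body as JSON.
// Parse failures are passed to the callback instead of being thrown.
function collectJson(response, callback) {
    var body = '';
    response.on('data', function (chunk) {
        body += chunk;
    });
    response.on('end', function () {
        var json;
        try {
            json = JSON.parse(body);
        } catch (e) {
            return callback(e);
        }
        callback(null, json);
    });
}

// Stand-in for the https response; the payload below is made up.
var fakeResponse = new EventEmitter();
var ocrText = null;
collectJson(fakeResponse, function (err, json) {
    if (err) {
        return console.error('Could not parse response: ' + err.message);
    }
    ocrText = json.text_block[0].text;
    console.log(ocrText);
});
// The body arrives in two chunks, as it might over the network.
fakeResponse.emit('data', '{"text_block":[{"te');
fakeResponse.emit('data', 'xt":"Hello OCR"}]}');
fakeResponse.emit('end');
```

With a real request, the response object passed to the https.get callback would take the place of fakeResponse.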


Calling the Find Similar API is much like calling the OCR Document API; the only differences are the API path and the request parameters. According to https://www.idolondemand.com/developer/apis/findsimilar/#request, besides the 'text' and 'apiKey' parameters, we need to set 'print=all' if we want the text content returned. So the full request URL is:
https://api.idolondemand.com/1/api/sync/findsimilar/v1?text={text_value}&print=all&apikey={apikey_value}
Note that text_value should also be encoded with encodeURIComponent.
/**
*  Get the Find Similar options. 
*  @param text The text content to process.
*/
var get_findsimilar_options = function (text){
    return {
        host: 'api.idolondemand.com',
        port: 443,
        path: '/1/api/sync/findsimilar/v1?text=' + encodeURIComponent(text).replace(/%20/g,'+') + '&print=all&apikey=' + apikey,
        headers: headers
    };
};

The Analyze Sentiment call takes the text output of the Find Similar call, which is the content of a wiki article and can be quite long. I save that content to a local file and then post the file to the Analyze Sentiment API. The POST URL is:
https://api.idolondemand.com/1/api/sync/analyzesentiment/v1
The following code block shows how to post a local file to that URL:
var fs = require('fs');
var path = require('path');
var request = require('request');

var r = request.post(analyzesentiment_post_url, function (err, httpResponse, body) {
    if (err) {
        return console.error(err);
    }
    var json = JSON.parse(body);
    // Output the score and rating
    console.log("Score:" + json.aggregate.score + "  Rating:" + json.aggregate.sentiment);
});
// Create the multipart form to post
var form = r.form();
form.append('apiKey', apikey);
form.append('file', fs.createReadStream(path.join(__dirname, file)));
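
The step of saving the Find Similar text to a local file is not shown above. One way it might look is sketched here; saveTextForUpload is a hypothetical helper name, the sample text and file name are made up, and writing to the OS temp directory is my assumption (the code above reads the file from __dirname instead).

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: writes the text to a file in the OS temp directory
// and returns the full path of the file it created.
function saveTextForUpload(text, fileName) {
    var filePath = path.join(os.tmpdir(), fileName);
    fs.writeFileSync(filePath, text, 'utf8');
    return filePath;
}

// Sample text and file name are placeholders.
var savedPath = saveTextForUpload('Sample article text.', 'findsimilar-output.txt');
console.log('Saved to ' + savedPath);
```

The returned path can then be passed to fs.createReadStream() when appending the 'file' field to the form.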

To keep the API calls in order (Find Similar ---> OCR Document ---> Analyze Sentiment), I use async.waterfall (https://github.com/caolan/async#waterfall), which is widely used by Node.js developers. The call sequence can be controlled with the following code block:
async.waterfall([
    function(callback){
        // send request to Find Similar API
        callback(null, response);
    },
    function(response,  callback){
        // parse the response and output data
        callback(null);
    },
    function(callback){
        // send request to OCR Document API
        callback(null, response);
    },
    function(response,  callback){
         // parse the response and output data
        callback(null);
    },
    function(callback){
         // send request to Analyze Sentiment
        callback(null, response);
    },
    function(response, callback){
        // parse the response and output data
        callback(null);
    }
], function (err, result) {
});
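
To make the data flow through the waterfall easier to see, here is a toy stand-in that runs without installing anything. It is only a sketch: miniWaterfall is my own simplified illustration of the pattern, not the async library's implementation, and the three steps are stubs that return made-up values instead of calling the real APIs.

```javascript
// Toy stand-in for async.waterfall, written only to illustrate the pattern.
// Each task receives the previous task's results plus a callback; an error
// skips the remaining tasks and goes straight to the final handler.
function miniWaterfall(tasks, done) {
    var index = 0;
    function next(err) {
        var args = Array.prototype.slice.call(arguments, 1);
        if (err || index === tasks.length) {
            return done.apply(null, [err || null].concat(args));
        }
        var task = tasks[index++];
        task.apply(null, args.concat([next]));
    }
    next(null);
}

var log = [];
miniWaterfall([
    function (callback) {
        // Stub for the Find Similar request
        callback(null, 'article text');
    },
    function (text, callback) {
        log.push('findsimilar: ' + text);
        // Stub for the OCR Document request
        callback(null, 'ocr text');
    },
    function (text, callback) {
        log.push('ocr: ' + text);
        // Stub for the Analyze Sentiment request
        callback(null, { score: 0.5, sentiment: 'positive' });
    }
], function (err, result) {
    if (err) {
        return console.error(err);
    }
    log.push('sentiment: ' + result.sentiment);
    console.log(log.join('\n'));
});
```

Because each callback hands its result to the next function, a later step never starts before the earlier one has finished, which is the property the real async.waterfall provides.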
