Jaql

Jaql (JAQL) is a functional data processing and query language most commonly used for JSON query processing on BigData.

It started as an Open Source project at Google[1] but the latest release was on 7/12/2010. IBM[2] took it over as primary data processing language for their Hadoop software package BigInsights.

Although having been developed for JSON it supports a variety of other data sources like CSV, TSV, XML.

A comparison[3] to other BigData query languages like PIG Latin and Hive QL illustrates performance and usability aspects of these technologies.

JAQL supports[4] Lazy Evaluation, so expressions are only materialized when needed.

Syntax

The basic concept of JAQL is

source -> operator(parameter) -> sink ;

where a sink can be a source for a downstream operator. So typically a JAQL program has to following structure, expressing a data processing graph:

source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;

Most commonly for readability reasons JAQL programs are linkbreaked after the arrow, this is also a common technique in Twitter SCALDING:

source -> operator1(parameter)
-> operator2(parameter)
-> operator2(parameter)
-> operator3(parameter)
-> operator4(parameter)
-> sink ;

Core Operators[5]

EXPAND

Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [ [ T ] ] and produces an output array [ T ], by promoting the elements of each nested array to the top-level output array.

FILTER

Use the FILTER operator to filter away elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example:

data = [
 {name: "Jon Doe", income: 20000, mgr: false},
 {name: "Vince Wayne", income: 32500, mgr: false},
 {name: "Jane Dean", income: 72000, mgr: true},
 {name: "Alex Smith", income: 25000, mgr: false}
];

data -> filter $.mgr;

[
  {
    "income": 72000,
    "mgr": true,
    "name": "Jane Dean"
  }
]

data -> filter $.income < 30000;

[
  {
    "income": 20000,
    "mgr": false,
    "name": "Jon Doe"
  },
  {
    "income": 25000,
    "mgr": false,
    "name": "Alex Smith"
  }
]

GROUP

Use the GROUP expression to group one or more input arrays on a grouping key and applies an aggregate function per group.

JOIN

Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.

SORT

Use the SORT operator to sort an input by one or more fields.

TOP

The TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements.

TRANSFORM

Use the TRANSFORM operator to realize a projection or to apply a function to all items of an output.

References

External links

This article is issued from Wikipedia - version of the 5/31/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.