In a previous post, we walked through how to implement a custom Java transformation in Oracle Big Data Discovery. While that post was more technical in nature, this follow-up post highlights a detailed use case for these transformations and illustrates how they can be used to augment an existing dataset.
In our first post introducing Oracle Big Data Discovery, we highlighted the data transform capabilities of BDD. The transform editor provides a variety of built-in functions for transforming datasets. While these built-in functions are straightforward to use and don't require any additional configuration, they are limited to a predefined set of transformations. Fortunately, for those who need additional functionality during transformation, it is possible to introduce custom transformations that leverage external Java libraries by implementing a custom Groovy script. The rest of this post walks through the implementation of a basic example, and a subsequent post will go in depth into a few real-world use cases.
In our last post, we talked about some of the tools in the Hadoop ecosystem that Oracle Big Data Discovery relies on to do its work, namely Hive and Spark. In this post, we're going to delve a little deeper into how BDD integrates with data that is already sitting in Hive, how it can write transformed data back to HDFS, and how it can help give users new insights into that data.
The most exciting thing about Oracle Big Data Discovery is its integration with the latest tools in the Hadoop ecosystem. These include Spark, which is rapidly supplanting MapReduce as the processing paradigm of choice on distributed architectures. BDD also makes clever use of the tried-and-tested Hive as a metadata layer, which gives it a stable foundation on which to build its complex data processing operations.
We have been anticipating the intersection of big data with data discovery for quite some time. What exactly that will look like in the coming years is still up for debate, but we think Oracle's new Big Data Discovery application provides a window into what true discovery on Hadoop might entail.