Skip to main content

To Infinity and Beyond - Big Data at the speed of light

Astronomers were into Big Data before it was big. In order to learn about the history of the universe, they needed to observe and record billions and billions of astronomical objects and perform heavy-duty analysis on the resulting massive datasets. Available predictive methods were not scalable to the size of data sets they were dealing with so they turned to Skytree to obtain unprecedented performance and accuracy on the largest datasets ever collected. Fast-forward a decade or so and the need to store, access, process and analyze datasets of astronomical sizes is now mainstream in the guise of Big Data analytics.

We have built our Advanced Analytics platform, Skytree Infinity, from the ground up to provide an ultra-fast platform that makes it possible to use the most advanced and accurate machine learning methods on extremely large datasets to extract actionable insights and predictions. Furthermore, the infrastructure available for big data has matured to provide scalable and reliable data management and processing capabilities. The combination of our state-of-the-art machine learning and scalable infrastructure allows for an unprecedented, easy-to-consume solution for big data analysis.

Skytree Infinity integration with Apache Spark

While Skytree Infinity’s analysis capabilities are remarkable, we are acutely aware of the need for easy, seamless integration to other closely related pieces of the Big Data analytics workflow. The Apache Spark project has garnered broad interest and adoption by the industry. Spark offers a rich platform for iterative processing and data manipulation that is well suited to machine learning. The combination of Skytree and Spark provides a widely usable solution for big data analytics that provides a fast, reliable, scalable, and manageable data cleansing/munging/analysis platform for the most challenging business problems.

Our most recent offering in this direction is the certification of Skytree Infinity, the enterprise Machine Learning platform, on Apache Spark by Databricks.

Users can access data from multiple sources including RDBMSs, HDFS and Hive, and use Spark calls through Skytree Infinity to perform the most common ML-centric data pre-processing, munging and featurization tasks, to which they can apply advanced machine learning methods of their choice to obtain accurate, actionable predictions and insights from their data.

Continued Collaboration

We are happy to be a member of the Spark community, and we are excited to explore the new vistas that this new integration opens.

To learn more about Skytree Infinity, please visit our website at

Try Databricks for free

Related posts

See all Partners posts