Prior to the invention of Hadoop, the technologies underpinning storage and compute systems were relatively basic, limiting most companies to the analysis of "small data." Even this form of analytics could be difficult, though, especially when integrating new data sources. Traditional data analytics relies on relational (SQL) databases made up of tables of structured data, so every byte of raw data must be formatted in a specific way before it can be ingested into the database for analysis. This often lengthy process, commonly known as extract, transform, load (ETL), is required for each new data source. The main problem with this approach is that it is incredibly time- and labor-intensive, sometimes requiring up to 18 months for data scientists and engineers to implement or change. Once data was inside the database, though, it was usually easy enough for data analysts to query and analyze. But then along came the Internet, eCommerce, social media, mobile devices, marketing automation, and Internet of Things (IoT) devices, and the volume and complexity of raw data became too much for all but a handful of institutions to analyze in the normal course of business.
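The ETL pattern described above can be sketched in a few lines. This is a minimal, hypothetical example (the pipe-delimited records, the `sales` table, and its columns are all invented for illustration): each raw record must be parsed into exactly the shape the relational table expects before it can be loaded and queried.

```python
import sqlite3

# Hypothetical raw records extracted from a new data source; the field
# order and formats must be known up front, so the transform step below
# is specific to this one source -- the core reason ETL is labor-intensive.
raw_records = [
    "2023-04-01|ACME Corp| 1,200.50 ",
    "2023-04-02|Globex|850.00",
]

def transform(record):
    """Parse one raw line into the (date, customer, amount) row shape
    that the target relational table expects."""
    date, customer, amount = (field.strip() for field in record.split("|"))
    return date, customer, float(amount.replace(",", ""))

# Load: insert the structured rows into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 (transform(r) for r in raw_records))

# Once loaded, the data is easy to query with SQL, as the text notes.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 2050.5
```

Every new source would need its own `transform` function and schema changes, which is why onboarding a data source under this model can take months.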
Big data analytics is the often complex process of examining large and varied data sets, or big data, generated by sources such as eCommerce, mobile devices, social media, and the Internet of Things (IoT). It involves integrating different data sources, transforming unstructured data into structured data, and generating insights using specialized tools and techniques that spread data processing across an entire network of machines. The amount of digital data in existence is growing rapidly, roughly doubling every two years. Big data analytics emerged as a different approach to managing and analyzing all of these data sources. While the principles of traditional data analytics generally still apply, the scale and complexity of big data required new ways to store and process the petabytes of structured and unstructured data involved. The demand for faster speeds and greater storage capacity was soon met by new storage methods, such as data warehouses and data lakes, nonrelational (NoSQL) databases, and open source data processing and management frameworks such as Apache Hadoop, Spark, and Hive. Big data analytics applies advanced analytic techniques to very large data sets that include structured, semi-structured, and unstructured data, from varied sources and in sizes ranging from terabytes to zettabytes.
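The idea of spreading processing across a network is the heart of the MapReduce model that Hadoop popularized. The sketch below is a single-process simulation, not real distributed code: the `splits` list stands in for file blocks that would live on different cluster nodes, each mapped in parallel, with the partial results merged in a reduce phase.

```python
from collections import Counter
from functools import reduce

# Hypothetical document "splits" -- on a real Hadoop cluster each split
# would be stored on a different node and mapped in parallel there.
splits = [
    "big data needs big storage",
    "data lakes store raw data",
]

def map_split(text):
    """Map phase: count words within a single split, independently of
    all other splits (this is what makes the work parallelizable)."""
    return Counter(text.split())

def reduce_counts(a, b):
    """Reduce phase: merge the partial counts from two splits."""
    return a + b

word_counts = reduce(reduce_counts, (map_split(s) for s in splits))
print(word_counts["data"])  # 3
```

Because each map call touches only its own split, the same program scales from one machine to thousands simply by running more mappers in parallel.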
Big data analytics helps organizations harness their data using advanced data science techniques and methods, such as natural language processing, deep learning, and machine learning. By uncovering hidden patterns, unknown correlations, market trends, and customer preferences, it enables them to identify new opportunities and make better-informed business decisions.