A data lake is a central location that holds a large amount of data in its native, raw format, as well as a way to organize large volumes of highly diverse data. Compared to a hierarchical data warehouse, which stores data in files or folders, a data lake uses a flat architecture to store the data. Data lakes are usually configured on a cluster of scalable commodity hardware. As a result, you can store raw data in the lake in case it will be needed at a future date — without worrying about the data format, size or storage capacity.
In addition, data lake clusters can exist on-premises or in the cloud. Historically, the term "data lake" was often associated with Hadoop-oriented object storage, but today the term generally refers to the broader category of object storage. Object storage stores data with metadata tags and a unique identifier, which makes it easier to locate and retrieve data across regions and improves performance. The Databricks Lakehouse Platform makes all the data in your data lake available for any number of data-driven use cases.