Azure data lake folder structure

azure_learner 380 Reputation points
2024-09-19T08:43:49.7+00:00

Hi, the link provided in the thread below is not working. Is there any way to find the information, or a new URL/link?

https://learn.microsoft.com/en-us/answers/questions/1255947/azure-datalake-directory-partioning-naming-convent

Please guide.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-09-19T09:49:49.6333333+00:00

    Hello azure_learner,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you have questions about creating folders in Azure Data Lake.

    Azure Blob Storage is organized in a flat paradigm, rather than a hierarchical paradigm (like a classic file system). However, you can organize blobs into virtual directories in order to mimic a folder structure. There is a good explanation in the documentation here, as well as some sample code. A virtual directory forms part of the name of the blob and is indicated by the delimiter character. However, please note that these are not actual directories.
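
    To illustrate the virtual directory idea, here is a minimal sketch in plain Python. It does not call any Azure SDK; the function name list_virtual_level and the sample blob names are hypothetical and only model how a prefix and a delimiter turn flat blob names into something that looks like folders.

```python
def list_virtual_level(blob_names, prefix="", delimiter="/"):
    """Return (virtual_dirs, blobs) found directly under `prefix`.

    Toy model of a flat namespace: "directories" are only derived from
    the delimiter inside each blob name; nothing is stored for them.
    """
    dirs, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so this blob sits below a virtual directory.
            dirs.add(prefix + head + delimiter)
        else:
            # No delimiter remains, so the blob sits directly at this level.
            blobs.append(name)
    return sorted(dirs), sorted(blobs)


# Hypothetical blob names in a single container.
blob_names = [
    "raw/sales/2024/09/orders.csv",
    "raw/sales/2024/10/orders.csv",
    "raw/readme.txt",
    "curated/sales.parquet",
]

top_dirs, top_blobs = list_virtual_level(blob_names)
raw_dirs, raw_blobs = list_virtual_level(blob_names, prefix="raw/")
print(top_dirs, top_blobs)
print(raw_dirs, raw_blobs)
```

    Removing every blob that starts with "raw/" would make the "raw/" folder disappear as well, because it only ever existed as part of the blob names.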

    On the other hand, enabling Hierarchical Namespace (HNS) in Azure Data Lake Storage Gen2 organizes objects (files) into a hierarchy of directories and subdirectories in the same way that the file system on your computer is organized. The hierarchical namespace scales linearly and doesn't degrade data capacity or performance.

    The following comparison of hierarchical and flat namespaces explains the difference:

    Hierarchical namespaces organize blob data into directories and store metadata about each directory and the files within it. They keep the data organized, which yields better storage and retrieval performance for analytical use cases and lowers the cost of analysis. This structure allows operations such as directory renames and deletes to be performed as a single atomic operation. Flat namespaces, by contrast, require a number of operations proportional to the number of objects in the structure.
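
    The difference in rename cost can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not how Azure implements either namespace: the flat namespace is modeled as a dict keyed by full blob name, the hierarchical one as nested dicts, and both function names are hypothetical.

```python
def rename_prefix_flat(blobs, old_prefix, new_prefix):
    """Flat namespace model: each blob under the prefix is rewritten one by one."""
    operations = 0
    # Snapshot the matching names first, since the dict is modified in the loop.
    for name in [n for n in blobs if n.startswith(old_prefix)]:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
        operations += 1
    return operations


def rename_directory_hierarchical(parent, old_name, new_name):
    """Hierarchical namespace model: one entry in the parent directory is re-linked."""
    parent[new_name] = parent.pop(old_name)
    return 1


# 1000 hypothetical files under the "folder" raw/sales/ in each model.
flat = {f"raw/sales/part-{i}.csv": b"" for i in range(1000)}
flat_ops = rename_prefix_flat(flat, "raw/sales/", "raw/orders/")

tree = {"raw": {"sales": {f"part-{i}.csv": b"" for i in range(1000)}}}
tree_ops = rename_directory_hierarchical(tree["raw"], "sales", "orders")

print(flat_ops, tree_ops)  # prints "1000 1"
```

    In the flat model the work grows with the number of blobs, and a failure partway through leaves the "folder" split between the old and new names. In the hierarchical model the rename is a single step regardless of how many files the directory holds.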

    If you do not have the hierarchical namespace enabled, you can simply include the folder name and a delimiter character in the blob name. There is also a Stack Overflow question about the GetDirectoryReference replacement, and a GitHub issue that you might find useful. Please check out the QuickStart for examples of creating containers and uploading blobs.

    Since you would like to create a folder structure, you might consider enabling the Azure Data Lake Storage Gen2 hierarchical namespace. If you enable this, you can find information on creating and managing directories here.

    Hope this helps. Let me know if you have further questions or issues and I will be happy to help.


    Please don't forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.


1 additional answer

  1. azure_learner 380 Reputation points
    2024-09-20T14:26:26.4+00:00

    Hi @Nehruji R, thank you. The link I mentioned had a wealth of information, so I would like to find out why it is no longer available and whether it has moved to a new location.

