Module 1: Create a pipeline with Data Factory

This module takes about 10 minutes. In it, you ingest raw data from the source store into the Bronze table of a data Lakehouse using the Copy activity in a pipeline.

The high-level steps in module 1 are as follows:

  1. Create a data pipeline.
  2. Use a Copy activity in the pipeline to load sample data into a data Lakehouse.

Create a data pipeline

  1. A Microsoft Fabric tenant account with an active subscription is required. Create a free account.

  2. Make sure you have a Microsoft Fabric-enabled workspace: Create a workspace.

  3. Sign in to Power BI.

  4. Select the default Power BI icon at the bottom left of the screen, and switch to the Data Factory experience.

    Screenshot showing the selection of the Data Factory experience.

  5. Select Data pipeline and provide a pipeline name. Then select Create. (If you prefer to script this step, a REST-based sketch follows these steps.)

    Screenshot of the Data Factory start page with the button to create a new data pipeline selected.

    Screenshot showing the dialog to give the new pipeline a name.
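
If you'd rather create the pipeline programmatically, the Fabric REST API can create a data pipeline item in a workspace. The following is a minimal sketch, assuming you already hold a Microsoft Entra ID access token with Fabric API permissions; the workspace ID, token, and pipeline name are placeholders for your own values:

```python
# Minimal sketch: create a data pipeline item with the Fabric REST API.
# Assumes you already have a Microsoft Entra ID access token with Fabric
# API permissions; the ID and token below are placeholders.
import requests

WORKSPACE_ID = "<your-workspace-id>"
ACCESS_TOKEN = "<your-access-token>"

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/items",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"displayName": "First_Pipeline", "type": "DataPipeline"},
)
response.raise_for_status()
print(response.json())  # metadata of the created item, including its id
```

The response includes the new item's id, which you'll need if you later trigger runs through the API (see the sketch at the end of this module).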

Use a Copy activity in the pipeline to load sample data to a data Lakehouse

Step 1: Use the copy assistant to configure a copy activity.

Select Copy data assistant to open the copy assistant tool.

Screenshot showing the selection of the Copy data activity from the new pipeline start page.

Step 2: Configure your settings in the copy assistant.

  1. The Copy data dialog is displayed with the first step, Choose data source, highlighted. Select Sample data from the options at the top of the dialog, and then select NYC Taxi - Green.

    Screenshot showing the selection of the NYC Taxi - Green data in the copy assistant on the Choose data source tab.

  2. The data source preview appears next on the Connect to data source page. Review the preview data, and then select Next.

    Screenshot showing the preview data for the NYC Taxi - Green sample dataset.

  3. For the Choose data destination step of the copy assistant, select Lakehouse and then Next.

    Screenshot showing the selection of the Lakehouse destination on the Choose data destination tab of the Copy data assistant.

  4. Select Create new Lakehouse on the data destination configuration page that appears, and enter a name for the new Lakehouse. Then select Next again.

    Screenshot showing the data destination configuration page of the Copy assistant, choosing the Create new Lakehouse option and providing a Lakehouse name.

  5. Now configure the details of your Lakehouse destination on the Select and map to folder path or table page. Select Tables for the Root folder, provide a table name, and choose the Overwrite action. Don't check the Enable partition checkbox that appears after you select the Overwrite table action.

    Screenshot showing the Connect to data destination tab of the Copy data assistant, on the Select and map to folder path or table step.

  6. Finally, on the Review + save page of the copy data assistant, review the configuration. For this tutorial, uncheck the Start data transfer immediately checkbox, because we run the activity manually in the next step. Then select OK. (A sketch of the Copy activity definition the assistant generates follows these steps.)

    Screenshot showing the Copy data assistant on the Review + save page.
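
Behind the scenes, the assistant assembles a Copy activity definition in the pipeline's JSON. The Python dict below is a rough sketch of its shape only: the property names approximate the general Data Factory activity schema, and the exact source and sink type names the assistant emits may differ.

```python
# Rough sketch of the Copy activity the assistant assembles. Property names
# approximate the Data Factory activity schema and aren't the exact output.
copy_activity = {
    "name": "Copy_NycTaxiGreen",  # illustrative activity name
    "type": "Copy",
    "typeProperties": {
        "source": {
            # the NYC Taxi - Green sample dataset acts as the source;
            # the assistant fills in the connection details here
        },
        "sink": {
            "type": "LakehouseTableSink",      # assumed sink type name
            "tableActionOption": "Overwrite",  # the Overwrite table action
        },
    },
}
```

You don't edit this by hand in this tutorial; it's shown only to demystify what the assistant saved into your pipeline.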

Step 3: Run and view the results of your Copy activity.

  1. Select the Run tab in the pipeline editor. Then select the Run button, and then Save and run at the prompt, to run the Copy activity. (A programmatic way to trigger the same run is sketched after these steps.)

    Screenshot showing the pipeline Run tab with the Run button highlighted.

    Screenshot showing the Save and run dialog with the Save and run button highlighted.

  2. You can monitor the run and check the results on the Output tab below the pipeline canvas. Select the run details button (the "glasses" icon that appears when you hover over the running pipeline run) to view the run details.

    Screenshot showing the run details button in the pipeline Output tab.

  3. The run details show 1,508,501 rows read and written.

    Screenshot of the Copy data details for the pipeline run.

  4. Expand the Duration breakdown section to see the duration of each stage of the Copy activity. After reviewing the copy details, select Close.

    Screenshot showing the duration breakdown of the Copy activity run.
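
As an aside, you can trigger the same pipeline run without the Run button by calling the Fabric job scheduler REST API. This is a minimal sketch, assuming the same placeholder token as before and the pipeline item id returned when the pipeline was created:

```python
# Minimal sketch: trigger the pipeline run through the Fabric job scheduler
# REST API instead of the Run button. All three values are placeholders.
import requests

WORKSPACE_ID = "<your-workspace-id>"
PIPELINE_ID = "<your-pipeline-item-id>"
ACCESS_TOKEN = "<your-access-token>"

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
# A 202 response means the run was accepted; the Location header points to
# a job instance URL you can poll for status.
print(response.status_code, response.headers.get("Location"))
```

Polling the job instance reports the same progression you see on the Output tab below the pipeline canvas.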

In this first module of our end-to-end tutorial for your first data integration using Data Factory in Microsoft Fabric, you learned how to:

  • Create a data pipeline.
  • Add a Copy activity to your pipeline.
  • Use sample data and create a data Lakehouse to store the data in a new table.
  • Run the pipeline and view its details and duration breakdown.
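
As an optional check before moving on, you can confirm the row count from a Fabric notebook attached to the new Lakehouse, where a spark session is predefined. The table name below is an assumption; use whatever name you entered in the copy assistant.

```python
# Optional verification from a Fabric notebook attached to the Lakehouse.
# "nyc_taxi_green" is an assumed table name; substitute the one you chose.
row_count = spark.sql("SELECT COUNT(*) AS n FROM nyc_taxi_green").collect()[0]["n"]
print(row_count)  # should match the 1,508,501 rows reported in the run details
```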

Continue to the next section now to create your dataflow.