Tuesday, May 20, 2025

Free Power BI Classes Week 3

Successfully completed the free Power BI classes for week 3.

We continued with Lab 2.

The participants learnt about creating stacked column charts using the legend functionality, creating groups, creating visual-level and page-level filters, as well as creating drill-down charts based on a date hierarchy. They also created slicers.

Some interesting questions also arose as now the participants started thinking of how to analyse the data at hand.


Here is the recording of the class for week 3

Free Power BI classes 2025 Week 3 


Monday, May 12, 2025

Free Power BI Classes for Week 2

Successfully completed the free Power BI classes for week 2.

We completed Lab 1 and started on Lab 2.

The participants learnt about data modelling, facts/measures and dimensions.

They also learnt about relationships -- one-to-many and many-to-many -- and how to create them in Power BI.


Here is the recording of the class for week 2

Power BI Classes Week 2


Tuesday, May 06, 2025

Free Power BI Classes Week 1

Yesterday I started free Power BI classes for adults.

These classes are aimed at people who are new to Power BI.  I will be covering the Dashboard in a Day labs over 6 weeks.

We started off with Lab 0 and Lab 1.

Here is the recording of yesterday's class for those who would like to get started with Power BI:

https://youtu.be/tvVkdmMOSJU


Thursday, January 23, 2025

OneLake: The Heart of Your Data Universe in Microsoft Fabric

Imagine a single, unified data lake for your entire organization, accessible to every workload, without data duplication. That's the power of Microsoft Fabric's OneLake. It's not just a storage solution; it's a foundational layer that fosters data collaboration and streamlines your analytics journey.

Understanding the Core Concept of OneLake

OneLake is fundamentally a single, unified, SaaS-managed data lake built on Azure Data Lake Storage Gen2 (ADLS Gen2). It's automatically provisioned with every Fabric tenant, eliminating the need for manual setup. Key concepts include:

  • One Copy of Data: OneLake eliminates data silos by providing a single, logical location for all your data, regardless of format or source.
  • Hierarchical Structure: It uses a familiar hierarchical file system, allowing you to organize data into folders and subfolders.
  • Shortcuts: OneLake shortcuts enable you to reference existing data in other storage locations (like ADLS Gen2 or S3) without physically moving it.
  • Open Formats: It supports open data formats like Parquet, Delta Lake, and CSV, ensuring interoperability with various tools and applications.
  • Automatic Indexing and Discovery: OneLake automatically indexes metadata, making it easy to discover and access data.
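As a small illustration of the hierarchical addressing described above, the helper below builds an ADLS-style OneLake URI. The workspace and item names are hypothetical, and the URI shape assumes OneLake's documented ADLS Gen2 compatibility (abfss://&lt;workspace&gt;@onelake.dfs.fabric.microsoft.com/&lt;item&gt;.&lt;type&gt;/&lt;path&gt;):

```python
def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an ADLS-style OneLake URI from its hierarchical parts.

    Names here are illustrative; the format follows OneLake's
    ADLS Gen2-compatible addressing.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{item}.{item_type}/{path}"
    )

# Example: a Parquet file in a lakehouse's Files area (names are made up)
uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse",
                  "Files/customer_data.parquet")
print(uri)
```

Because every Fabric workload resolves the same URI scheme, a path built this way can be handed to any tool that speaks the ADLS Gen2 API.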

Advantages of OneLake: A Game Changer for Your Data Strategy

  • Eliminates Data Silos: OneLake breaks down data silos, fostering a unified view of your organization's data.
  • Reduces Data Duplication and Costs: By storing data in a single location, OneLake eliminates the need for redundant copies, reducing storage costs and complexity.
  • Simplifies Data Management: OneLake's SaaS-managed nature simplifies data management, freeing up IT resources.
  • Accelerates Analytics: With all data in one place, OneLake accelerates data access and analysis, enabling faster insights.
  • Enhances Collaboration: OneLake promotes data sharing and collaboration across teams and departments.
  • Seamless Integration with Fabric Workloads: OneLake is tightly integrated with all Fabric workloads, including Data Factory, Data Warehouse, Lakehouse, and Power BI.

How OneLake Fosters Data Collaboration

OneLake acts as a central hub for data collaboration, enabling teams to easily share and access data. Here's how:

  • Shared Workspaces: Fabric workspaces provide a collaborative environment where teams can work on data projects together, with OneLake as the underlying storage.
  • Data Sharing through Shortcuts: OneLake shortcuts allow teams to easily share data without physically moving it, reducing data duplication and ensuring data consistency.
  • Data Discovery with Metadata: OneLake's automatic indexing and metadata management make it easy for teams to discover and access relevant data.
  • Consistent Data Access: OneLake provides a consistent data access layer, ensuring that all Fabric workloads can access data in the same way.

Scenarios and Examples:

  • Scenario 1: Cross-Departmental Analytics:
    • A retail company wants to analyze customer behavior across different departments (marketing, sales, and operations).
    • With OneLake, each department can store its data in separate folders within the same data lake.
    • Data analysts can easily access and combine data from different departments to gain a holistic view of customer behavior.
  • Scenario 2: Data Science Collaboration:
    • A data science team wants to collaborate on a machine learning project.
    • They can store their data and models in a shared workspace within OneLake.
    • This enables team members to easily access and share data, code, and models, accelerating the project lifecycle.
  • Scenario 3: External Data Integration:
    • A financial services company needs to integrate data from external partners.
    • Using OneLake shortcuts, they can reference data from their partners' ADLS Gen2 accounts without physically moving it.
    • This simplifies data integration and reduces the risk of data duplication.
  • Scenario 4: Real-time Data Sharing:
    • A manufacturing company has IoT devices that are constantly generating data.
    • This data is streamed into OneLake.
    • Different teams can instantly access the most recent data for real-time dashboards and alerting.

The Future of Data Collaboration is Here

OneLake is a transformative technology that simplifies data management and fosters data collaboration. By providing a single, unified data lake for your entire organization, OneLake enables you to unlock the full potential of your data and accelerate your analytics journey.



Friday, January 10, 2025

Building and Deploying Machine Learning Models with Microsoft Fabric

    Microsoft Fabric brings a unified experience to data science, enabling you to build, train, and deploy machine learning models seamlessly. With integrated tools and workflows, Fabric empowers data scientists to accelerate their projects and deliver impactful insights. Let's explore how you can leverage Fabric's data science capabilities.

    Fabric's Data Science Toolkit: A Unified Approach

    Fabric provides a comprehensive environment for machine learning, including:

    • Notebooks: Interactive environments for data exploration, model development, and experimentation.
    • Experiments: Tracking and managing model training runs, including parameters, metrics, and artifacts.
    • Models: Registering and versioning trained models for deployment.
    • Pipelines: Orchestrating end-to-end machine learning workflows.
    • ML Libraries: Integration with popular libraries like Scikit-learn, TensorFlow, and PyTorch.
    • OneLake Integration: Direct access to your data in OneLake, eliminating data movement.

    Building and Deploying a Machine Learning Model: A Step-by-Step Approach

1. Data Ingestion and Preparation:
  • Scenario: A retail company wants to predict customer churn based on historical transactions and demographic data.
  • Action: Use Fabric Notebooks to connect to your data in OneLake, load it into a Pandas DataFrame, and perform data cleaning and preprocessing.
  • Example:
    Python
    import pandas as pd

    df = pd.read_parquet("abfss://<your-onelake-path>/customer_data.parquet")

    # Remove rows with missing values
    df = df.dropna()
    # Feature engineering and encoding, e.g. one-hot encode categorical columns
    df = pd.get_dummies(df)

2. Model Training and Experimentation:

  • Scenario: The data scientist wants to compare the performance of different classification algorithms.
  • Action: Use Fabric Experiments to track multiple training runs with different hyperparameters and algorithms.
  • Example:
    Python
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    import mlflow
    
    mlflow.set_experiment("customer_churn_prediction")
    with mlflow.start_run():
        # X holds the feature columns and y the churn label from the prepared DataFrame
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        model = RandomForestClassifier(n_estimators=100, max_depth=10)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, "random_forest_model")
    

3. Model Registration and Versioning:

  • Scenario: The data scientist has selected the best performing model and wants to register it for deployment.
  • Action: Use Fabric Models to register the trained model, including its metadata and artifacts.
  • Example:
    Python
    result = mlflow.register_model("runs:/<run-id>/random_forest_model", "customer_churn_model")
    
    This registers the logged model, referenced by its run URI, as a versioned model in the Fabric workspace model registry. (The model name here is illustrative.)

4. Model Deployment:

  • Scenario: The retail company wants to deploy the churn prediction model as a real-time API.
  • Action: Fabric's deployment capabilities allow you to deploy models as web services for real-time predictions or as batch jobs for offline scoring.
  • Deployment Options:
    • Real-time endpoints: Fabric provides the ability to deploy models as real-time endpoints for low-latency predictions.
    • Batch prediction: For large datasets, use Fabric Pipelines to schedule batch predictions and store the results in OneLake.
  • Example: (Conceptual)
    • Deploy the registered model as a real-time endpoint using Fabric's deployment tools.
    • Create a Power BI report that consumes the API to display customer churn predictions.

5. Model Monitoring and Retraining:

  • Scenario: The model's performance may degrade over time due to changes in customer behavior.
  • Action: Use Fabric's monitoring capabilities to track model performance and trigger retraining workflows.
  • Example:
    • Set up alerts to notify the data science team when the model's accuracy falls below a certain threshold.
    • Create a Fabric Pipeline that automatically retrains the model with new data on a regular schedule.
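The monitoring-and-retraining loop above can be sketched as a simple threshold check. The threshold value and function name are illustrative assumptions, not Fabric APIs:

```python
ACCURACY_THRESHOLD = 0.85  # illustrative threshold; tune to your own use case

def needs_retraining(current_accuracy: float,
                     threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Return True when monitored accuracy drops below the agreed threshold."""
    return current_accuracy < threshold

# A monitoring job could run this check on each evaluation window and,
# when it returns True, kick off the retraining pipeline (for example,
# a scheduled Fabric Pipeline run).
if needs_retraining(0.81):
    print("Accuracy below threshold - trigger retraining pipeline")
```

In practice the measured accuracy would come from scoring recent labelled data, and the trigger would invoke the retraining pipeline rather than print.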

Benefits of Fabric's Data Science Workflow:

  • Unified Platform: Eliminates the need to switch between different tools and environments.
  • Seamless Integration: Integrates with OneLake, Power BI, and other Fabric components.
  • Scalability and Performance: Leverages Azure's cloud infrastructure for scalable model training and deployment.
  • Collaboration: Enables data scientists and engineers to collaborate effectively.
  • Simplified Deployment: Streamlines the deployment process, reducing time-to-production.

Microsoft Fabric empowers data scientists to build and deploy machine learning models efficiently, accelerating the delivery of valuable insights. By leveraging its unified platform and robust capabilities, you can unlock the full potential of your data and drive impactful business outcomes.



Friday, December 06, 2024

Unleashing Data Democracy: How Microsoft Fabric Powers a Data Mesh Architecture

 

The traditional centralized data lake or warehouse often struggles to keep pace with the growing complexity and volume of modern data. Enter the data mesh, a decentralized architectural approach that empowers domain-specific teams to own and manage their data as products. Microsoft Fabric, with its unified platform and robust capabilities, is perfectly positioned to support and enable this transformative approach.

What is a Data Mesh?

A data mesh is a decentralized socio-technical approach to data management. It shifts the focus from centralized data ownership to distributed ownership by domain-specific teams. Key principles include:

  • Domain Ownership: Domains own their data as products, with clear interfaces and service-level agreements.
  • Data as a Product: Data is treated as a product, with discoverability, addressability, trustworthiness, and security.
  • Self-Serve Data Infrastructure as a Platform: A platform provides the necessary infrastructure for domains to manage their data independently.
  • Federated Computational Governance: Decentralized governance with standardized global policies.

How Microsoft Fabric Enables a Data Mesh:

Fabric's unified platform seamlessly aligns with the data mesh principles:

  • OneLake as a Decentralized Data Lake: OneLake provides a single, logical data lake across the entire organization, enabling domain-specific data zones while maintaining global accessibility. This supports domain ownership and data as a product.
  • Workspaces for Domain Ownership: Fabric workspaces allow domains to manage their data products independently, controlling access, security, and lifecycle.
  • Data Products with Lakehouses and Data Warehouses: Domains can build data products using Lakehouses (for diverse data types) or Data Warehouses (for structured analytics), tailored to their specific needs.
  • Data Flows and Pipelines for Self-Serve Data Infrastructure: Fabric's data integration tools enable domains to build and manage their own data pipelines, promoting self-service.
  • Microsoft Purview Integration for Federated Governance: Purview provides a centralized governance layer, enabling data discovery, lineage tracking, and policy enforcement across the data mesh.

Benefits of a Data Mesh with Microsoft Fabric:

  • Increased Agility and Speed: Domains can independently manage their data, reducing dependencies and accelerating time-to-insight.
  • Improved Data Quality and Relevance: Domain experts, who understand their data best, are responsible for its quality and accuracy.
  • Enhanced Innovation and Experimentation: Domains can easily explore and experiment with their data, fostering innovation.
  • Scalability and Flexibility: The decentralized architecture allows the data mesh to scale easily and adapt to changing business needs.
  • Reduced Data Silos: OneLake and Purview promote data sharing and collaboration across domains.

Scenarios and Examples:

  • Retail Company:
    • The "Product" domain manages product data in a Lakehouse, providing APIs for other domains to access product information.
    • The "Customer" domain owns customer data in a Data Warehouse, offering analytical reports and customer segmentation data products.
    • Fabric workspaces and OneLake zones ensure data isolation and ownership, while Purview enables data discovery and governance.
  • Financial Services:
    • The "Trading" domain manages real-time market data in a Lakehouse, offering data streams and analytical dashboards as data products.
    • The "Risk Management" domain owns risk data in a Data Warehouse, providing risk reports and predictive models.
    • Fabric's security features and Purview's governance capabilities ensure compliance with regulatory requirements.
  • Healthcare Organization:
    • The "Patient Records" domain manages patient data in a Lakehouse with strict access controls and data masking to protect sensitive information.
    • The "Research" domain has a workspace to access de-identified patient data for research purposes.
    • OneLake provides a central repository for all data, while Purview helps track data lineage and ensure compliance with HIPAA.

Embracing the Future of Data Management:

Microsoft Fabric empowers organizations to adopt a data mesh architecture, unlocking the potential of their data and accelerating their digital transformation. By embracing domain ownership, data as a product, and self-serve infrastructure, organizations can build a more agile, scalable, and innovative data ecosystem.

Wednesday, November 13, 2024

Securing Your Data in Microsoft Fabric: Security Best Practices

Microsoft Fabric offers a powerful, unified analytics platform, but with great power comes great responsibility – securing your data. As you leverage Fabric for data warehousing, lakehouse architectures, and advanced analytics, implementing robust security measures is paramount. This post outlines key security best practices to protect your valuable data within the Fabric ecosystem.

Understanding Fabric's Security Layers

Fabric's security model is built on layers, encompassing:

  • Azure Active Directory (Azure AD, now Microsoft Entra ID): For identity and access management.
  • Workspace Security: Controlling access to Fabric workspaces and their contained items.
  • Data Security: Protecting data at rest and in transit.
  • Row-Level Security (RLS) and Object-Level Security (OLS): Restricting data access based on user roles and permissions.

Best Practices for Securing Your Fabric Environment:

1. Implement Strong Identity and Access Management (IAM) with Azure AD:

  • Scenario: A company has multiple departments accessing sensitive customer data within Fabric.
  • Best Practice:
    • Utilize Azure AD groups to assign roles and permissions based on job functions.
    • Enforce multi-factor authentication (MFA) to prevent unauthorized access.
    • Implement least privilege principle, granting only necessary permissions.
    • Use Service Principals when applications need to access data.
  • Example: Create Azure AD groups like "Marketing Analysts," "Sales Managers," and "Data Scientists," assigning appropriate Fabric roles to each.

2. Secure Fabric Workspaces:

  • Scenario: A project involves sensitive financial data, and access needs to be tightly controlled.
  • Best Practice:
    • Use workspace roles (Admin, Member, Contributor, Viewer) to manage access levels.
    • Regularly review workspace permissions and remove unnecessary access.
    • Create separate workspaces for different projects or data sensitivity levels.
  • Example: Create a dedicated workspace for the financial data project, granting only authorized personnel Admin or Contributor roles.

3. Protect Data at Rest and in Transit:

  • Scenario: Data needs to be encrypted to comply with regulatory requirements.
  • Best Practice:
    • Leverage Azure Storage Service Encryption (SSE) to encrypt data at rest within OneLake.
    • Ensure data is transmitted over HTTPS to encrypt data in transit.
    • Utilize Private Links to ensure that network traffic stays within the Microsoft Azure backbone.
  • Example: Enable SSE for your OneLake storage account, and configure network security groups to restrict traffic to authorized sources.

4. Implement Row-Level Security (RLS) and Object-Level Security (OLS):

  • Scenario: Sales representatives should only see data related to their assigned regions.
  • Best Practice:
    • Use RLS to filter rows based on user attributes or roles.
    • Use OLS to restrict access to specific columns or tables.
    • Implement dynamic RLS to automatically filter data based on user context.
  • Example: Create RLS rules in Power BI datasets to filter sales data based on the sales representative's region, as defined in Azure AD.
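Conceptually, RLS behaves like a per-user row filter applied before any query runs. The sketch below illustrates that effect in plain Python with made-up data; actual RLS rules are DAX filter expressions evaluated inside the Power BI dataset, not application code:

```python
# Illustrative sketch of what RLS achieves: each user sees only the rows
# for their assigned region. The data and region mapping are made up.
SALES = [
    {"rep": "alice@contoso.com", "region": "East", "amount": 1200},
    {"rep": "bob@contoso.com",   "region": "West", "amount": 900},
    {"rep": "carol@contoso.com", "region": "East", "amount": 450},
]
USER_REGION = {"alice@contoso.com": "East", "bob@contoso.com": "West"}

def visible_rows(user: str, rows: list) -> list:
    """Return only the rows in the user's region (none if the user is unmapped)."""
    region = USER_REGION.get(user)
    return [r for r in rows if r["region"] == region]

print(visible_rows("alice@contoso.com", SALES))  # only the two East rows
```

The key property mirrored here is that filtering is driven by the user's identity, not by anything the report author has to re-implement per visual.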

5. Monitor and Audit Security Activities:

  • Scenario: Detecting and responding to potential security breaches is crucial.
  • Best Practice:
    • Enable Azure Monitor and Azure Sentinel to collect and analyze security logs.
    • Set up alerts for suspicious activities, such as unusual login attempts or data access patterns.
    • Regularly review audit logs to identify potential security vulnerabilities.
  • Example: Configure Azure Sentinel to alert on unusual login activity from unknown IP addresses, and set up dashboards to visualize security events.

6. Data Governance and Compliance:

  • Scenario: Meeting regulatory compliance such as GDPR, HIPAA, or CCPA.
  • Best Practice:
    • Implement data classification and labeling.
    • Establish data retention policies.
    • Utilize Microsoft Purview to govern and track sensitive data.
    • Perform regular security assessments and audits.
  • Example: Use Microsoft Purview to classify sensitive customer data and implement data loss prevention (DLP) policies to prevent unauthorized data sharing.

7. Secure External Data Access:

  • Scenario: Connecting to external data sources.
  • Best Practice:
    • Use secure connection strings, and store credentials securely using Azure Key Vault.
    • Implement network security measures to restrict access to external data sources.
    • Follow the principle of least privilege when granting access to external data.

By implementing these security best practices, you can build a robust and secure data environment in Microsoft Fabric, protecting your valuable data from unauthorized access and ensuring compliance with regulatory requirements.

What security measures do you take from within Microsoft Fabric?
