Performing Data Engineering on Microsoft HDInsight

Prerequisites

  • Programming experience using R and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality
  • Working knowledge of relational databases

Materials and Equipment

Computers with 16 GB of RAM

Intended Audience

Data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

Objectives

The main objective of the course is to give students the ability to plan and implement big data workflows on HDInsight.

Course Outline

Getting Started with HDInsight

  • What is Big Data?
  • Introduction to Hadoop
  • Working with MapReduce Function
  • Introducing HDInsight
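
To make the MapReduce lesson concrete, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer read lines from stdin and emit tab-separated key/value pairs; the script name and invocation are illustrative, not part of the course materials.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming-style word count: mapper and reducer both
# read lines from stdin and write tab-separated key/value pairs.
import sys
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            print(f"{word}\t1")


def reducer(lines):
    """Sum the counts per word (input must already be sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        print(f"{word}\t{total}")


if __name__ == "__main__":
    # Run as: wordcount.py map   or   wordcount.py reduce
    if sys.argv[1:] == ["map"]:
        mapper(sys.stdin)
    else:
        reducer(sys.stdin)
```

On a cluster, a script like this would typically be submitted through the Hadoop Streaming JAR with -mapper and -reducer options; the exact JAR path depends on the HDInsight version.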

Deploying HDInsight Clusters

  • Identifying HDInsight cluster types
  • Managing HDInsight clusters by using the Azure portal
  • Managing HDInsight Clusters by using Azure PowerShell
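
The lessons use the Azure portal and Azure PowerShell; as an illustrative alternative, the sketch below lists the HDInsight clusters in a subscription with the azure-identity and azure-mgmt-hdinsight Python packages. The subscription ID is a placeholder, and using the Python SDK here is an assumption for illustration, not the course's tooling.

```python
# Illustrative sketch: enumerate HDInsight clusters with the Python
# management SDK instead of Azure PowerShell (the course's tool).
from azure.identity import DefaultAzureCredential
from azure.mgmt.hdinsight import HDInsightManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder

credential = DefaultAzureCredential()
client = HDInsightManagementClient(credential, SUBSCRIPTION_ID)

# Print the name and region of every cluster in the subscription.
for cluster in client.clusters.list():
    print(cluster.name, cluster.location)
```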

Authorizing Users to Access Resources

  • Non-domain Joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters

Loading data into HDInsight

  • Storing data for HDInsight processing
  • Using data loading tools
  • Maximising value from stored data
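
HDInsight clusters typically read their input from the attached Azure Storage account, so one common loading path is simply uploading files to that account. The sketch below uses the azure-storage-blob package; the connection string, container, and file names are placeholders.

```python
# Illustrative sketch: stage a local file in Azure Blob Storage so an
# HDInsight cluster can process it. Names below are placeholders.
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
CONTAINER = "example-data"                                 # placeholder

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

# Upload (or overwrite) a local CSV so jobs on the cluster can read it.
with open("sales.csv", "rb") as data:
    container.upload_blob(name="raw/sales.csv", data=data, overwrite=True)

print("Uploaded raw/sales.csv")
```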

Troubleshooting HDInsight

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations management suite
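
As a small companion to the YARN logs lesson: aggregated logs for a finished application can be pulled with the YARN CLI (yarn logs -applicationId <id>) and then scanned offline. The sketch below assumes the output was saved to a local text file; the file name and the ERROR/WARN heuristics are illustrative.

```python
# Illustrative sketch: summarize ERROR/WARN lines in an aggregated YARN
# log saved locally, e.g. `yarn logs -applicationId <id> > app.log`.
from collections import Counter

LOG_FILE = "app.log"  # placeholder path

levels = Counter()
first_errors = []

with open(LOG_FILE, encoding="utf-8", errors="replace") as handle:
    for line in handle:
        if " ERROR " in line:
            levels["ERROR"] += 1
            if len(first_errors) < 5:
                first_errors.append(line.rstrip())
        elif " WARN " in line:
            levels["WARN"] += 1

print("Log level counts:", dict(levels))
print("First few ERROR lines:")
for line in first_errors:
    print(" ", line)
```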

Implementing Batch Solutions

  • Apache Hive storage
  • HDInsight data queries using Hive and Pig
  • Operationalize HDInsight
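
For the Hive lessons, one way to run HiveQL from code on an HDInsight Spark cluster is through a Hive-enabled Spark session, as in the sketch below; the table and column names are made up for illustration.

```python
# Illustrative sketch: run HiveQL against the cluster's Hive metastore
# from PySpark. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-batch-example")
    .enableHiveSupport()          # talk to the Hive metastore
    .getOrCreate()
)

# Create a Hive table, then aggregate it with a batch query.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        region STRING,
        amount DOUBLE
    )
    STORED AS PARQUET
""")

totals = spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
""")
totals.show()

spark.stop()
```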

Design Batch ETL solutions for big data with Spark

  • What is Spark?
  • ETL with Spark
  • Spark performance
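
A minimal ETL shape for this module: read raw CSV, apply transformations, write partitioned Parquet. Paths, schema, and column names are placeholders; on HDInsight they would normally point at wasbs:// or abfs:// storage locations.

```python
# Illustrative sketch of a batch ETL job in PySpark: extract CSV,
# transform, and load as Parquet. Paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-etl-example").getOrCreate()

# Extract: raw CSV with a header row.
raw = spark.read.option("header", True).csv("/data/raw/sales.csv")

# Transform: cast types, drop bad rows, derive a partition column.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
       .withColumn("year", F.substring("order_date", 1, 4))
)

# Load: partitioned Parquet for downstream queries.
clean.write.mode("overwrite").partitionBy("year").parquet("/data/curated/sales")

spark.stop()
```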

Analyze Data with Spark SQL

  • Implementing iterative and interactive queries
  • Perform exploratory data analysis
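
For the Spark SQL module, the sketch below shows the interactive pattern of registering a DataFrame as a temporary view, caching it, and issuing several exploratory queries against it; the data set and column names are invented.

```python
# Illustrative sketch: iterative/interactive exploration with Spark SQL.
# The Parquet path and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-eda").getOrCreate()

sales = spark.read.parquet("/data/curated/sales")
sales.createOrReplaceTempView("sales")
spark.catalog.cacheTable("sales")   # keep it in memory for repeated queries

# A few exploratory queries over the same cached view.
spark.sql("SELECT COUNT(*) AS row_count FROM sales").show()
spark.sql("""
    SELECT region, AVG(amount) AS avg_amount
    FROM sales
    GROUP BY region
""").show()
sales.describe("amount").show()     # quick summary statistics

spark.catalog.uncacheTable("sales")
spark.stop()
```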

Analyze Data with Hive and Phoenix

  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix
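
For the Apache Phoenix lesson, Phoenix exposes a SQL layer over HBase through the Phoenix Query Server; the sketch below uses the phoenixdb Python driver, with the server URL and table name taken as assumptions.

```python
# Illustrative sketch: SQL over HBase through the Phoenix Query Server
# using the phoenixdb driver. URL and table name are placeholders.
import phoenixdb

QUERY_SERVER_URL = "http://localhost:8765/"  # placeholder endpoint

conn = phoenixdb.connect(QUERY_SERVER_URL, autocommit=True)
cursor = conn.cursor()

cursor.execute(
    "CREATE TABLE IF NOT EXISTS page_views ("
    "  page VARCHAR PRIMARY KEY,"
    "  views BIGINT)"
)
cursor.execute("UPSERT INTO page_views VALUES (?, ?)", ("home", 42))
cursor.execute("SELECT page, views FROM page_views ORDER BY views DESC")
for page, views in cursor.fetchall():
    print(page, views)

conn.close()
```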

Stream Analytics

  • Stream Analytics
  • Process streaming data from Stream Analytics
  • Managing Stream Analytics jobs

Implementing Streaming Solutions with Kafka and HBase

  • Building and Deploying a Kafka Cluster
  • Publishing, Consuming, and Processing data using the Kafka Cluster
  • Using HBase to store and Query Data
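
To accompany the Kafka lessons, here is a minimal publish/consume round trip with the kafka-python package; the broker address and topic name are placeholders for an HDInsight Kafka cluster's worker-node brokers, and the HBase side of the module is not shown.

```python
# Illustrative sketch: publish a few JSON events to a Kafka topic and
# read them back with kafka-python. Broker and topic are placeholders.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["broker1:9092"]   # placeholder: HDInsight Kafka broker hosts
TOPIC = "sensor-events"      # placeholder topic name

# Produce a handful of messages.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
for i in range(5):
    producer.send(TOPIC, {"sensor": "s1", "reading": i})
producer.flush()
producer.close()

# Consume them from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKERS,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,    # stop iterating after 5s of inactivity
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.offset, message.value)
consumer.close()
```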

Develop big data real-time processing solutions with Apache Storm

  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Create Spark Streaming Applications

  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualization
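
For the Structured Streaming lesson, the sketch below reads a stream from Kafka, aggregates it, and writes results to the console sink; the broker, topic, and checkpoint locations are placeholders, and it assumes the Spark Kafka connector package is available on the cluster.

```python
# Illustrative sketch: Spark Structured Streaming job that counts events
# per key from a Kafka topic. Broker, topic, and checkpoint path are
# placeholders; the spark-sql-kafka connector must be on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("structured-streaming-example").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder
    .option("subscribe", "sensor-events")               # placeholder
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary; decode the key and count events per key.
counts = (
    events.select(F.col("key").cast("string").alias("key"))
          .groupBy("key")
          .count()
)

query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/sensor-counts")  # placeholder
    .start()
)
query.awaitTermination()
```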

 

METHODOLOGY
You choose: in-person and/or videoconference
