Analyzing Big Data with Microsoft R

14.100,00 kr

Description


Learn to use Microsoft R Server to analyze large datasets in Big Data environments with Hadoop, Spark clusters, and SQL Server databases. After the course, participants will be able to:

  • Explain how Microsoft R Server and Microsoft R Client work together
  • Use R Client with R Server to explore large datasets held in different data stores
  • Visualize data using graphs and plots
  • Transform and clean large datasets
  • Implement options for splitting analysis jobs into parallel tasks
  • Build and evaluate regression models generated from large datasets
  • Create and implement partitioning models generated from large datasets
  • Use R in SQL Server and Hadoop environments

Contents


Module 1: Microsoft R Server and R Client

  • Explain how Microsoft R Server and Microsoft R Client work.
  • Lessons
  • What is Microsoft R Server
  • Using Microsoft R Client
  • The ScaleR functions
  • Lab : Exploring Microsoft R Server and Microsoft R Client
  • Using R Client in R Tools for Visual Studio (RTVS) and RStudio
  • Exploring ScaleR functions
  • Connecting to a remote server

Module 2: Exploring Big Data

  • At the end of this module the student will be able to use R Client with R Server to explore big data held in different data stores.
  • Lessons
  • Understanding ScaleR data sources
  • Reading data into an XDF object
  • Summarizing data in an XDF object
  • Lab : Exploring Big Data
  • Reading a local CSV file into an XDF file
  • Transforming data on input
  • Reading data from SQL Server into an XDF file
  • Generating summaries over the XDF data

Module 3: Visualizing Big Data

  • Explain how to visualize data by using graphs and plots.
  • Lessons
  • Visualizing in-memory data
  • Visualizing big data
  • Lab : Visualizing data
  • Using ggplot to create a faceted plot with overlays
  • Using rxLinePlot and rxHistogram

Module 4: Processing Big Data

  • Explain how to transform and clean big data sets.
  • Lessons
  • Transforming Big Data
  • Managing datasets
  • Lab : Processing big data
  • Transforming big data
  • Sorting and merging big data
  • Connecting to a remote server

Module 5: Parallelizing Analysis Operations

  • Explain how to implement options for splitting analysis jobs into parallel tasks.
  • Lessons
  • Using the RxLocalParallel compute context with rxExec
  • Using the RevoPemaR package
  • Lab : Using rxExec and RevoPemaR to parallelize operations
  • Using rxExec to maximize resource use
  • Creating and using a PEMA class

Module 6: Creating and Evaluating Regression Models

  • Explain how to build and evaluate regression models generated from big data
  • Lessons
  • Clustering Big Data
  • Generating regression models and making predictions
  • Lab : Creating a linear regression model
  • Creating a cluster
  • Creating a regression model
  • Generate data for making predictions
  • Use the models to make predictions and compare the results

Module 7: Creating and Evaluating Partitioning Models

  • Explain how to create and score partitioning models generated from big data.
  • Lessons
  • Creating partitioning models based on decision trees.
  • Test partitioning models by making and comparing predictions
  • Lab : Creating and evaluating partitioning models
  • Splitting the dataset
  • Building models
  • Running predictions and testing the results
  • Comparing results

Module 8: Processing Big Data in SQL Server and Hadoop

  • Explain how to transform and clean big data sets.
  • Lessons
  • Using R in SQL Server
  • Using Hadoop Map/Reduce
  • Using Hadoop Spark
  • Lab : Processing big data in SQL Server and Hadoop
  • Creating a model and predicting outcomes in SQL Server
  • Performing an analysis and plotting the results using Hadoop Map/Reduce
  • Integrating a sparklyr script into a ScaleR workflow