Analyzing Big Data with Microsoft R
Date: On request
Type: E-learning
Category: Data analysis
About this course
The open-source programming language R has long been popular, particularly in academia, for data processing and statistical analysis. Among R's strengths are that it is a succinct programming language and that it has an extensive repository of third-party libraries for performing all kinds of analyses. Together, these two features make it possible for a data scientist to go very quickly from raw data to summaries, charts, and even full-blown reports. However, R has traditionally used a lot of memory, both because it loads the entire dataset into memory as a data.frame object and because processing the data often creates further copies (a behavior sometimes referred to as copy-on-modify). This is one reason industry has been slower than academia to adopt R.
The main component of Microsoft R Server (MRS) is the RevoScaleR package, an R library that offers a set of functions for processing large datasets without having to load them into memory all at once. RevoScaleR also offers a rich set of distributed statistical and machine learning algorithms, and that set continues to grow. Finally, RevoScaleR offers a mechanism for taking code developed on a laptop and deploying it, with minimal effort, on a remote server such as SQL Server or Spark, where the infrastructure is very different under the hood.
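To make the idea of processing data without loading it all at once more concrete, here is a minimal sketch in base R. It does not use RevoScaleR or reproduce its API; the helper `chunked_mean`, its parameters, and the sample file are hypothetical and exist only for illustration. The sketch reads a CSV file a fixed number of rows at a time and keeps only a running sum and row count in memory.

```r
# Illustrative only: base R, not the RevoScaleR API.
# chunked_mean() is a hypothetical helper that computes the mean of one
# numeric column while holding at most `chunk_rows` rows in memory.
chunked_mean <- function(path, column, chunk_rows = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; scan() strips any quotes around the names.
  header <- scan(text = readLines(con, n = 1L), what = "", sep = ",", quiet = TRUE)

  total <- 0
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk_rows)  # next block of raw lines
    if (length(lines) == 0L) break           # end of file reached
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])    # update the running sum
    n <- n + nrow(chunk)                     # update the running row count
  }
  total / n
}

# Usage: write a small sample file, then summarize it 2,500 rows at a time.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10000, value = as.numeric(1:10000)),
          tmp, row.names = FALSE)
result <- chunked_mean(tmp, "value", chunk_rows = 2500L)
print(result)
```

The same pattern (read a block, update a small summary, discard the block) works for any statistic that can be accumulated incrementally, such as counts, sums, and minima or maxima.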
In this course, we will show you how to use MRS to run an analysis on a large dataset and provide some examples of how to deploy it on a Spark cluster or a SQL Server database. Upon completion, you will know how to use R for big-data problems.
Since RevoScaleR is an R package, we assume that course participants are familiar with R. A solid understanding of R data structures (vectors, matrices, lists, data frames, environments) is required. Familiarity with third-party packages such as dplyr is also helpful.
What you'll learn
You will learn how to use MRS to read, process, and analyze large datasets including:
- Read data from flat files into R’s data frame object, investigate the structure of the dataset and make corrections, and store prepared datasets for later use
- Prepare and transform the data
- Calculate essential summary statistics, do crosstabulation, write your own summary functions, and visualize data with the ggplot2 package
- Build predictive models, evaluate and compare models, and generate predictions on new data
Who can participate:
This course is part of the Microsoft Professional Program Certificate in Data Science and Microsoft Professional Program Certificate in Big Data.
Pursue a Verified Certificate to highlight the knowledge and skills you gain ($99 USD)
Official and Verified
Receive an instructor-signed certificate with the institution's logo to verify your achievement and increase your job prospects
Add the certificate to your CV or resume, or post it directly on LinkedIn
Give yourself an additional incentive to complete the course
Support our Mission
EdX, a non-profit, relies on verified certificates to help fund free education for everyone globally.
Senior Content Developer
Participation
Registration deadline: 16 November 2019
To participate in this training, you can Enroll
About the training provider
Company type: Other
Number of employees: 500-1500
Founded by Harvard University and MIT in 2012, edX is an online learning destination and MOOC provider, offering high-quality courses from the world’s best universities and institutions to learners everywhere.
With more than 130 global partners, we are proud to count the world’s leading universities, nonprofits, and institutions as our members. EdX university members top the QS World University Rankings® with our founders receiving the top honors, and edX partner institutions ranking highly on the full list.
Our Global Learning Community
Our students come from every country in the world! Whether you are interested in computer science, languages, engineering, psychology, writing, electronics, biology, or marketing, we have the course for you! Enroll today and learn something new.
We were founded by and continue to be governed by colleges and universities. We are the only leading MOOC provider that is both nonprofit and open source.
Open edX is the open-source platform that powers edX courses and is freely available. With Open edX, educators and technologists can build learning tools and contribute new features to the platform, creating innovative solutions to benefit students everywhere.