#### **Data Analysis and Machine Learning with R and Spark**

**Using PostgreSQL and shiny with a dynamic leaflet map: monitoring trash cans**

When there is increased social activity, trash cans fill up more quickly. Conversely, during very cold weather, they can take a day or several days longer to fill. Knowing when the trash cans are full therefore lets us pick them up right away rather than waiting for a fixed day of the week.

The code is available on GitHub

**Leaflet, Plotly and Shiny: Weather Forecasts In The Northeast**

Integrating JavaScript libraries with R helps create interactive visualizations. This blog post uses Leaflet, which is the leading open-source JavaScript library for interactive maps, and plotly to create weather forecast visualizations... more

**Email and Text Message Alerts Based on Streaming Sensor Data**

How can we get email and text message alerts when sensors either fail or transmit abnormal readings? If we have a dashboard that is built on dynamic data and we want alerts when some conditions are met, how can we do that? One option is using R-shiny. The code is available on GitHub

**Using Shiny to Demo Your Machine Learning Model**

Shiny is a good way to demo your machine learning model or to submit your machine learning challenge so that others can quickly upload test data and try it out. Read more

The code is available on GitHub

**Integrating Tableau with R through R Notebooks and Shiny for Descriptive, Inferential and Predictive Analytics**

Integrating R Notebooks and R shiny with Tableau enables us to have descriptive, inferential and predictive analytics in our Tableau story/dashboard... more

#### **Tableau’s Level Of Detail Expressions With R Shiny**

This is a continuation of the previous blog post on Tableau’s Level Of Detail (LOD) Expressions with R.
We will solve questions using both R Shiny and Tableau's Level of Detail Expressions.
In the previous post, we saw the **fixed** Tableau LOD expression. In this post, we will see the **include** and **exclude** expressions... more.

**Logistic Regression Regularized with Optimization**

Logistic regression predicts the probability of the outcome being true. In this blog post, we will build logistic regression models to predict whether a student gets admitted into a university and whether microchips from a fabrication plant pass quality assurance ... more
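As a rough illustration of the idea, here is a minimal sketch of L2-regularized logistic regression fitted with base R's `optim()`. The simulated admissions-style data, the variable names and the value of `lambda` are all assumptions made for this example; the post itself may use different data and settings.

```r
# Sigmoid (logistic) function
sigmoid <- function(z) 1 / (1 + exp(-z))

# L2-regularized negative log-likelihood; the intercept theta[1] is not penalized
cost <- function(theta, X, y, lambda) {
  h <- sigmoid(X %*% theta)
  eps <- 1e-12  # guards against log(0)
  nll <- -mean(y * log(h + eps) + (1 - y) * log(1 - h + eps))
  nll + lambda / (2 * length(y)) * sum(theta[-1]^2)
}

# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 200
score <- rnorm(n)
y <- rbinom(n, 1, sigmoid(-0.5 + 2 * score))
X <- cbind(1, score)

# Minimize the regularized cost with a general-purpose optimizer
fit <- optim(par = c(0, 0), fn = cost, X = X, y = y, lambda = 1, method = "BFGS")
theta_hat <- fit$par

# Predicted probabilities and training accuracy at a 0.5 threshold
prob <- sigmoid(X %*% theta_hat)
accuracy <- mean((prob > 0.5) == y)
```

Larger values of `lambda` shrink the slope towards zero, which is the usual trade-off between fit and model complexity.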

**Anomaly Detection with R**

Anomaly detection is used for different applications. It is a commonly used technique for fraud detection. It is also used in manufacturing to detect anomalous systems such as aircraft engines. It can also be used to identify anomalous medical devices and machines in a data center. In this blog post, we will implement an anomaly detection algorithm and apply it to detect failing servers on a network ... more
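The summary does not say which algorithm the post uses. One common approach, sketched below as an assumption, is to fit a Gaussian to each feature and flag the observations with the lowest estimated density. The server metrics and the 1% cut-off are made up for this example.

```r
set.seed(42)
# Hypothetical server metrics: 300 normal servers plus 3 injected outliers
normal <- cbind(latency = rnorm(300, 15, 1), throughput = rnorm(300, 15, 1))
outliers <- cbind(latency = c(25, 5, 22), throughput = c(5, 25, 24))
servers <- rbind(normal, outliers)

# Fit an independent Gaussian to each feature
mu <- colMeans(servers)
sigma <- apply(servers, 2, sd)

# Density of each row under the product of the per-feature Gaussians
p <- dnorm(servers[, 1], mu[1], sigma[1]) * dnorm(servers[, 2], mu[2], sigma[2])

# Flag the lowest-density rows; the 1% cut-off is an arbitrary choice
epsilon <- quantile(p, 0.01)
flagged <- which(p <= epsilon)
```

In practice the threshold `epsilon` would be tuned on labelled validation data rather than fixed at a quantile.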

**Analytical and Numerical Solutions, with R, to Linear Regression Problems**

This post shows how to implement numerical and analytical solutions to linear regression problems using R. This is the first programming exercise in the Coursera machine learning course offered by Andrew Ng. The course is taught with MATLAB/Octave. Since R is the lingua franca of data science, I plan to do all the programming exercises in Andrew's course with R ... more
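To make the two approaches concrete, here is a minimal sketch on simulated data: the analytical solution solves the normal equations, and the numerical solution uses batch gradient descent. The data, the learning rate and the iteration count are assumptions chosen for this example, not values from the post.

```r
set.seed(123)
# Simulated data (hypothetical): y = 3 + 2x + noise
n <- 100
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n)
X <- cbind(1, x)  # design matrix with an intercept column

# Analytical solution: solve the normal equations (X'X) theta = X'y
theta_normal <- solve(t(X) %*% X, t(X) %*% y)

# Numerical solution: batch gradient descent on the mean squared error
theta_gd <- c(0, 0)
alpha <- 0.02  # learning rate, chosen by hand for this data
for (i in 1:20000) {
  gradient <- t(X) %*% (X %*% theta_gd - y) / n
  theta_gd <- theta_gd - alpha * gradient
}
```

Both estimates should agree with `coef(lm(y ~ x))`; gradient descent only matches if the learning rate is small enough for the scale of `x`.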

#### **Visualizing Streaming Data with Shiny**

This post is on visualizing streaming data with Shiny... more.

**Using SparkR in RStudio with Hadoop Deployed on AWS EC2 - part 1**

SparkR provides a distributed data frame implementation that supports operations such as selection, filtering and aggregation, similar to the **dplyr** R package but on large datasets. SparkR also supports distributed machine learning using MLlib... more

**Using SparkR in RStudio with Hadoop Deployed on AWS EC2 - part 2**

In a previous post, we saw how to install R, RStudio Server and R packages on an AWS EC2 Red Hat cluster to use with the Hortonworks Data Platform (HDP 2.4) Hadoop distribution. Now, let’s use SparkR for data munging... more

**Sentiment Analysis of Donald Trump's views on Muslims using R and Tableau**

In this post, we will focus on how to integrate R and Tableau for text mining, sentiment analysis and visualization. Using these tools together enables us to answer detailed questions... more

#### **Machine Learning with Python scikit-learn Vs R Caret - Part 1**

We will use the scikit-learn library in Python and the caret package in R. In this part, we will first perform exploratory data analysis (EDA) on a real-world dataset, and then apply non-regularized linear regression to solve a supervised regression problem on the dataset... more.

**Working with databases in R**

The dplyr package, which is one of my favorite R packages, works with in-memory data and with data stored in databases. In this post, I will share my experience using dplyr to work with databases... more

#### **JSON data manipulation: the R way vs the Python way**

JavaScript Object Notation (JSON) is the most common data format used for asynchronous browser/server communication, and knowing how to work with JSON data is important because many datasets are published in this format. In this blog post, we will see how to analyse JSON data with R. In the next blog post, we will perform the same tasks using Python... more.

**My Two Favorite Packages for Data Manipulation in R**

*dplyr* and *data.table* are awesome because they make data manipulation more fun. Both packages have their strengths. While dplyr is more elegant and resembles natural language, data.table is succinct and we can do a lot with data.table in just a single line. Further, data.table is, in some cases, faster and it may be a go-to package when performance and memory are constraints... more

**Performing SQL selects on R data frames**

For anyone who has an SQL background and wants to learn R, the sqldf package is very useful because it enables us to use SQL commands in R. Anyone with basic SQL skills can use them to manipulate data frames in R... more

**Machine Learning for Drug Adverse Event Discovery**

Clustering can be used for knowledge discovery in drug adverse event reactions. Especially in cases where the data has millions of observations and we cannot get any insight visually, clustering comes in handy for summarizing our data, for getting statistical insights and for discovering new knowledge... more

**Semi-automated rainfall prediction models for any geographic region using R (Shiny)**

Here, I used shiny, an R package that makes it easy to build interactive web applications (apps) straight from R, HTML, CSS and JavaScript, to develop semi-automated machine learning models that predict rainfall over a region the user selects. The user can extract the predictand and predictors by drawing a polygon over a region. Then, the user can select some or all of the machine learning algorithms provided. The provided models include linear regression models (GLM, SGLM), tree-based ensemble models (random forest and boosting), support vector machines, artificial neural networks, and other non-linear models (GAM, SGAM, MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and the preprocessing options they used, along with the model results, in PDF or HTML format. A quick demo is shown in the video below.

The server.R and ui.R code is on GitHub

**Supervised Machine Learning with R and Python**

Here I show how to build various machine learning models in Python and R. The models include linear regression, logistic regression, tree-based models (bagging and random forest) and support vector machines (SVM)... more

#### **Web scraping with R using rvest: Population of U.S. states and territories**

In this post, we will use the **rvest** web scraping R package to scrape US population data from Wikipedia and use **ggplot2** to visualize the population data by state... more.

#### **US Hospital Ranking Shiny App**

This is my shiny app for viewing the performance of various US hospitals on heart attack, heart failure and pneumonia outcomes. We can select a state and an outcome, see the rank of a hospital in that state and compare its performance with all hospitals across the nation... more.

#### **How is climate changing and where with R**

Although global temperature has risen, the magnitude of the change varies from region to region. Here, I use R to investigate how temperature has changed in the last 110 years... more.

#### **How Many Live on How Much, and Where with Shiny**

This is a shiny app that shows world population by income. The income data can be downloaded from here and the shape file can be downloaded from here. The ui.R and server.R code is available here.

**Using linear and non-linear regression models to predict global temperature**

In this project, support vector machines, neural networks, boosting, classification and regression trees, random forest and linear models (generalized linear model, lasso, ridge regression and elastic net) are compared on how well they predict the average global temperature anomaly... more

**Ensemble Machine Learning Techniques for Human Activity Recognition**

In this project, ensemble tree-based predictive models are built to determine the manner in which an exercise is done. The models considered are Random Forest, Adaptive Boosting and Bagged Adaptive Boosting... more

Velloso et al. (2013)

**ggplot in R and Python**

The grammar of graphics package (**ggplot2**) is the best data visualization library in R. The concept of the grammar of graphics is also implemented in Python with the library **ggplot**, which has commands similar to those of **ggplot2**... more

#### **Correlation map of climate variables**

Correlation analysis is used as a first step in understanding physical mechanisms and developing statistical rainfall prediction models. Here, I show how to use R to generate a correlation map between rainfall and sea surface temperature... more

#### **Reproducible Research with R**

It is now possible to collect a large amount of data about personal movement using activity monitoring devices such as a Fitbit (http://www.fitbit.com), Nike Fuelband (http://www.nike.com/us/en_us/c/nikeplusfuelband), or Jawbone Up (https://jawbone.com/up). These types of devices are part of the “quantified self” movement – a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech... more

**Downloading data from the web using R**

There are different ways that we can download and read data into R. Some examples are shown... more

**Google scholar scraping with R**

In this post, I will show how to scrape a Google Scholar account. In particular, we will use the **'rvest'** R package to scrape the Google Scholar account of my advisor. We will see his coauthors, how many times they are cited and their affiliations... more

#### **Slidify presentation of a Shiny App**

Slidify helps to create data-centric presentations. It allows embedded code chunks and mathematical formulas to be rendered correctly. Final products are HTML files, which can be viewed with any web browser and shared easily.

Shiny is an R package that makes it easy to build interactive web applications straight from R. Here I have developed a shiny app that calculates the area-average rainfall and temperature climatologies and trends over any region in Africa selected by the user (see presentation).

**Composite analysis to capture non-linear relationships**

Correlation captures linear relationships but not non-linear ones, so composite analysis is also widely used to understand physical mechanisms and to develop statistical rainfall prediction models. Here, I show how to use R to generate composites of rainfall based on sea surface temperature... more.

#### **Most Harmful Storms and Weather Events In The United States**

This report seeks to investigate storms and other weather events that cause the highest number of fatalities and injuries. Moreover, it shows which events have the greatest economic consequences... more.

#### **Hospital Rankings In The United States**

Here, I compare the performance of hospitals in the USA using data from the Hospital Compare web site (http://hospitalcompare.hhs.gov) run by the U.S. Department of Health and Human Services. Hospital rankings are performed on a statewide and a nationwide basis, considering different outcomes... more.

#### **Car Fuel Efficiency and Transmission Type**

In this analysis, the relationship between a set of variables and miles per gallon (MPG) is explored using the mtcars dataset. Particularly, the MPG difference between automatic and manual transmissions is evaluated and quantified using multivariate linear regression models... more.
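A minimal sketch of this kind of comparison using the `mtcars` data that ships with R. The choice of weight and horsepower as adjustment variables is an assumption made for illustration; the analysis itself may select a different set of covariates.

```r
# mtcars ships with base R; am is 0 for automatic and 1 for manual
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Unadjusted difference in mean MPG between transmission types
simple_fit <- lm(mpg ~ am, data = cars)

# Difference after adjusting for weight and horsepower (one possible choice)
adjusted_fit <- lm(mpg ~ am + wt + hp, data = cars)

# The "ammanual" coefficient is the estimated manual-minus-automatic gap
coef(simple_fit)["ammanual"]
coef(adjusted_fit)["ammanual"]
```

Comparing the two coefficients shows how much of the raw MPG gap is explained by the other variables rather than by the transmission type itself.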

#### **The Role of Regular Expressions in Creating Tidy Data**

In this analysis, let's use regular expressions in R to prepare a tidy dataset for later analysis and, in doing so, demonstrate the strength of regular expressions... more.

**World's Biggest Companies**

Let's visualize the distribution of the world's biggest companies using the Forbes2000 data from the HSAUR2 package... more.

**Approximating distributions**

Here, let's see the approximation of some distributions by other distributions, when certain criteria are met, through simulations... more.

**Predicting Earnings from census data**

In this problem, we are going to use census information about an individual to predict how much a person earns -- in particular, whether the person earns more than $50,000 per year... more.

**Letter Recognition**

This is a letter recognition exercise using tree-based models... more.

#### **Quick overview of climate trends using Shiny**

Global climate is changing and this change is apparent across a wide range of observations. The impacts of climate change on rainfall and temperature vary from region to region. Shiny, which is an R package that makes it easy to build interactive web applications (apps) straight from R, can be used to see the trends of different climate variables over different parts of the world very quickly. Here, I developed a Shiny App that displays the trends of temperature and rainfall over any selected region in Africa. This app can be used as a starting point in studying the impacts of, adaptation to and mitigation of climate change over a region. The app is available on RStudio.

#### **Working with Dates and Times in R using Power Consumption Data**

Here, data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets, is used to show how to work with dates in R... more.
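As a small illustration of date-time handling in base R, the sketch below parses separate date and time text columns into a single `POSIXct` value. The two rows, the column names and the `dd/mm/yyyy` layout are assumptions made for this example and are not taken from the post.

```r
# Two hypothetical rows in an assumed dd/mm/yyyy and hh:mm:ss layout
power <- data.frame(
  Date = c("16/12/2006", "17/12/2006"),
  Time = c("17:24:00", "09:05:30"),
  stringsAsFactors = FALSE
)

# Combine the two text columns and parse them into a POSIXct date-time
power$datetime <- as.POSIXct(
  paste(power$Date, power$Time),
  format = "%d/%m/%Y %H:%M:%S",
  tz = "UTC"
)

# Derive common components from the parsed date-time
power$hour <- as.integer(format(power$datetime, "%H"))
power$day <- format(power$datetime, "%Y-%m-%d")

# Elapsed time between the two readings, in hours
elapsed <- difftime(power$datetime[2], power$datetime[1], units = "hours")
```

Setting `tz` explicitly keeps the parsed values independent of the time zone of the machine running the code.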

#### **Text Analytics with R**

This lab is on text analytics with R using logistic regression and regression trees... more.

#### **Visualizing election predictions using ggplot2**

Here, ggplot2 is used to visualize US presidential election predictions... more.

#### **Visualizing murder rates by state in the US with ggplot2**

Let's visualize murder rates by state in the US using ggplot2... more.

#### **Using simulation to demonstrate the Central Limit Theorem**

The central limit theorem (CLT) states that the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution (Wikipedia). Here, I demonstrate this theorem using simulations... more.
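A minimal simulation sketch of the theorem, assuming exponentially distributed draws (a clearly non-normal distribution). The rate, sample size and number of simulations are arbitrary choices for this example.

```r
set.seed(2015)
n_sims <- 5000  # number of simulated samples
n <- 40         # observations per sample
rate <- 0.2     # exponential rate, so mean = sd = 1 / rate = 5

# Each row is one sample of n exponential draws; take the mean of each row
draws <- matrix(rexp(n_sims * n, rate), nrow = n_sims)
sample_means <- rowMeans(draws)

# Theory: the means are roughly Normal(1 / rate, (1 / rate) / sqrt(n))
c(simulated_mean = mean(sample_means), theoretical_mean = 1 / rate)
c(simulated_sd = sd(sample_means), theoretical_sd = (1 / rate) / sqrt(n))

# Histogram of the sample means with the theoretical normal curve on top
hist(sample_means, breaks = 40, freq = FALSE, main = "Means of 40 exponentials")
curve(dnorm(x, 1 / rate, (1 / rate) / sqrt(n)), add = TRUE)
```

Even though a single exponential draw is strongly skewed, the histogram of the sample means should look close to the bell-shaped curve.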

#### **The Effect of Vitamin C on Tooth Growth in Guinea Pigs**

#### **Exploratory Analysis of Fine Particulate Matter**

In this analysis, the trend of fine particulate matter over time, by source, in the United States and over specific cities is investigated... more.

#### **Spatial distribution of Food and Drug Administration's adverse events reports data**

This post shows how to download the quarterly FDA adverse event reports data, concatenate the files, analyse them using the dplyr package and display the results by country and gender using ggplot2... more.

#### **Visualizing outcomes of drug related adverse events**

How many of the drug related adverse events in the FDA database resulted in deaths, disabilities, hospitalization, etc.? Here we will download various datasets and join them... more.

#### **Web scraping with R using rvest: List of countries by proven oil reserves**

In this post, let's see how to scrape information from the web using the rvest R package. Specifically, we will scrape the "List of countries by proven oil reserves" table from Wikipedia... more.

#### **Web Scraping and Natural Language Processing: most commonly used words in a journal paper**

I am practicing web scraping, regular expressions and natural language processing in R. In this post, I will find the most commonly used words in one of my published papers... more.

#### **Text Mining, Scraping and Sentiment Analysis with R: Russia this week**

In this post, I am scraping Twitter to understand what is being said about Russia and its relations with the Middle East. In particular, we will see the sentiment of posts from November 24-29, 2015. For this exercise, we will consider posts in English... more.

#### **Top searches associated with each nation with R**

In this post, we will get top searches associated with each nation.
In doing so, first, we will scrape the list of world countries from

#### **Google Trends Analytics using Shiny**

In this post, I will show how we can use Shiny to analyse Google Trends data and create a dashboard. Shiny is an R package that makes it easy to build interactive web apps straight from R. For a nice look and feel, we will use the shinydashboard package... more.

#### **Visualizing world cities using R**

In this post, we will create a world map that shows world cities using **ggplot2**. We will get the cities and their attributes from Wikipedia. After we get the world cities data from Wikipedia, we will use the string manipulation packages **stringi** and **stringr**... more.

#### **Installing and loading many R packages at once**

I like Jupyter notebook because it enables me to use **R**, **Python** and **Matlab** in the same session. Recently, I was trying to install other kernels and, for an unknown reason, my Jupyter notebook crashed. Then, I uninstalled Anaconda and reinstalled it. The problem: I lost all the **R packages** I had installed over time... more.
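One way to do this in base R is sketched below. The helper name `load_packages` is hypothetical and not from the post; it installs whatever is missing and then attaches every package in the list.

```r
# Install any packages that are not yet installed, then attach all of them.
# load_packages is a hypothetical helper name used for this sketch.
load_packages <- function(pkgs) {
  to_install <- pkgs[!pkgs %in% rownames(installed.packages())]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # require() returns TRUE or FALSE instead of stopping on failure
  vapply(pkgs, require, logical(1), character.only = TRUE)
}

# Example call with packages that ship with R, so nothing is downloaded
loaded <- load_packages(c("stats", "utils", "tools"))
```

The function returns a named logical vector, so any package that failed to load is easy to spot. In a non-interactive session, `install.packages()` may also need a `repos` argument.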

#### **Document Clustering**

Clustering is an unsupervised learning technique which has wide applications. Some examples where clustering is commonly applied are market segmentation, social network analytics, and astronomical data analysis. This post is on document clustering or text clustering, which is a very popular application of clustering algorithms. We will see K-means clustering and Hierarchical clustering... more.
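The sketch below shows both techniques on a toy corpus using only base R. The four documents and the simple term-count representation are assumptions made for illustration; a real analysis would typically use a text mining package and a weighting scheme such as TF-IDF.

```r
# Toy corpus (hypothetical): two documents about Shiny, two about Spark
docs <- c(
  "r shiny dashboard",
  "shiny app dashboard r",
  "spark hadoop cluster",
  "hadoop spark data cluster"
)

# Build a document-term matrix of raw term counts
tokens <- strsplit(docs, " ")
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(
  tokens,
  function(w) as.numeric(table(factor(w, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# K-means with two clusters and several random starts
set.seed(7)
km <- kmeans(dtm, centers = 2, nstart = 10)

# Hierarchical clustering on Euclidean distances, cut into two groups
hc <- hclust(dist(dtm))
hc_groups <- cutree(hc, k = 2)
```

Both methods should place the two Shiny documents in one group and the two Spark documents in the other, although the cluster labels themselves are arbitrary.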

#### **A Shiny Dashboard of Adverse Drug Event Reports**

This is a shiny dashboard developed using openFDA data. The data is in JSON format. The R library jsonlite is used to access the data from the openFDA website and convert it to a data frame. The user can select one, a couple, or all types of events.

The ui.R and server.R code is available here

#### **PDF Mining with R using Shiny**

This application helps to get useful insights from PDF documents by creating visualizations and summarizations. It also enables searching, sorting and filtering. We can browse through lots of documents in a single click and get a summary and comparison of the documents in minutes... more.

#### **The importance of Data Visualization**

Before we perform any analysis and come up with any assumptions about the distribution of and relationships between variables in our datasets, it is always a good idea to visualize our data in order to understand their properties and identify appropriate analytics techniques... more.

#### **Using Amazon Relational Database Service with Python and R**

Amazon Relational Database Service (RDS) is a distributed relational database service by Amazon Web Services (AWS). It simplifies the setup, operation, and scaling of a relational database for use in applications. In this blog post, we will see how to use R and Python with Amazon RDS. AWS RDS has a free tier for anybody to use for testing/development efforts... more.

#### **Interactive visualization with R-Shiny versus with Tableau: Treemaps**

This post is on interactive treemaps with Shiny and Tableau... more.

#### **Using MongoDB with R and Python**

This post shows how to use Python and R to analyse data stored in MongoDB, a leading NoSQL database engine in use today. When dealing with large volumes of data, using MongoDB can give us a performance boost ... more.

#### **Data Manipulation with Python Pandas and R Data.Table**

Pandas is a commonly used data manipulation library in Python. data.table, on the other hand, is among the best data manipulation packages in R. data.table is succinct and we can do a lot with it in just a single line. Further, data.table is generally faster than Pandas (see benchmark here) and it may be a go-to package when performance is a constraint. For someone who knows one of these packages, I thought it could help to show code that performs the same tasks in both packages side by side, so they can quickly learn the other. If you know either of them and want to learn the other, this blog post is for you ... more.

#### **Using Actions And Modals With DataTables For Data Exploration In Shiny**

Modal windows can be helpful for data exploration and for avoiding clutter in our shiny applications. The shiny app in this blog post is one example. It helps to explore various World Bank indicators and to make comparisons across nations... more.

#### **Tableau’s Level Of Detail Expressions With R - Part 1**

Data visualization tools perform some interesting analyses behind the scenes when we create marvelous dashboards through drag and drop. In this tutorial series, I will show how to accomplish Tableau’s level of detail expressions in R using programming ... more.

#### **Extreme Gradient Boosting (XGBoost) with R and Python**

Extreme Gradient Boosting is among the hottest libraries in supervised machine learning these days. It shines when we have lots of training data where the features are numeric or a mixture of numeric and categorical fields. In this post, we will see how to use it in R and Python... more.

#### **Interactive Chord Diagrams in R/Shiny**

Chord diagrams are used to visualize network flows and interactions between different entities. This blog post shows how to create interactive chord diagrams using the JavaScript library D3 in R/Shiny ... more.