R introduction

RStudio

RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics.
RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.

ggplot2

ggplot2 is a data visualization package for the statistical programming language R, created by Hadley Wickham in 2005. It builds plots piece by piece.
The concept behind ggplot2 divides a plot into three fundamental parts: Plot = Data + Aesthetics + Geometry.

The principal components of every plot can be defined as follows:

  • Data is a data frame.
  • Aesthetics indicate the x and y variables. They can also control the color, size, or shape of points, the height of bars, etc.
  • Geometry defines the type of graphic (histogram, box plot, line plot, density plot, dot plot, etc.).
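The three parts map directly onto code. The following is a minimal sketch; the built-in mtcars data set and the object name p_basic are illustrative choices, not part of the original text.

```r
library(ggplot2)

# Data: a data frame (the built-in mtcars data set, chosen for illustration)
# Aesthetics: wt on the x axis, mpg on the y axis, color by number of cylinders
# Geometry: points, i.e. a scatter plot
p_basic <- ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2)

print(p_basic)
```

Swapping geom_point() for another geometry changes the plot type while the data and aesthetics stay the same.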

There are two major functions in the ggplot2 package:

qplot() stands for quick plot and produces simple plots easily.
ggplot() is more flexible and robust than qplot() and builds a plot piece by piece.
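As a sketch of what "piece by piece" means with ggplot(): each + adds one layer or setting to a plot object that can be stored and extended later. The mtcars data set, the labels, and the object name p_layers are assumptions made for illustration. Recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated, so this example sticks to ggplot().

```r
library(ggplot2)

# Start with data and aesthetics only; no geometry has been added yet
p_layers <- ggplot(mtcars, aes(x = wt, y = mpg))

# Add a geometry layer
p_layers <- p_layers + geom_point()

# Add a title and axis labels
p_layers <- p_layers +
  labs(title = "Weight vs. mileage",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")

print(p_layers)
```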

Install and load the ggplot2 package:

# install.packages("ggplot2")  # run once if the package is not yet installed
library(ggplot2)

Plot type selection

One variable: Continuous

  • geom_area(): Create an area plot
  • geom_density(): Create a smooth density estimate
  • geom_dotplot(): Dot plot
  • geom_freqpoly(): Frequency polygon
  • geom_histogram(): Histogram
  • stat_ecdf(): Empirical cumulative distribution function
  • stat_qq(): Quantile-quantile plot
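For a single continuous variable, two of the geoms above can be combined in one plot. This is a sketch only: it assumes the mtcars data set and ggplot2 3.3.0 or later (for after_stat()), and the bin count is an arbitrary choice.

```r
library(ggplot2)

# Histogram of one continuous variable, scaled to density so that the
# smooth density estimate can share the same y axis
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 fill = "grey80", color = "black") +
  geom_density(color = "red")

print(p_hist)
```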

One variable: Discrete

Two variables: Continuous X, Continuous Y

  • geom_point(): Scatter plot
  • geom_smooth(): Add regression line or smoothed conditional mean
  • geom_quantile(): Add quantile lines from a quantile regression
  • geom_rug(): Add marginal rug to scatter plots
  • geom_jitter(): Jitter points to reduce overplotting
  • geom_text(): Textual annotations
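Several of these geoms are typically stacked on one scatter plot. A minimal sketch, assuming the mtcars data set and a straight-line fit (method = "lm" is one option among several that geom_smooth() accepts):

```r
library(ggplot2)

# Scatter plot with a fitted regression line, its confidence band,
# and marginal rugs along both axes
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  geom_rug()

print(p_scatter)
```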

Two variables: Continuous bivariate distribution

  • geom_bin2d(): Add heatmap of 2d bin counts
  • geom_hex(): Add hexagon binning
  • geom_density_2d(): Add contours from a 2d density estimate
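When a scatter plot has too many points to read, binning the plane shows the joint distribution instead. The sketch below uses simulated data; the sample size, the correlation of 0.7, the seed, and the bin count are all assumptions made for illustration.

```r
library(ggplot2)

# Simulate two correlated standard normal variables
set.seed(1)
d2 <- data.frame(x = rnorm(5000))
d2$y <- 0.7 * d2$x + sqrt(1 - 0.7^2) * rnorm(5000)

# Heatmap of counts in a 30 x 30 grid of rectangular bins
p_bin <- ggplot(d2, aes(x = x, y = y)) +
  geom_bin2d(bins = 30)

print(p_bin)
```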

Two variables: Continuous function

Two variables: Discrete X, Continuous Y

  • geom_boxplot(): Box and whiskers plot
  • geom_violin(): Violin plot
  • geom_dotplot(): Dot plot
  • geom_jitter(): Strip charts
  • geom_line(): Line plot
  • geom_bar(): Bar plot
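With a discrete x and a continuous y, a box plot and a strip chart are often drawn together. A sketch assuming the mtcars data set, with cyl converted to a factor so that it is treated as discrete:

```r
library(ggplot2)

# Box plot per group, with the raw observations jittered on top.
# outlier.shape = NA hides the box plot's own outlier points so that
# they are not drawn twice.
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0)

print(p_box)
```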

Two variables: Discrete X, Discrete Y

Two variables: Visualizing error

  • geom_crossbar(): Hollow bar with middle indicated by horizontal line
  • geom_errorbar(): Error bars
  • geom_errorbarh(): Horizontal error bars
  • geom_linerange() and geom_pointrange(): An interval represented by a vertical line (geom_pointrange() adds a point in the middle)
  • Combine geom_dotplot and error bars
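Error-bar geoms need the interval limits as columns, so the data is usually summarized first. A minimal sketch, assuming the mtcars data set and mean ± one standard deviation as the interval (other choices, such as standard errors, work the same way); stats_df and p_err are illustrative names.

```r
library(ggplot2)

# Mean and standard deviation of mpg for each number of cylinders
grp_mean <- tapply(mtcars$mpg, mtcars$cyl, mean)
grp_sd <- tapply(mtcars$mpg, mtcars$cyl, sd)
stats_df <- data.frame(cyl = factor(names(grp_mean)),
                       mean = as.numeric(grp_mean),
                       sd = as.numeric(grp_sd))

# One point per group with a vertical error bar of mean +/- sd
p_err <- ggplot(stats_df, aes(x = cyl, y = mean)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)

print(p_err)
```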

Two variables: Maps

Three variables

Interesting R

Pure Love

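# Simulate n pairs (X, Y) of standard normals with correlation r
n <- 50000
r <- 0.7
r_e <- sqrt(1 - r^2)  # noise scale that keeps the variance of Y at 1
X <- rnorm(n)
Y <- X * r + r_e * rnorm(n)
# Mirror Y where X <= 0, which gives the point cloud its heart-like outline
Y <- ifelse(X > 0, Y, -Y)
plot(X, Y, col = "red", main = "Pure Love")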

Love is colorful

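# Same simulated point cloud as in "Pure Love"
n <- 50000
r <- 0.7
r_e <- sqrt(1 - r^2)
X <- rnorm(n)
Y <- X * r + r_e * rnorm(n)
Y <- ifelse(X > 0, Y, -Y)
# Pick one of four palette colors at random for each point
a <- sample(c(2, 6, 7, 8), n, replace = TRUE)
# Set up the axes with invisible points (col = 0 is the background color),
# then draw a colored text label at every point
plot(X, Y, col = 0, main = "Love is colorful")
text(X, Y, "lOVE", col = a)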

Embarrassing

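library(ggplot2)

# f has vertical asymptotes at x = -1 and x = 1
f <- function(x) 1 / (x^2 - 1)
x <- seq(-3, 3, by = 0.001)
y <- f(x)
d <- data.frame(x = x, y = y)

# Black-bordered frame from -3 to 3 on both axes
# (on ggplot2 >= 3.4.0, `size` for lines is deprecated in favor of `linewidth`)
p <- ggplot()
p <- p + geom_rect(fill = "white", color = "black", size = 3,
                   aes(xmin = -3, xmax = 3, ymin = -3, ymax = 3, alpha = 0.1))

# Draw the curve; values outside [-3, 3] are removed by ylim(), with a warning
p <- p + geom_line(data = d, aes(x, y), size = 3) + ylim(-3, 3)

# Strip axes, grid, legend, and border so that only the drawing remains
p <- p + theme_bw() +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          legend.position = "none",
          panel.grid.minor = element_blank(),
          panel.grid.major = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          panel.border = element_blank())

p <- p + xlab("") + ylab("")
print(p)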

R is interesting and appealing, so why not learn it? Let's go!


CHENYUAN
