What is ggplot2?

It is an R package designed specifically for producing graphics. Unlike most other graphics packages, ggplot2 has its own underlying grammar. This grammar is based on the “Grammar of Graphics” (Wilkinson 2005), which is made up of independent components that can be combined in many ways. This lets ggplot2 build very flexible graphics tailored to a specific situation or problem. Work on the package began in 2005, with the idea of taking the good parts of the “base” and “lattice” graphics systems already available in R and improving on them to create a more robust plotting model.

In ggplot2 we can start with a layer of raw data and then add further layers of annotations and statistical summaries. The package lets us build a plot with the same structure of thought we use when designing an analysis, which narrows the gap between the plot we picture in our heads and the final product on the page.

Learning the grammar behind ggplot2 is useful not only for producing a particular plot, but also for thinking about more complex ones. In base R graphics, a plot is normally assembled from raw drawing elements such as points and lines, and it is hard to design new components that combine with existing plots. In ggplot2, the expressions used to create a plot are built from higher-level elements than raw drawing primitives, and these can easily be combined with other data sets or plots.

What is the grammar of ggplot2?

Wilkinson (2005) created the grammar of graphics to describe the features that underlie all statistical graphics. The article “A layered grammar of graphics” by Wickham (2010) builds on Wilkinson's grammar, focusing on the importance of layers and adapting it to R. In short, the grammar of ggplot2 says that a statistical graphic is a mapping from data to the aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data, and it is drawn on a specific coordinate system.

Components

Every plot is composed of:

Data - the data we want to visualise, together with a set of aesthetic mappings that describe how variables in the data are mapped to aesthetic attributes we can perceive.
Layers - made up of geometric elements (geom: points, lines, polygons, etc.) and statistical transformations (stat: for example, binning and counting observations to produce a histogram, or summarising a relationship with a linear model).
scale - maps values in the data space to values in an aesthetic space, whether colour, size, or shape. Scales also draw the legend or the axes (X-Y).
coord - the coordinate system describes how data coordinates are mapped to the plane of the graphic. It also provides the axes and gridlines that make the plot easier to read. A Cartesian coordinate system is normally used, but others are available (polar coordinates, for example).
facet - describes how to break the data into subsets and how to display those subsets as small multiples.
theme - controls the finer points of display, such as font size and background colour.
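
As a rough sketch of how these components fit together, the following builds a single plot that names each piece explicitly. It uses the `mtcars` data set that ships with R rather than the data used later in this tutorial, and the particular geoms, colour palette, and theme are arbitrary choices made only for illustration.

```r
library(ggplot2)

# data + aesthetic mappings: which variables go to x, y and colour
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  # layer 1: the raw data drawn as points
  geom_point() +
  # layer 2: a statistical summary (one linear fit per colour group)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # scale: how values of cyl become colours, plus the legend title
  scale_colour_brewer(name = "Cylinders", palette = "Dark2") +
  # coord: the coordinate system (Cartesian is the default)
  coord_cartesian() +
  # facet: split the data into subsets shown as small multiples
  facet_wrap(~am) +
  # theme: non-data details such as background and font size
  theme_minimal(base_size = 12)

p
```

Removing any one of the `+` lines still gives a valid plot, which is the practical meaning of "independent components".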

Other graphics packages

Other packages have been written to handle specific situations, some of them borrowing ideas from the ggplot2 graphics model. They include:

ggvis (intended as the successor to ggplot2 for interactive graphics, but still under development)
vcd (Warnes 2015)
plotrix (Lemon et al. 2006)
gplots (Warnes 2015)

A more complete list of graphics tools for R can be found at http://cran.r-project.org/web/views/Graphics.html.

Installation

To use ggplot2 we first need to install it, ideally on a recent version of R (version 3.2.0 or later) downloaded from http://r-project.org, and then download and install ggplot2:

install.packages("ggplot2")

Other resources

There are a great many resources online for learning more about ggplot2. These are just a few options:

http://docs.ggplot2.org/
http://groups.google.com/group/ggplot2
http://stackoverflow.com
https://github.com/jennybc/reprex
http://www.rstudio.com/resources/cheatsheets/
https://github.com/hadley/ggplot2-book

How do we use ggplot2?

Scatter plots

We will now use our data set of sharks from Costa Rica to illustrate how to create plots with ggplot2. Let's look at the relationship between body size and dN15 (the nitrogen-15 stable isotope value) for the shark Mustelus henlei and the skate Raja velezi.

Figure 1. (a) Mustelus henlei, (b) Raja velezi

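# Load the package
library(ggplot2)

# Import the data set
# (stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0)
dat <- read.csv("data.elasmos.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)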
head(dat)

##       date month month2 year     lat      long  depth         species
## 1 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
## 2 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
## 3 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
## 4 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
## 5 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
## 6 3/4/2010 March      3 2010 8.52032 -83.88766 252.45 Mustelus henlei
##      sex   TL      dC13     dN15
## 1   Male 47.3 -16.40551 15.45999
## 2   Male 31.7 -16.70021 14.66036
## 3 Female 35.6 -16.80240 14.97725
## 4 Female 29.1 -16.86952 14.81147
## 5   Male 25.5 -16.75631 14.33695
## 6 Female 31.7 -17.75917 14.75128

str(dat)

## 'data.frame':    174 obs. of  12 variables:
##  $ date   : Factor w/ 19 levels "10/26/2010","10/28/2010",..: 11 11 11 11 11 11 9 9 9 9 ...
##  $ month  : Factor w/ 10 levels "April","December",..: 6 6 6 6 6 6 3 3 3 3 ...
##  $ month2 : int  3 3 3 3 3 3 2 2 2 2 ...
##  $ year   : int  2010 2010 2010 2010 2010 2010 2011 2011 2011 2011 ...
##  $ lat    : num  8.52 8.52 8.52 8.52 8.52 ...
##  $ long   : num  -83.9 -83.9 -83.9 -83.9 -83.9 ...
##  $ depth  : num  252 252 252 252 252 ...
##  $ species: Factor w/ 2 levels "Mustelus henlei",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex    : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 1 2 2 2 1 ...
##  $ TL     : num  47.3 31.7 35.6 29.1 25.5 31.7 47.5 47.5 49.5 43.8 ...
##  $ dC13   : num  -16.4 -16.7 -16.8 -16.9 -16.8 ...
##  $ dN15   : num  15.5 14.7 15 14.8 14.3 ...

# Subset by species
mhe <- dat[dat$species=="Mustelus henlei", ]
rve <- dat[dat$species=="Raja velezi", ]

# Create the plot in ggplot2
ggplot(mhe, aes(x = TL, y = dN15)) + geom_point()

ggplot(rve, aes(x = TL, y = dN15)) + geom_point()

# This code produces the same plots
ggplot(mhe, aes(TL, dN15)) + geom_point()

ggplot(rve, aes(TL, dN15)) + geom_point()

This produces an X/Y plot defined by: 1) “data” - mhe (or rve), 2) “aes” - shark size (TL) on the x axis and 15N isotopic composition (dN15) on the y axis, and 3) “layer” - points.

Aesthetic attributes

To add further variables to the plot, we can use other aesthetic attributes such as colour, shape, and size. These attributes are specified inside the aes() function:

aes(TL, dN15, colour = sex)
aes(TL, dN15, shape = month)
aes(TL, dN15, size = depth)

# Plot with one colour per sex
ggplot(mhe, aes(TL, dN15, colour = sex)) + geom_point()

ggplot(rve, aes(TL, dN15, colour = sex)) + geom_point()

# Plot with one colour per month
ggplot(mhe, aes(TL, dN15, colour = month)) + geom_point()

ggplot(rve, aes(TL, dN15, colour = month)) + geom_point()

# Plot with one symbol per sex
ggplot(mhe, aes(TL, dN15, shape = sex)) + geom_point()

ggplot(rve, aes(TL, dN15, shape = sex)) + geom_point()

# Plot with point size scaled by depth
ggplot(mhe, aes(TL, dN15, size = depth)) + geom_point()

ggplot(rve, aes(TL, dN15, size = depth)) + geom_point()

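If we want a fixed aesthetic attribute instead, we set it as an argument of the geom function, outside aes(). Anything inside aes() is treated as a variable to be mapped: in Plot 1, aes(colour = "blue") creates a one-level variable, so the points are drawn in the default scale colour and a legend appears; in Plot 2, colour = "blue" draws blue points. For example,

# Plot 1: "blue" is mapped as if it were data
ggplot(mhe, aes(TL, dN15)) + geom_point(aes(colour = "blue"))

ggplot(rve, aes(TL, dN15)) + geom_point(aes(colour = "blue"))

# Plot 2: the colour is set to blue
ggplot(mhe, aes(TL, dN15)) + geom_point(colour = "blue")

ggplot(rve, aes(TL, dN15)) + geom_point(colour = "blue")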

# Change the point shapes and size, and set the colour to blue
ggplot(mhe, aes(TL, dN15, shape = sex)) + geom_point(colour = "blue", size = 4)

ggplot(rve, aes(TL, dN15, shape = sex)) + geom_point(colour = "blue", size = 4)

# Change the point shapes and set the colour to blue
ggplot(mhe, aes(TL, dN15, shape = sex)) + geom_point(colour = "blue")

ggplot(rve, aes(TL, dN15, shape = sex)) + geom_point(colour = "blue")

# Change the point shapes and colours
ggplot(mhe, aes(TL, dN15, shape = sex, color = sex)) + geom_point()

ggplot(rve, aes(TL, dN15, shape = sex, color = sex)) + geom_point()

# Change the point shapes, colours, and sizes
ggplot(mhe, aes(TL, dN15, shape = sex, color = sex)) + geom_point(size = 4)

ggplot(rve, aes(TL, dN15, shape = sex, color = sex)) + geom_point(size = 4)

Different aesthetic attributes work better with different types of variables. For example, “colour” and “shape” work well with categorical variables, whereas “size” works better with continuous variables.
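
The difference between mapping an aesthetic inside aes() and setting it outside can be checked by inspecting the values ggplot2 actually computes for each layer. The sketch below uses a small made-up data frame (`toy`, hypothetical values) and `layer_data()`, which returns the data a layer will draw.

```r
library(ggplot2)

# Small made-up data set (hypothetical values, for illustration only)
toy <- data.frame(TL = c(30, 40, 50), dN15 = c(14.2, 14.8, 15.3))

# Mapped: "blue" is treated as a one-level variable, so the colour scale
# picks its own default colour and a legend is drawn
p_mapped <- ggplot(toy, aes(TL, dN15)) + geom_point(aes(colour = "blue"))

# Set: the points are simply drawn in blue, with no scale and no legend
p_set <- ggplot(toy, aes(TL, dN15)) + geom_point(colour = "blue")

# Inspect the colour that each plot actually uses
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_set)$colour)
```

Only the second plot reports "blue"; the first reports whatever colour the default discrete scale assigns to a single level.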

Faceting

Another technique for displaying categorical variables on a plot is faceting, which creates a panel of plots by splitting the data into subsets. There are two types of faceting: “grid” and “wrapped”.

# Create panels for males and females
ggplot(mhe, aes(TL, dN15)) + 
  geom_point() + 
  facet_wrap(~sex, nrow = 2)

ggplot(rve, aes(TL, dN15)) + 
  geom_point() + 
  facet_wrap(~sex, nrow = 2)

ggplot(rve, aes(TL, dN15)) + 
  geom_point() + 
  facet_wrap(~sex, nrow = 2, strip.position = "bottom")

ggplot(rve, aes(TL, dN15)) + 
  geom_point() + 
  facet_wrap(~sex, nrow = 2, strip.position = "right")

What would happen if we tried to produce multiple plots using a continuous variable rather than a categorical one?

# Create multiple plots
ggplot(mhe, aes(TL, dN15)) + geom_point() + 
  facet_wrap(~depth)

ggplot(rve, aes(TL, dN15)) + geom_point() + 
  facet_wrap(~depth)
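
Faceting on a continuous variable gives one panel per distinct value, which is rarely useful. One possible workaround, sketched here on made-up data (`toy`, hypothetical values), is to bin the variable first. `cut_number()` is a ggplot2 helper that splits a numeric vector into groups with roughly equal numbers of observations; the choice of three classes is arbitrary.

```r
library(ggplot2)

# Made-up data (hypothetical values) with a continuous depth variable
set.seed(1)
toy <- data.frame(
  TL    = runif(60, 25, 55),
  dN15  = rnorm(60, 14.8, 0.4),
  depth = runif(60, 50, 260)
)

# Split depth into three classes with about the same number of points each
toy$depth_class <- cut_number(toy$depth, n = 3)

# One panel per depth class instead of one per depth value
p <- ggplot(toy, aes(TL, dN15)) +
  geom_point() +
  facet_wrap(~depth_class)
p
```

`cut_interval()` and `cut_width()` are alternatives when equal-width classes are preferred.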

Plot types

We can replace “geom_point()” with another geom function to produce a different plot. Some of the most common geoms in ggplot2 are:

geom_smooth() - fits a smoother to the data and displays both the smooth and its standard error.
geom_boxplot() - produces a boxplot to summarise the distribution of the data.
geom_histogram() and geom_freqpoly() - show the distribution of continuous variables.
geom_bar() - shows the distribution of categorical variables.
geom_path() and geom_line() - draw lines between the points.

Adding a smoother

If an X/Y relationship is very noisy, it can be hard to see the dominant pattern. In these cases it can be useful to add a smoothed line.

# Add a smoother
ggplot(mhe, aes(TL, dN15)) + geom_point() + 
  geom_smooth()

## `geom_smooth()` using method = 'loess'

ggplot(rve, aes(TL, dN15)) + geom_point() + 
  geom_smooth()

## `geom_smooth()` using method = 'loess'

# Add the line without the confidence interval
ggplot(rve, aes(TL, dN15)) + geom_point() + 
  geom_smooth(se = FALSE)

## `geom_smooth()` using method = 'loess'

# Show a 50% confidence band
ggplot(rve, aes(TL, dN15)) + geom_point() + 
  geom_smooth(se = TRUE, level = 0.50)

## `geom_smooth()` using method = 'loess'

# Show a 70% confidence band
ggplot(rve, aes(TL, dN15)) + geom_point() + 
  geom_smooth(se = TRUE, level = 0.70)

## `geom_smooth()` using method = 'loess'

An important argument to geom_smooth() is method, which lets us choose the type of model used to fit the smoothed line. “loess”, a smooth local regression, is the default for small samples (fewer than 1,000 observations). How wiggly the line is depends on the “span” parameter, which ranges from 0 (very wiggly) to 1 (much smoother).

# Add a smoother with a small span
ggplot(mhe, aes(TL, dN15)) + 
  geom_point() + 
  geom_smooth(span = 0.2)

## `geom_smooth()` using method = 'loess'

# Add a smoother with a large span
ggplot(mhe, aes(TL, dN15)) + 
  geom_point() + 
  geom_smooth(span = 0.8)

## `geom_smooth()` using method = 'loess'

ggplot(mhe, aes(TL, dN15)) + 
  geom_point() + 
  geom_smooth(span = 0.8, fill = "red", colour = "red4", 
              lty = 2, size = 1)

## `geom_smooth()` using method = 'loess'

ggplot(mhe, aes(TL, dN15)) + 
  geom_point() + 
  geom_smooth(span = 0.8, fill = "red", colour = "green", 
              lty = 2, lwd = 2.5)

## `geom_smooth()` using method = 'loess'

When n is greater than 1,000, we can use other methods that work better than “loess”.

# Load packages
require(mgcv)

## Loading required package: mgcv

## Loading required package: nlme

## This is mgcv 1.8-17. For overview type 'help("mgcv-package")'.

# Use the gam method
ggplot(mhe, aes(TL, dN15)) +
  geom_point() + 
  geom_smooth(method = "gam", formula = y ~ s(x))

ggplot(rve, aes(TL, dN15)) +
  geom_point() + 
  geom_smooth(method = "gam", formula = y ~ s(x))

# Use the lm method
ggplot(mhe, aes(TL, dN15)) +
  geom_point() + 
  geom_smooth(method = "lm")

ggplot(rve, aes(TL, dN15)) +
  geom_point() + 
  geom_smooth(method = "lm")

Sometimes the response variable is not continuous but binary (e.g., presence-absence, 0-1, male-female), and the data are better described by a binomial model. The following example shows how to use “geom_smooth” when the data follow a binomial (logistic) curve.

# Load the package
library(vcdExtra)

## Loading required package: vcd

## Loading required package: grid

## Loading required package: gnm

# Data
head(Titanicp)

##   pclass survived    sex     age sibsp parch
## 1    1st survived female 29.0000     0     0
## 2    1st survived   male  0.9167     1     2
## 3    1st     died female  2.0000     1     2
## 4    1st     died   male 30.0000     1     2
## 5    1st     died female 25.0000     1     2
## 6    1st survived   male 48.0000     0     0

dim(Titanicp)

## [1] 1309    6

str(Titanicp)

## 'data.frame':    1309 obs. of  6 variables:
##  $ pclass  : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
##  $ survived: Factor w/ 2 levels "died","survived": 2 2 1 1 1 2 2 1 2 1 ...
##  $ sex     : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
##  $ age     : num  29 0.917 2 30 25 ...
##  $ sibsp   : num  0 1 1 1 1 0 1 0 2 0 ...
##  $ parch   : num  0 2 2 2 2 0 0 0 0 0 ...

Titanicp <- Titanicp[!is.na(Titanicp$age),]
dim(Titanicp)

## [1] 1046    6

# Create a numeric survival variable
Titanicp$survived2 <- as.numeric(Titanicp$survived) - 1
head(Titanicp)

##   pclass survived    sex     age sibsp parch survived2
## 1    1st survived female 29.0000     0     0         1
## 2    1st survived   male  0.9167     1     2         1
## 3    1st     died female  2.0000     1     2         0
## 4    1st     died   male 30.0000     1     2         0
## 5    1st     died female 25.0000     1     2         0
## 6    1st survived   male 48.0000     0     0         1

# Plot (the family is passed on to glm() through method.args)
p <- ggplot(Titanicp, aes(age, survived2, color = sex)) + 
  geom_smooth(method = "glm", method.args = list(family = binomial), 
              formula = y ~ x, alpha = 0.2, size = 2, aes(fill = sex))

p1 <- p + geom_point(position = position_jitter(height = 0.03, width = 0)) + 
  xlab("Age") + ylab("Survival probability")
p1

p1 + theme_minimal()

p1 + theme_classic()

Exercises

Plot the relationship between shark size and isotopic composition (dN15) using the lm method, and show: (i) larger points (size = 4), (ii) red points with a blue border, and (iii) a black dashed line for the linear model fit (hint: lty = 2).
Plot the relationship between shark size and isotopic composition (dN15) using the lm method, and show: (i) red points, (ii) point size scaled by the “depth” variable, and (iii) a green dashed line for the linear model fit (hint: lty = 2).
Plot the relationship between shark size and isotopic composition (dN15) using the lm method, and show: (i) red points for females and blue points for males, and (ii) a dashed linear model fit line that is red for females and blue for males (hint: lty = 2).
Plot the relationship between shark size and isotopic composition (dN15) using the lm method, and show: (i) points coloured according to the “depth” variable (a colour gradient), and (ii) a blue dashed line for the linear model fit (hint: lty = 2).

Boxplots and Jitter Points

Jitter plots - geom_jitter() adds random noise to the data to avoid overplotting.
Boxplots - geom_boxplot() summarises the shape of the distribution.
Violin plots - geom_violin() shows a compact representation of the density of the distribution, highlighting the areas that contain more data.

# Jitter plot
ggplot(mhe, aes(sex, dN15)) + geom_jitter()

ggplot(rve, aes(sex, dN15)) + geom_jitter()

# Boxplot
ggplot(mhe, aes(sex, dN15)) + geom_boxplot()

ggplot(rve, aes(sex, dN15)) + geom_boxplot()

# Violin plot
ggplot(mhe, aes(sex, dN15)) + geom_violin()

ggplot(rve, aes(sex, dN15)) + geom_violin()

# Violin plot with jitter
ggplot(mhe, aes(sex, dN15)) + geom_violin() +
  geom_jitter(width = 0.1)

ggplot(rve, aes(sex, dN15)) + geom_violin() +
  geom_jitter(width = 0.1)

Histograms and frequency polygons

These plots give more information about the distribution than boxplots, but they need more space.

# Histograms
ggplot(mhe, aes(dN15)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(rve, aes(dN15)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Frequency polygons
ggplot(mhe, aes(dN15)) + geom_freqpoly()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(rve, aes(dN15)) + geom_freqpoly()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can control the width of the bins in a histogram or frequency polygon with the “binwidth” argument (if we do not want evenly spaced bins, we can use the “breaks” argument instead). By default, ggplot2 splits the data into 30 bins for histograms and frequency polygons.

# Plot 1
ggplot(mhe, aes(dN15)) + geom_histogram(binwidth = 2.5)

ggplot(mhe, aes(dN15)) + geom_freqpoly(binwidth = 2.5)

ggplot(rve, aes(dN15)) + geom_histogram(binwidth = 2.5)

ggplot(rve, aes(dN15)) + geom_freqpoly(binwidth = 2.5)

# Plot 2
ggplot(mhe, aes(dN15)) + geom_histogram(binwidth = 1)

ggplot(mhe, aes(dN15)) + geom_freqpoly(binwidth = 1)

ggplot(rve, aes(dN15)) + geom_histogram(binwidth = 1)

ggplot(rve, aes(dN15)) + geom_freqpoly(binwidth = 1)

# Plot 3
ggplot(mhe, aes(dN15)) + geom_histogram(binwidth = 0.2)

ggplot(mhe, aes(dN15)) + geom_freqpoly(binwidth = 0.2)

ggplot(rve, aes(dN15)) + geom_histogram(binwidth = 0.2)

ggplot(rve, aes(dN15)) + geom_freqpoly(binwidth = 0.2)
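
The “breaks” argument mentioned above is not used in the examples, so here is a minimal sketch of it on simulated data (`toy`, hypothetical values). The bin edges in `edges` are arbitrary choices made for illustration: narrow bins near the centre of the distribution and wider bins in the tails.

```r
library(ggplot2)

# Simulated isotope values (hypothetical, for illustration only)
set.seed(42)
toy <- data.frame(dN15 = rnorm(100, mean = 15, sd = 0.5))

# Bin edges chosen by hand: 9 edges define 8 bins of unequal width
edges <- c(13, 14, 14.5, 14.75, 15, 15.25, 15.5, 16, 17)

p <- ggplot(toy, aes(dN15)) +
  geom_histogram(breaks = edges, colour = "black", fill = "grey70")
p
```

With unequal bin widths, raw counts are harder to compare between bars, so it may be worth plotting densities instead of counts in that case.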

To compare the distributions of several groups, we can map a categorical variable to “fill” (for geom_histogram()) or “colour” (for geom_freqpoly()). It is easier to compare the distributions of two or more groups with frequency polygons than with histograms. We can also use faceting when we have several groups.

# Plot 1
ggplot(mhe, aes(dN15, colour = sex)) + geom_freqpoly(binwidth = 0.5)

ggplot(rve, aes(dN15, colour = sex)) + geom_freqpoly(binwidth = 0.5)

# Plot 2
ggplot(mhe, aes(dN15, fill = sex)) + geom_histogram(binwidth = 0.5) +
  facet_wrap(~sex, ncol = 1)

ggplot(rve, aes(dN15, fill = sex)) + geom_histogram(binwidth = 0.5) +
  facet_wrap(~sex, ncol = 1)

Bar charts

A bar chart is the analogue of a histogram for discrete variables. Next we import the “quinn.csv” data set (Peak & Quinn 1993). That study investigated the number of invertebrates recruiting to high- and low-density mussel patches in a rocky intertidal zone, and how recruitment varies with the season of the year.

Figure 2. Mussel patches

# Import the "quinn" data set
# (stringsAsFactors = TRUE reproduces the factor columns shown later on R >= 4.0)
df <- read.csv("quinn.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(df)

##   SEASON DENSITY RECRUITS SQRTRECRUITS      GROUP
## 1 Spring     Low       15     3.872983  SpringLow
## 2 Spring     Low       10     3.162278  SpringLow
## 3 Spring     Low       13     3.605551  SpringLow
## 4 Spring     Low       13     3.605551  SpringLow
## 5 Spring     Low        5     2.236068  SpringLow
## 6 Spring    High       11     3.316625 SpringHigh

table(df$DENSITY)

## 
## High  Low 
##   24   18

# Bar chart of the number of observations in each density treatment
ggplot(df, aes(DENSITY)) + geom_bar()

Usually, however, what we really want a bar chart to show is a summary value such as the mean. For example,

# Look at the data
head(df)

##   SEASON DENSITY RECRUITS SQRTRECRUITS      GROUP
## 1 Spring     Low       15     3.872983  SpringLow
## 2 Spring     Low       10     3.162278  SpringLow
## 3 Spring     Low       13     3.605551  SpringLow
## 4 Spring     Low       13     3.605551  SpringLow
## 5 Spring     Low        5     2.236068  SpringLow
## 6 Spring    High       11     3.316625 SpringHigh

# Calculate the mean number of recruits for each density treatment
df2 <- aggregate(RECRUITS ~ DENSITY, df, mean)

# Bar chart
ggplot(df2, aes(DENSITY, RECRUITS)) + geom_bar(stat = "identity")

# Adjust the y axis
p1 <- ggplot(df2, aes(DENSITY, RECRUITS)) + geom_bar(stat = "identity")
p1 + coord_cartesian(ylim=c(14, 22))
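
As an alternative to summarising with aggregate() first, the mean can be computed inside ggplot2 itself. The sketch below uses made-up counts (`toy`, hypothetical values) and `stat_summary()`; the `fun` argument assumes ggplot2 3.3.0 or later (earlier versions called it `fun.y`).

```r
library(ggplot2)

# Made-up recruit counts (hypothetical values) for two density treatments
toy <- data.frame(
  DENSITY  = rep(c("High", "Low"), each = 4),
  RECRUITS = c(20, 25, 30, 25, 10, 15, 20, 15)
)

# stat_summary() computes the mean of y for each x value, then draws bars
p <- ggplot(toy, aes(DENSITY, RECRUITS)) +
  stat_summary(fun = mean, geom = "bar")
p

# The bar heights are the group means
layer_data(p)$y
```

This keeps the raw data in the plot object, so further layers (points, error bars) can be added from the same data frame.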

Examples 1

# Calculate the mean number of recruits per season
df2 <- aggregate(RECRUITS ~ SEASON, df, mean)
names(df2) <- c("season", "rec")  

# Bar chart
p <- ggplot(df2, aes(season, rec)) +
  geom_bar(stat="identity")
p

# Flip the orientation
p + coord_flip()

### Change the width and colours of the bars ###

# Change the bar width
ggplot(df2, aes(season, rec)) +
  geom_bar(stat="identity", width = 0.5)

ggplot(df2, aes(season, rec)) +
  geom_bar(stat="identity", width = 0.2)

# Change the colours
ggplot(df2, aes(season, rec)) +
  geom_bar(stat="identity", color = "blue", fill="white")

# Use the minimal theme and blue bars
p <- ggplot(df2, aes(season, rec)) +
  geom_bar(stat="identity", fill="steelblue") + theme_minimal()
p

# Choose which bars to show
p + scale_x_discrete(limits = c("Spring", "Winter"))

## Warning: Removed 2 rows containing missing values (position_stack).

### Add annotations ###

df2$rec <- round(df2$rec, 1)

# Outside the bars
ggplot(df2, aes(season, rec)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = rec), vjust = -0.3, size = 3.5) +
  theme_minimal()

# Inside the bars
ggplot(df2, aes(season, rec)) +
  geom_bar(stat = "identity", fill = "yellow") +
  geom_text(aes(label = rec), vjust = 1.6, size = 3.5) +
  theme_minimal()

### Change bar colours by group ###

# Change the outline colour by group
p <- ggplot(df2, aes(season, rec, color = season)) +
  geom_bar(stat = "identity", fill = "white")
p

### Change the colour manually ###

# Use a manually chosen colour palette
p + scale_color_manual(values = c("red", "blue", "green", "yellow"))

# Use a ColorBrewer palette
p + scale_color_brewer(palette = "Dark2")

p + scale_color_brewer(palette = 1)

p + scale_color_brewer(palette = 2)

# Use a grey scale
p + scale_color_grey() + theme_classic()

### Change the fill colours of the bars ###

# Fill colour
p <- ggplot(df2, aes(season, rec, fill = season)) +
  geom_bar(stat = "identity") + theme_minimal()
p

### Change the fill colour manually ###

# Use a manually chosen colour palette
p + scale_fill_manual(values = c("red", "blue", "green", "yellow"))

# Use a ColorBrewer palette
p + scale_fill_brewer(palette = "Dark2")

p + scale_fill_brewer(palette = 1)

p + scale_fill_brewer(palette = 2)

# Use a grey scale
p + scale_fill_grey() + theme_classic()

# Use a black border together with fill colours
ggplot(df2, aes(season, rec, fill = season)) +
  geom_bar(stat = "identity", color = "black") + 
  scale_fill_manual(values = c("red", "blue", "green", "yellow")) +
  theme_classic()

### Change the legend position ###

# Change the bar colours to shades of blue
p <- ggplot(df2, aes(season, rec, fill = season)) +
  geom_bar(stat = "identity") + theme_minimal()

p1 <- p + scale_fill_brewer(palette = "Blues")
p1 + theme(legend.position = "top")

p1 + theme(legend.position = "bottom")

# Remove the legend
p1 + theme(legend.position = "none")

### Change the order of the groups ###

# Change the order of the groups on the x axis (the legend order is unchanged)
p1 + scale_x_discrete(limits = c("Summer", "Autumn", "Winter", "Spring"))

### Edit the legend elements further ###

# Change the legend title and font
p1 <- ggplot(df2, aes(season, rec, fill = season)) +
  geom_bar(stat = "identity", color = "black")

titulo <- "Season"

p2 <- p1 + scale_fill_manual(titulo, values = c("red", "blue", "green", "yellow")) +
  theme_classic()

p2 + theme(legend.title = element_text(colour = "blue", size = 10,
                                       face = "bold"))

p2 + theme(legend.text = element_text(colour = "blue", size = 10,
                                       face = "bold"))

# Change the legend background colour
p2 + theme(legend.background = element_rect(fill = "lightblue",
                                            size = 0.5, linetype = "solid"))

### Bar chart with multiple groups ###
head(df)

##   SEASON DENSITY RECRUITS SQRTRECRUITS      GROUP
## 1 Spring     Low       15     3.872983  SpringLow
## 2 Spring     Low       10     3.162278  SpringLow
## 3 Spring     Low       13     3.605551  SpringLow
## 4 Spring     Low       13     3.605551  SpringLow
## 5 Spring     Low        5     2.236068  SpringLow
## 6 Spring    High       11     3.316625 SpringHigh

df.n <- aggregate(RECRUITS ~ SEASON + DENSITY, df, mean)
names(df.n) <- c("season", "density", "rec")

# Plot 1: stacked bars
ggplot(df.n, aes(season, rec, fill = density)) +
  geom_bar(stat="identity")

# Plot 2: bars side by side
ggplot(df.n, aes(season, rec, fill = density)) +
  geom_bar(stat="identity", position=position_dodge())

# Change the colour manually
p <- ggplot(df.n, aes(season, rec, fill = density)) +
  geom_bar(stat="identity", color = "black", position=position_dodge()) +
  theme_minimal()

p + theme_classic()

# Manual colours
p + scale_fill_manual(values = c('#999999','#E69F00')) + theme_classic()

# Colour palette
p + scale_fill_brewer(palette = "Blues") + theme_classic()

### Bar charts with error bars ###

# Calculate mean +/- SD
head(df)

##   SEASON DENSITY RECRUITS SQRTRECRUITS      GROUP
## 1 Spring     Low       15     3.872983  SpringLow
## 2 Spring     Low       10     3.162278  SpringLow
## 3 Spring     Low       13     3.605551  SpringLow
## 4 Spring     Low       13     3.605551  SpringLow
## 5 Spring     Low        5     2.236068  SpringLow
## 6 Spring    High       11     3.316625 SpringHigh

df3 <- aggregate(RECRUITS ~ SEASON + DENSITY, df, FUN = function(x) c(mean = mean(x), sd = sd(x)))
df4 <- cbind(data.frame(df3[,c(1:2)]), df3$RECRUITS[,1], df3$RECRUITS[,2])

names(df4) <- c("season", "density", "mean", "sd")
str(df4)

## 'data.frame':    8 obs. of  4 variables:
##  $ season : Factor w/ 4 levels "Autumn","Spring",..: 1 2 3 4 1 2 3 4
##  $ density: Factor w/ 2 levels "High","Low": 1 1 1 1 2 2 2 2
##  $ mean   : num  19.67 10 48.17 5.67 18.25 ...
##  $ sd     : num  11.94 4.82 14.99 3.33 3.1 ...

# Bar chart with standard deviation (upper error bar only, from mean to mean + sd)
p <- ggplot(df4, aes(season, mean, fill = density)) + 
  geom_bar(stat="identity", position=position_dodge(), color = "black") +
  geom_errorbar(aes(ymin = mean, ymax = mean + sd), width = 0.2,
                position = position_dodge(0.9))
  
p + scale_fill_brewer(palette="Paired") + theme_minimal()

p + scale_fill_brewer(palette="Paired") + theme_classic()

### Customise the plots further ###

# Add axis titles
p + xlab("Season") + 
  ylab("Number of recruits") +
  theme_classic()

# Remove the axis titles
p + xlab(NULL) + 
  ylab(NULL) +
  theme_classic()

# Other changes
p2 <- p + labs(x = "Season", y = "Number of recruits") + 
  scale_fill_manual(values = c("black", "grey50")) +
  theme_classic()
p2

# Change the y-axis limits
p2 + ylim(0,100)

# Change the limits with expand_limits
p2 + expand_limits(x = c(0, 6), y = c(0, 80))
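
The examples above change axis limits in two ways: through a scale (ylim()) and through the coordinate system (coord_cartesian(), used earlier). The two are not equivalent. Scale limits discard values outside the range before any statistics are computed, while coordinate limits only crop the view. The sketch below illustrates this on a made-up data frame (`toy`, hypothetical values) with one extreme point.

```r
library(ggplot2)

# Made-up data: y equals x except for one extreme value
toy <- data.frame(x = 1:10, y = c(1:9, 100))

base <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Scale limits: y = 100 is dropped (with a warning) before the line is fitted,
# so the fit uses only the nine remaining points
p_scale <- base + ylim(0, 10)

# Coordinate limits: all ten points are used for the fit and only the
# visible region is cropped
p_zoom <- base + coord_cartesian(ylim = c(0, 10))
```

Drawing both plots shows a line through the points in `p_scale` and a much steeper line in `p_zoom`, because the extreme value still influences the second fit.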

Next we use the “ggsave” function to export a plot to our working directory.

# Bar chart with standard deviation from the previous example
p <- ggplot(df4, aes(season, mean, fill = density)) + 
  geom_bar(stat="identity", position=position_dodge(), color = "black") +
  geom_errorbar(aes(ymin = mean, ymax = mean + sd), width = 0.2,
                position = position_dodge(0.9))
  
p2 <- p + scale_fill_brewer(palette="Paired") + theme_classic()

# Export the plot
ggsave("plot.png", plot = p2, width = 5, height = 5)

Examples 2

Next we look at other examples in which we apply transformations directly to the plot axes.

head(rve)

##         date    month month2 year      lat      long  depth     species
## 77 5/26/2011      May      5 2011 9.557160 -84.79745 149.60 Raja velezi
## 78  3/4/2010    March      3 2010 8.520320 -83.88766 252.45 Raja velezi
## 79 2/27/2011 February      2 2011 8.458203 -83.68366  59.84 Raja velezi
## 80 2/27/2011 February      2 2011 8.458203 -83.68366  59.84 Raja velezi
## 81 2/27/2011 February      2 2011 8.458203 -83.68366  59.84 Raja velezi
## 82 2/27/2011 February      2 2011 8.458203 -83.68366  59.84 Raja velezi
##       sex   TL      dC13     dN15
## 77 Female 52.9 -16.14579 14.97830
## 78 Female 32.2 -16.50151 14.13812
## 79   Male 51.9 -16.35581 14.95036
## 80   Male 29.2 -16.55418 14.58886
## 81   Male 46.5 -16.91351 14.86255
## 82   Male 52.5 -16.54070 15.29162

# Base plot
p <- ggplot(rve, aes(TL, dN15)) + 
  geom_point(size = 4, pch = 21, colour = "black", bg = "red") + 
  theme_minimal()

# Log base 2 transformation
p + scale_x_continuous(trans = "log2") + scale_y_continuous(trans = "log2")

# Square root transformation (1)
p + scale_x_continuous(trans = "sqrt") + 
  scale_y_continuous(trans = "sqrt")

# Square root transformation (2)
p + scale_y_sqrt()

# Reverse the y axis
p + scale_y_reverse()

# Coordinate transformations
p + coord_trans(x = "log2", y = "log2")

p + coord_trans(x = "log10", y = "log10")

p + coord_trans(x = "sqrt", y = "sqrt")
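
Scale transformations and coordinate transformations look alike for a plain scatter plot, but they differ once a statistic is involved: a scale transformation is applied before the statistic is computed, and a coordinate transformation afterwards. The sketch below shows this with a linear fit on simulated data (`toy`, hypothetical values); it is an illustration of the idea and not part of the original examples.

```r
library(ggplot2)

# Simulated data in which y is linear in log10(x)
set.seed(3)
toy <- data.frame(x = 10^runif(40, 0, 3))
toy$y <- 2 + 3 * log10(toy$x) + rnorm(40, sd = 0.3)

base <- ggplot(toy, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Scale transformation: x is logged first, so lm() is fitted to log10(x)
# and the fit is drawn as a straight line
p_scale <- base + scale_x_log10()

# Coordinate transformation: lm() is fitted to the untransformed x, and the
# straight line is bent when the axis is transformed afterwards
p_coord <- base + coord_trans(x = "log10")
```

In `p_scale` the fitted layer stores x in log10 units, whereas in `p_coord` it stores x in the original units; `layer_data(p_scale, 2)` and `layer_data(p_coord, 2)` make the difference visible.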

Examples 3

Next we look at how to change the format of the axes.

# Log2 scaling of the y axis (with visually-equal spacing)
require(scales)

## Loading required package: scales

p + scale_y_continuous(trans = log2_trans())

# show exponents
p + scale_y_continuous(trans = log2_trans(),
    breaks = trans_breaks("log2", function(x) 2^x),
    labels = trans_format("log2", math_format(2^.x)))

# Percent
p + scale_y_continuous(labels = percent)

# dollar
p + scale_y_continuous(labels = dollar)

# scientific
p + scale_y_continuous(labels = scientific)

### Add "tick marks" ###

# Load packages
library(MASS)

head(Animals)

##                     body brain
## Mountain beaver     1.35   8.1
## Cow               465.00 423.0
## Grey wolf          36.33 119.5
## Goat               27.66 115.0
## Guinea pig          1.04   5.5
## Dipliodocus     11700.00  50.0

# x and y axis are transformed and formatted
p2 <- ggplot(Animals, aes(x = body, y = brain)) + geom_point(size = 4) +
     scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     theme_bw()

# log-log plot without log tick marks
p2

# Show log tick marks
p2 + annotation_logticks()

# Log ticks on left and right
p2 + annotation_logticks(sides = "lr")

# All sides
p2 + annotation_logticks(sides = "trbl")

Time series

Next we will use our “Silvertips.csv” data set to produce some simple time-series plots.

# Import the data set
# (stringsAsFactors = TRUE keeps text columns as factors; this stopped being the default in R 4.0)
silvertip <- read.csv("Silvertips.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)

head(silvertip)

##        date hour   tag tagNo  sex  FL    stage month.year month.yearID
## 1  5/6/2013   15 14808     1 Male 115 Immature     May-13            1
## 2  5/6/2013   16 14808     1 Male 115 Immature     May-13            1
## 3 5/10/2013   11 14808     1 Male 115 Immature     May-13            1
## 4 5/10/2013   12 14808     1 Male 115 Immature     May-13            1
## 5 5/10/2013   13 14808     1 Male 115 Immature     May-13            1
## 6 5/12/2013    4 14808     1 Male 115 Immature     May-13            1
##   month  diel    depth tide     temp
## 1   May   day 25.40800 1.55 25.80167
## 2   May   day 27.41714 1.94 25.76000
## 3   May   day 14.36667 2.19 25.50500
## 4   May   day 25.58696 1.62 25.64167
## 5   May   day 27.05680 1.21 25.70000
## 6   May night 14.60000 1.25 25.14167

str(silvertip)

## 'data.frame':    9544 obs. of  14 variables:
##  $ date        : Factor w/ 182 levels "1/1/2014","1/11/2014",..: 120 120 107 107 107 108 108 110 110 110 ...
##  $ hour        : int  15 16 11 12 13 4 5 9 10 11 ...
##  $ tag         : int  14808 14808 14808 14808 14808 14808 14808 14808 14808 14808 ...
##  $ tagNo       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ sex         : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FL          : int  115 115 115 115 115 115 115 115 115 115 ...
##  $ stage       : Factor w/ 2 levels "Immature","Mature": 1 1 1 1 1 1 1 1 1 1 ...
##  $ month.year  : Factor w/ 12 levels "Apr-14","Aug-13",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ month.yearID: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ month       : Factor w/ 12 levels "Apr","Aug","Dec",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ diel        : Factor w/ 2 levels "day","night": 1 1 1 1 1 2 2 1 1 1 ...
##  $ depth       : num  25.4 27.4 14.4 25.6 27.1 ...
##  $ tide        : num  1.55 1.94 2.19 1.62 1.21 1.25 2.02 3.67 3.62 3.14 ...
##  $ temp        : num  25.8 25.8 25.5 25.6 25.7 ...

# Set the date-time format (the dates are stored as month/day/year)
silvertip$date <- as.POSIXct(as.character(silvertip$date), format = "%m/%d/%Y", tz = "UTC")
silvertip$date2 <- as.Date(silvertip$date)
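The conversion matters because dates stored as text sort alphabetically, which would scramble a time axis. A minimal sketch, assuming three made-up date strings in the same month/day/year layout:

```r
# Hypothetical sample of date strings in month/day/year layout
raw_dates <- c("5/6/2013", "1/11/2014", "12/25/2013")

# Sorted as text, the order is alphabetical rather than chronological
sort(raw_dates)

# Parsed with an explicit format, the values sort chronologically
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
sorted_dates <- sort(parsed)
format(sorted_dates)
```

Once the column is a `Date` or `POSIXct` object, ggplot2 picks a date scale for it automatically.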

# Select tag 14808
d <- silvertip[silvertip$tag == 14808, ]
d$tag <- factor(d$tag)

head(d)

##         date hour   tag tagNo  sex  FL    stage month.year month.yearID
## 1 2013-05-06   15 14808     1 Male 115 Immature     May-13            1
## 2 2013-05-06   16 14808     1 Male 115 Immature     May-13            1
## 3 2013-05-10   11 14808     1 Male 115 Immature     May-13            1
## 4 2013-05-10   12 14808     1 Male 115 Immature     May-13            1
## 5 2013-05-10   13 14808     1 Male 115 Immature     May-13            1
## 6 2013-05-12    4 14808     1 Male 115 Immature     May-13            1
##   month  diel    depth tide     temp      date2
## 1   May   day 25.40800 1.55 25.80167 2013-05-06
## 2   May   day 27.41714 1.94 25.76000 2013-05-06
## 3   May   day 14.36667 2.19 25.50500 2013-05-10
## 4   May   day 25.58696 1.62 25.64167 2013-05-10
## 5   May   day 27.05680 1.21 25.70000 2013-05-10
## 6   May night 14.60000 1.25 25.14167 2013-05-12

# Depth time series
ggplot(d, aes(date, depth)) + geom_line()

p1 <- ggplot(d, aes(date, depth)) + geom_line() + 
  theme_classic()
p1

# Series 2 (add red points)
p1 + geom_point(color = "red", size = 2)

# Series 3 (colour by year and tweak the plot a little)

# Extract the year as text so that it maps to a discrete colour scale
d$year <- format(d$date2, "%Y")

ggplot(d, aes(date, depth)) + 
  geom_path(colour = "grey50") +
  geom_point(aes(colour = year), size = 2)

# Series 4 (put the months in calendar order and colour by month)

levels(d$month)

##  [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct"
## [12] "Sep"

d$month <- factor(d$month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", 
                                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

d$month2 <- as.numeric(d$month)
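The reordering step above can be checked on a small vector. This sketch uses made-up month values and the base R constant `month.abb`, which holds the same twelve abbreviations that are typed out by hand above.

```r
# Hypothetical vector of month abbreviations, as they might come from a CSV file
months_raw <- c("May", "Jan", "Dec", "Apr", "May")

# Default factor levels are sorted alphabetically
alphabetical <- factor(months_raw)
levels(alphabetical)

# month.abb is a base R constant holding "Jan" ... "Dec" in calendar order
calendar <- factor(months_raw, levels = month.abb)
levels(calendar)

# as.numeric() returns the position of each value within the levels,
# so with calendar-ordered levels it is the month number
month_number <- as.numeric(calendar)
month_number
```

The order of the levels is what ggplot2 uses for legends and discrete axes, which is why the factor is rebuilt before plotting.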

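# Plot 1 (month is a factor, so each month gets its own fill colour)
ggplot(d, aes(date, depth)) + 
  geom_path(colour = "grey50", linetype = 5) +
  geom_point(aes(fill = month), shape = 21, size = 2)

# Plot 2 (month2 is numeric, so the colour scale is a continuous gradient)
p1 <- ggplot(d, aes(date, depth)) + 
  geom_path(colour = "grey50", linetype = 5) +
  geom_point(aes(colour = month2), size = 2)
p1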

Surface plots

ggplot2 does not produce true 3D plots, but it does provide a full set of tools for representing 3D surfaces in two dimensions: contour plots, density (raster) plots, and bubble plots.

# Contour and density plots
head(faithful)

##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55

# faithfuld (shipped with ggplot2) holds a 2D density estimate of the faithful data
ggplot(faithfuld, aes(eruptions, waiting)) + geom_contour(aes(z = density, colour = after_stat(level)))

ggplot(faithfuld, aes(eruptions, waiting)) + geom_raster(aes(fill = density))

# Bubble plots work better with fewer observations
small <- faithfuld[seq(1, nrow(faithfuld), by = 10), ] 

ggplot(small, aes(eruptions, waiting)) + 
  geom_point(aes(size = density), alpha = 1/3) +
  scale_size_area()
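The plots above rely on `faithfuld`, which already contains a `density` column. When only raw observations are available, a comparable grid can be computed first. The following is a sketch under the assumption that `MASS::kde2d()` is an acceptable density estimator; it is not necessarily how `faithfuld` itself was built, and the 50 x 50 grid size is an arbitrary choice.

```r
library(MASS)     # kde2d()
library(ggplot2)

# Two-dimensional kernel density estimate on a 50 x 50 grid
# (the grid size is an arbitrary choice for this sketch)
dens <- kde2d(faithful$eruptions, faithful$waiting, n = 50)

# Reshape the density matrix into long format: one row per grid cell.
# expand.grid() varies its first argument fastest, which matches the
# column-major order of as.vector(dens$z)
grid <- expand.grid(eruptions = dens$x, waiting = dens$y)
grid$density <- as.vector(dens$z)

p_dens <- ggplot(grid, aes(eruptions, waiting)) +
  geom_raster(aes(fill = density))
p_dens
```

The resulting `grid` data frame has the same three columns as `faithfuld`, so it can be passed to `geom_contour()` or `geom_raster()` in the same way.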

Exercises

Using the Titanic data set (“Titanicp”), we will try to recreate the following plots:

##   pclass survived    sex     age sibsp parch survived2
## 1    1st survived female 29.0000     0     0         1
## 2    1st survived   male  0.9167     1     2         1
## 3    1st     died female  2.0000     1     2         0
## 4    1st     died   male 30.0000     1     2         0
## 5    1st     died female 25.0000     1     2         0
## 6    1st survived   male 48.0000     0     0         1

## [1] 1046    7

## 'data.frame':    1046 obs. of  7 variables:
##  $ pclass   : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
##  $ survived : Factor w/ 2 levels "died","survived": 2 2 1 1 1 2 2 1 2 1 ...
##  $ sex      : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
##  $ age      : num  29 0.917 2 30 25 ...
##  $ sibsp    : num  0 1 1 1 1 0 1 0 2 0 ...
##  $ parch    : num  0 2 2 2 2 0 0 0 0 0 ...
##  $ survived2: num  1 1 0 0 0 1 1 0 1 0 ...

References

Lemon J (2006) Plotrix: a package in the red light district of R. R-News 6(4):8–12
Warnes GR, Bolker B, Bonebakker L, Gentleman R, Liaw WHA, Lumley T, Maechler M, Magnusson A, Moeller S, Schwartz M, Venables B (2015) gplots: various R programming tools for plotting data. R package version 2.17.0. https://CRAN.R-project.org/package=gplots
Wickham H (2010) A layered grammar of graphics. J Comput Graph Stat 19(1):3–28
Wilkinson L (2005) The grammar of graphics. Statistics and computing, 2nd edn. Springer, New York