ggplot2

# ggplot2
## An Overview
### Andreas Filser
### Fall 2019

---

# Data Visualization: Why and What For?

**(1.) Description:**  
+ Overview of the data being used  
+ Does my dataset contain only young or only old people?  
+ How often does the event under analysis occur?

**(2.) Summarizing results:**
+ Nicer to look at than tables (when done well)   
+ Faster to grasp than tables (ditto)

***
> This presentation is based in part on the *ggplot flipbook* by [Gina Reynolds](https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1) and on the *Data Visualization* courses by [Danielle Navarro](https://djnavarro.github.io/satrdayjoburg/) and [Charles Lanfear](https://clanfear.github.io/R_Visualization_Workshop/).

---
# Preliminary Remarks

+ Listen now, try it yourself later

+ Understand the principle rather than memorizing commands
  + Google it!
  + the [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) offers an extensive collection of example plots, including the corresponding syntax
  + the [ggplot2 Cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) gives an overview of the most important functions and commands
  + More links at the end

---
# ggplot2

The [**ggplot2**](https://ggplot2.tidyverse.org/) package is based on the idea of a "layered grammar of graphics". The plot to be created is broken down into several components:

* **data**: which data should be used?

* **aes**thetics: which variables should be displayed?
  
* **geom**etric objects: points, columns, lines, ...?

* **Coordinate system**: numeric, percent, logarithmic, coordinates?

* **lab**els: axis labels and titles

* **Theme**: color scheme, font size

* scales, if needed

---
# ggplot2

Like all R packages, `ggplot2` has to be installed once and then loaded with `library()` every time a new R session is started:

```r
install.packages("ggplot2")
library(ggplot2)
```
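
Since installation is only needed once, a script that is run repeatedly can first check whether the package is already available. A minimal sketch, assuming a CRAN mirror is configured; `requireNamespace()` is base R and does not appear in the original slides:

```r
# Install ggplot2 only if it is missing (assumption: a CRAN mirror is reachable)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Loading is still required in every new R session
library(ggplot2)
```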

---
# Structure of ggplot2
A typical `ggplot2` command looks roughly like this:

```r
ggplot(data = datensatz, aes(x = var1, y = var2, color = var3)) +
  geom_point() +
  labs(title= "Titel", subtitle = "Untertitel") +
  theme_minimal()
```

To make the individual parts easier to see, I spread these commands out here:

```r
ggplot(data = datensatz) +
  aes(x = var1) +
  labs(x = "x-Achsenbeschriftung") +
  aes(y = var2) +
  labs(y = "y-Achsenbeschriftung") +
  geom_point() +
  labs(title= "Titel") +
  labs(subtitle = "Untertitel") +
  aes(color = var3) +
  labs(color = "Titel für die Legende") +
  theme_minimal()
```
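
The placeholders `datensatz`, `var1`, `var2`, and `var3` above do not refer to real data. As a rough sketch, the following invents a tiny data frame under those names (the values are made up purely for illustration) so that the compact form of the command can be run as is:

```r
library(ggplot2)

# Hypothetical toy data: names follow the placeholders above, values are invented
datensatz <- data.frame(
  var1 = c(1, 2, 3, 4),
  var2 = c(2.5, 3.1, 4.8, 5.2),
  var3 = c("a", "b", "a", "b")
)

# Same structure as the compact command: data + aesthetics, geom, labels, theme
p <- ggplot(data = datensatz, aes(x = var1, y = var2, color = var3)) +
  geom_point() +
  labs(title = "Titel", subtitle = "Untertitel") +
  theme_minimal()

p  # printing the object draws the plot
```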

---
# A First Example: Christmas Trees

Artificial (plastic) Christmas trees are becoming increasingly popular in the USA.

The underlying data are in long format, i.e. there are two rows per year with sales figures: one for real trees and one for plastic trees. A quick look at the dataset:

```r
head(christmas_trees, n = 4)
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> jahr </th>
   <th style="text-align:left;"> baumart </th>
   <th style="text-align:right;"> anz_baeume </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 2004 </td>
   <td style="text-align:left;"> real </td>
   <td style="text-align:right;"> 27.1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2004 </td>
   <td style="text-align:left;"> fake </td>
   <td style="text-align:right;"> 9.0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2005 </td>
   <td style="text-align:left;"> real </td>
   <td style="text-align:right;"> 32.8 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2005 </td>
   <td style="text-align:left;"> fake </td>
   <td style="text-align:right;"> 9.3 </td>
  </tr>
</tbody>
</table>
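
For readers less familiar with the long format, the following sketch rebuilds the four rows shown above from a wide layout using base R's `reshape()`. The object names `wide` and `long` are made up for this illustration; the actual `christmas_trees` data presumably already come in long format:

```r
# Wide layout: one row per year, one column per tree type (values from the table above)
wide <- data.frame(
  jahr = c(2004, 2005),
  real = c(27.1, 32.8),
  fake = c(9.0, 9.3)
)

# Long layout: one row per year and tree type,
# so that `baumart` is a column that can be mapped to an aesthetic
long <- reshape(
  wide,
  direction = "long",
  varying   = c("real", "fake"),   # columns to stack
  v.names   = "anz_baeume",        # name of the stacked value column
  timevar   = "baumart",           # column holding the former column names
  times     = c("real", "fake"),
  idvar     = "jahr"
)
rownames(long) <- NULL

long[order(long$jahr), ]
```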

---

```r
*ggplot(data = christmas_trees)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_1-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
* aes(x = jahr)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_2-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
* aes(y = anz_baeume)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_3-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
* geom_point()
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_4-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
* aes(color = baumart)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_5-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
* scale_color_manual(values=c("red","green4"))
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_6-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
* labs(title="Wie echt sind deine Blätter?")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_7-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
  labs(title="Wie echt sind deine Blätter?") +
* labs(subtitle="Verkaufte Weihnachtsbäume in USA | Quelle: Statista")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_8-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
  labs(title="Wie echt sind deine Blätter?") +
  labs(subtitle="Verkaufte Weihnachtsbäume in USA | Quelle: Statista") +
* labs(y = "Anzahl verkaufte Bäume (in Mio)")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_9-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
  labs(title="Wie echt sind deine Blätter?") +
  labs(subtitle="Verkaufte Weihnachtsbäume in USA | Quelle: Statista") +
  labs(y = "Anzahl verkaufte Bäume (in Mio)") +
* labs(x = "Jahr")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_10-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
  labs(title="Wie echt sind deine Blätter?") +
  labs(subtitle="Verkaufte Weihnachtsbäume in USA | Quelle: Statista") +
  labs(y = "Anzahl verkaufte Bäume (in Mio)") +
  labs(x = "Jahr") +
* labs(color = "")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_11-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = christmas_trees) +
  aes(x = jahr) +
  aes(y = anz_baeume) +
  geom_point() +
  aes(color = baumart) +
  scale_color_manual(values=c("red","green4")) +
  labs(title="Wie echt sind deine Blätter?") +
  labs(subtitle="Verkaufte Weihnachtsbäume in USA | Quelle: Statista") +
  labs(y = "Anzahl verkaufte Bäume (in Mio)") +
  labs(x = "Jahr") +
  labs(color = "") +
* theme_minimal()
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_12-1.png" width="80%" />
]]
---
class: split-40

# Once More with Compact Syntax

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, color = baumart)) +
  geom_point() +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas2-1.png)
]]

---
class: split-40

# Lines Instead of Points

.column.bg-main1[
.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) +
* geom_line() +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal()
```
]
]
.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas3s-1.png)
]]
---
class: split-40

# Lines **and** Points

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
* geom_point() +
* geom_line() +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal() 
```
]
]
.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline-1.png)
]]
---
class: split-40

# Further Options

.column.bg-main1[
.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
* geom_point(shape = 17, size = 3) +
* geom_line(linetype = "dashed", size = .25) +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal() 
```
An overview of all `shape`s and `linetype`s can be found, for example, [here](http://www.cookbook-r.com/Graphs/Shapes_and_line_types/).
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_opts-1.png)
]]
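
As an alternative to the linked overview, the available point shapes can also be drawn directly in R. A small sketch, assuming the standard R shape codes 0 to 25; the data frame `shapes` and its grid layout are made up for this illustration:

```r
library(ggplot2)

# One row per shape code, arranged on a simple grid
shapes <- data.frame(
  code = 0:25,
  x    = 0:25 %% 7,
  y    = -(0:25 %/% 7)
)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  geom_point(aes(shape = code), size = 4) +
  geom_text(aes(label = code), nudge_y = -0.3) +
  scale_shape_identity() +   # use the numbers in `code` directly as shape codes
  theme_void()

p_shapes
```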

---
class: split-40
# Further Options in `aes`thetics

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart, 
*          size = anz_baeume)) +
  geom_point(shape = 17) +
  geom_line() + 
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal() 
```
The size of the points and lines corresponds to the sales figure.
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes-1.png)
]]

---
class: split-40
# Legend Title for `size`

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart, 
           size = anz_baeume)) + 
  geom_point(shape = 17) +
  geom_line() + 
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "",
*      size = "Absatz (in Mio)") +
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes2-1.png)
]]

---
class: split-40
# `size` for Only One geom_...

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
* geom_point(aes(size = anz_baeume),shape = 17) +
  geom_line() +   
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "",
       size = "Absatz (in Mio)") + 
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes3-1.png)
]]

---
class: split-40
# All Adjustments Can Be Defined by Variables

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
* geom_point(aes(size = anz_baeume, shape = baumart)) +
  geom_line() +   
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "",
       size = "Absatz (in Mio)") + 
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes4-1.png)
]]

---
class: split-40
# All Adjustments Can Be Defined by Variables

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
  geom_point(aes(size = anz_baeume, shape = baumart)) +
  geom_line() +   
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "",
*      shape = "Baumart",
       size = "Absatz (in Mio)") +  
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes5-1.png)
]]

---
class: split-40
# All Adjustments Can Be Defined by Variables

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) + 
  geom_point(aes(size = anz_baeume, shape = baumart)) +
  geom_line() +   
  scale_color_manual(values = c("red", "green4")) +
* scale_shape_manual(values = c(16,18)) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "",
*      shape = "",
       size = "Absatz (in Mio)") +  
  theme_minimal() 
```
]
]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/christmas_pointline_sizeaes6-1.png)
]]

---
class: split-40

# Column Charts

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart)) +
* geom_col() +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal()  
```
]
]
.column.bg-main2[.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/christmas_col-1.png" width="80%" />
]]

---
class: split-40

# fill and color

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           color = baumart,
*          fill = baumart)) +
  geom_col() +
  scale_color_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       color = "") +
  theme_minimal()
```
]
]

.column.bg-main2[.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/christmas_colfill-1.png" width="80%" />
]
]

---
class: split-40

# scale_...

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col() +
* scale_fill_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal()
```
]
]
.column.bg-main2[.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/christmas_col_scale-1.png" width="80%" />
]
]
---
class: split-40

# Columns Side by Side

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
* geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("red", "green4")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal()
```
]
]
.column.bg-main2[.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/christmas_col_dodge-1.png" width="80%" />
]
]
---
class: split-40

# Building the Legend

.blade1[The assignment of colors to categories can be changed with `breaks`. With `legend.position` in `theme`, the legend can be moved.]

.column.bg-main1[
.small.vmiddle[<br><br>]
.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("red", "green4"),
*                   breaks = c("fake", "real")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal() +
* theme(legend.position="bottom")
```
]
]
.column.bg-main2[
.small.vmiddle[<br><br>]
.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-13-1.png" width="80%" />
]]
---
class: split-40

# Building the Legend

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("red", "green4"),
                    breaks = c("fake", "real"),
*                   labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal() +
* theme(legend.position="top")
```
.note1[More on formatting the legend [here](http://www.cookbook-r.com/Graphs/Legends_(ggplot2))]
]
]
.column.bg-main2[.content.bottom.center[
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-15-1.png" width="80%" />
]]

---
# Colors

Of course, `red` and `green4` are not the only colors available. Colors can be specified in several ways:

+ by name: see the list [here](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf)  
+ with so-called [hex codes](https://www.color-hex.com/): simply use e.g. "#FF0000" in place of "red" 
+ color palettes with their own `scale_fill` or `scale_color` functions
  + [ColorBrewer](http://colorbrewer2.org), already built into `ggplot2`
  + [scico](https://github.com/thomasp85/scico)
  + one last example...
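
Color names and hex codes are interchangeable. A short base R sketch shows how to look up the available names and convert a name into its hex code; the helper `name_to_hex()` is made up for this illustration:

```r
# The first few color names known to R
head(colors())

# Hypothetical helper: convert a color name into its hex code
name_to_hex <- function(name) {
  rgb(t(col2rgb(name)), maxColorValue = 255)
}

name_to_hex("red")     # "#FF0000"
name_to_hex("green4")
```

Either form can be passed to `scale_color_manual()` or `scale_fill_manual()` in `values`.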

---
class: split-40

# Color Brewer

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
* scale_fill_brewer(palette = "Accent") +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal()
```
.note85[For clarity, `color` was left out here; there is of course also a `scale_color_brewer()`]
]
]
.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_col_brewer-1.png" width="80%" />
]
]

---
class: split-40

# Color Brewer

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Paired") + 
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal()
```
]
]
.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_col_brewer_col-1.png" width="80%" />
]
]

---
class: split-40

# Color Brewer

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
*                   breaks = c("fake", "real"),
*                   labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal()
```
.note85[`breaks` and `labels` also work in `scale_color_brewer()`]
]
]
.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/christmas_col_brewer2-1.png" width="80%" />
]
]

---
# Themes
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-19-1.png" width="80%" />
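
The code behind this figure is not shown. As a hedged sketch of how built-in themes can be compared, the following applies a few of the complete themes that ship with `ggplot2` to the same plot; the data frame `toy` and the object names are invented for this illustration:

```r
library(ggplot2)

toy <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))
base_plot <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# A few of the complete themes included in ggplot2
themes <- list(
  grey    = theme_grey(),     # the default
  bw      = theme_bw(),
  classic = theme_classic(),
  minimal = theme_minimal(),
  dark    = theme_dark()
)

# One plot per theme, titled with the theme's name
plots <- lapply(names(themes), function(nm) {
  base_plot + themes[[nm]] + labs(title = paste0("theme_", nm, "()"))
})

plots[[4]]  # e.g. the theme_minimal() version
```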

---
# Themes II

More themes are available in the `ggthemes` package, see [here](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/)
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-20-1.png" width="80%" />

---
class: split-40

# Text Size

.pull-left40[.smallish[Text is always too small for presentations]]
.column.bg-main1[.smaller.vmiddle[

]]

---
class: split-40

# Text Size

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal() +
  theme(axis.title.x = element_text(size = rel(2)), 
        axis.title.y = element_text(size = rel(.5)),
*       axis.text.y =  element_text(size = rel(2)))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize3-1.png)
]]

---
class: split-40

# Further Text Options

.pull-left40[.smallish[More info [here](http://www.cookbook-r.com/Graphs/Fonts/)]]
.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
  theme_minimal() +
  theme(axis.title.x = element_text(size = rel(2)), 
        axis.title.y = element_text(size = rel(.5)),
*       axis.text.x =  element_text(face = "bold",angle = 45),
*       axis.text.y =  element_text(size = rel(2), family = "mono", color = "#dc322f"),
*       plot.title = element_text(size = rel(2), family = "serif", color = "#5d7187"),
*       plot.subtitle = element_text(size = rel(.9), family = "serif", face = "italic"))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize4-1.png)
]]

---
class: split-40

# Further Text Options

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
* theme_minimal() +
  theme(axis.text =  element_text(color = "#dc322f"), 
        plot.title = element_text(size = rel(2), color = "#5d7187"),
        plot.subtitle = element_text(face = "italic"))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize5-1.png)
]]

---
class: split-40

# Further Text Options

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
* theme_minimal(base_size = 14) +
  theme(axis.text =  element_text(color = "#dc322f"), 
        plot.title = element_text(size = rel(2), color = "#5d7187"),
        plot.subtitle = element_text(face = "italic"))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize5b-1.png)
]]

---
class: split-40

# Further Text Options

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
* theme_minimal(base_size = 14, base_family = "serif") +
  theme(axis.text =  element_text(color = "#dc322f"), 
        plot.title = element_text(size = rel(2), color = "#5d7187"),
        plot.subtitle = element_text(face = "italic"))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize6-1.png)
]]

---
class: split-40

# Further Text Options

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = christmas_trees, 
       aes(x=jahr,y=anz_baeume, 
           fill = baumart)) +
  geom_col(position=position_dodge()) +
  scale_fill_brewer(palette = "Accent",
                    breaks = c("fake", "real"),
                    labels = c("künstl.", "echt")) +
  labs(title = "Wie echt sind deine Blätter?",
       subtitle = "Verkaufte Weihnachtsbäume in USA | Quelle: Statista",
       y = "Anzahl verkaufte Bäume (in Mio)",
       x = "Jahr",
       fill = "") +
* theme_minimal(base_size = 8, base_family = "mono") +
  theme(axis.text =  element_text(color = "#dc322f"), 
        plot.title = element_text(size = rel(2), color = "#5d7187"),
        plot.subtitle = element_text(face = "italic"))
```
]
]
.column.bg-main2[.vmiddle[
![](ggplot2_intro_files/figure-html/textsize7-1.png)
]]

---
# Practice Data

We will work with data from the [Gapminder](http://www.gapminder.org) project. An excerpt of these data can be used directly in R via the `gapminder` package by Jenny Bryan. The dataset contains life expectancy `lifeExp`, population `pop`, and GDP per capita `gdpPercap` for 142 countries between 1952 and 2007:

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> country </th>
   <th style="text-align:left;"> continent </th>
   <th style="text-align:right;"> year </th>
   <th style="text-align:right;"> lifeExp </th>
   <th style="text-align:right;"> pop </th>
   <th style="text-align:right;"> gdpPercap </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Ethiopia </td>
   <td style="text-align:left;"> Africa </td>
   <td style="text-align:right;"> 1962 </td>
   <td style="text-align:right;"> 40.059 </td>
   <td style="text-align:right;"> 25145372 </td>
   <td style="text-align:right;"> 419.4564 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Cuba </td>
   <td style="text-align:left;"> Americas </td>
   <td style="text-align:right;"> 1997 </td>
   <td style="text-align:right;"> 76.151 </td>
   <td style="text-align:right;"> 10983007 </td>
   <td style="text-align:right;"> 5431.9904 </td>
  </tr>
</tbody>
</table>

---
# Exercise 1

.center[
<img src="ggplot2_intro_files/figure-html/aufg1-1.png" width="30%" />
]
Visualize how life expectancy, population, or GDP per capita develops over time!
.smallish[
+ Install the package with `install.packages("gapminder")`

+ With `gdf <- gapminder::gapminder` you can load the data and store them as `gdf` (the name is of course up to you)

+ Pick two or more countries and store them in a new `data.frame`
  + `grepl("part_of_word", variable)` makes it easy to filter string variables  
  
  + e.g. filtering for Uganda and Namibia with base R: 
  `df1 <- gdf[grepl("Uga",gdf$country)|grepl("Nam",gdf$country), ]`   
  
  + `country` is defined as a `factor`, which causes problems when plotting: `df1$country <- as.character(df1$country)`  
]
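
A filtered `factor` keeps all of its original levels, which may be what the note about `factor` refers to. The following sketch demonstrates this with a made-up miniature data frame (`toy` stands in for `gdf`; the numbers are invented) and shows `droplevels()` as a possible alternative to `as.character()`:

```r
# Hypothetical stand-in for the gapminder data
toy <- data.frame(
  country = factor(c("Uganda", "Namibia", "Germany", "Uganda")),
  lifeExp = c(50, 55, 78, 52)
)

# Keep rows whose country name contains "Uga" or "Nam"
df1 <- toy[grepl("Uga", toy$country) | grepl("Nam", toy$country), ]

levels(df1$country)              # still lists "Germany"
levels(droplevels(df1)$country)  # unused level removed

# Variant from the slide: convert to character
df1$country <- as.character(df1$country)
```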

If anything is unclear, does not work, etc., **ask!** Have fun!

---
# When Colors Are Not Enough

Sometimes we have data for which simply using different colors or shapes is not helpful. One example is this dataset, which breaks down incarceration rates in the USA by population group and urbanicity:

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> year </th>
   <th style="text-align:left;"> urbanicity </th>
   <th style="text-align:left;"> pop_category </th>
   <th style="text-align:right;"> rate_per_100000 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1983 </td>
   <td style="text-align:left;"> rural </td>
   <td style="text-align:left;"> Black </td>
   <td style="text-align:right;"> 1116.8762 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1983 </td>
   <td style="text-align:left;"> rural </td>
   <td style="text-align:left;"> White </td>
   <td style="text-align:right;"> 154.6412 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1983 </td>
   <td style="text-align:left;"> small/mid </td>
   <td style="text-align:left;"> Black </td>
   <td style="text-align:right;"> 1137.5779 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1983 </td>
   <td style="text-align:left;"> small/mid </td>
   <td style="text-align:left;"> White </td>
   <td style="text-align:right;"> 149.7800 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1983 </td>
   <td style="text-align:left;"> suburban </td>
   <td style="text-align:left;"> Black </td>
   <td style="text-align:right;"> 880.4485 </td>
  </tr>
</tbody>
</table>
The data are available [here](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-01-22)

---
class: split-40

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = df2) + 
aes(x = year, y = rate_per_100000,
    color = pop_category,
    shape = urbanicity)  +
  geom_line() +
  geom_point() +
  theme_minimal() +
  labs(title = "Inhaftierung nach Ethnie und Urbanität",
       subtitle = "USA, 1983-2015",
       caption = "Quelle: Vera Institute of Justice",
       x = "", y = "Incarcerated per 100000",
       col = "Ethnie", shape = "Region")
```
.note85[That is a bit... *much*]
]
]
.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/incarceration_plot1-1.png" width="80%" />
]
]

---
class: split-40

# facet_grid

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = df2) + 
aes(x = year, y = rate_per_100000) +
* facet_grid(urbanicity~pop_category) +
  geom_line() +
  theme_minimal() +
  labs(title = "Inhaftierung nach Ethnie und Urbanität",
       subtitle = "USA, 1983-2015",
       caption = "Quelle: Vera Institute of Justice",
       x = "", y = "Incarcerated per 100000")
```
.note85[`geom_point` would clutter the panels, so each one shows only a line]
]
]
.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/incarceration_plot2-1.png" width="80%" />
]
]
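
The formula in `facet_grid()` reads as rows `~` columns. The following is a minimal sketch on made-up data; the data frame `toy` and its columns `region`, `group` and `rate` are assumptions for illustration and are not part of the incarceration dataset.

```r
library(ggplot2)

# Toy data (hypothetical values): two regions x two groups over five years
toy <- expand.grid(year = 2000:2004,
                   region = c("rural", "urban"),
                   group = c("A", "B"))
toy$rate <- seq_len(nrow(toy))

p_base <- ggplot(data = toy) +
  aes(x = year, y = rate) +
  geom_line() +
  theme_minimal()

# rows ~ columns: one panel per region/group combination
p_grid <- p_base + facet_grid(region ~ group)

# Only rows: one panel per region, stacked vertically
p_rows <- p_base + facet_grid(region ~ .)

# Only columns: one panel per group, side by side
p_cols <- p_base + facet_grid(. ~ group)
```
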

---

# Two-step solution for individual-level data

In the social sciences, however, we usually have individual-level data, e.g. when we build a bar chart from the marital status variable in the Allbus 2016.

This is what the raw data look like:
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> respid </th>
   <th style="text-align:right;"> sex </th>
   <th style="text-align:right;"> mstat </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>
---
class: split-40

# Step 1: Frequency table

.column.bg-main1[
.small.vmiddle[<br><br><br><br><br><br>]
.small.top[

```r
a16$mstat[a16$mstat ==-9] <- NA
xtabs(~sex+mstat, data = a16) 
```
]
.small.top[
.small.vmiddle[<br><br><br><br>]

```r
kt <- xtabs(~sex+mstat, data = a16) 
tab_df <- data.frame( kt )
head(tab_df)
```
]]

.column.bg-main2[
.small.vmiddle[<br><br><br><br><br>]
.content.top.center[

```
     mstat
  sex    1    2    3    4    5    6    9
    1 1011   33   56  133  534    2    0
    2  908   27  180  183  415    5    1
```
]
.content.top.center[
.small.top[<br>]
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> sex </th>
   <th style="text-align:left;"> mstat </th>
   <th style="text-align:right;"> Freq </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 1011 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 908 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:right;"> 33 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:right;"> 27 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:right;"> 56 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:right;"> 180 </td>
  </tr>
</tbody>
</table>
]]
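
The same step can be tried without the Allbus file. This is a minimal sketch on made-up data: the data frame `toy` and its values are assumptions for illustration only.

```r
# Toy individual-level data (hypothetical values, not the Allbus)
toy <- data.frame(
  sex   = c(1, 2, 1, 2, 2, 1, 2, 1),
  mstat = c(1, 1, 5, 5, 3, -9, 1, 1)
)

# Recode the missing-value code -9 to NA so it is left out of the table
toy$mstat[toy$mstat == -9] <- NA

# Step 1a: contingency table of sex by marital status
kt_toy <- xtabs(~ sex + mstat, data = toy)

# Step 1b: long format, one row per sex/mstat combination plus a count
tab_toy <- data.frame(kt_toy)
tab_toy
```

In the resulting data frame `sex` and `mstat` are factors, so ggplot2 treats them as discrete variables.
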
---
class: split-40

# Step 2: Bar chart

.column.bg-main1[.small.vmiddle[

```r
*ggplot(data = tab_df) +
  aes(x = mstat, y = Freq, fill = sex ) + 
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("navajowhite","navy"), 
                    breaks = c(1,2), 
                    labels = c("Männer", "Frauen")) +
  theme_minimal()  +
  labs(title = "Familienstand",
       subtitle = "Absolute Häufigkeiten nach Geschlecht",
       caption = "Quelle: Allbus 2016", 
       x = "Familienstand",
       y = "Absolute Häufigkeit",
       fill = "Geschlecht" ) 
```
]]

.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/allbus02-1.png" width="80%" />
]]

---
class: split-40

# Percentages instead of absolute frequencies

...the same approach works for row percentages (and column percentages)

.column.bg-main1[
.small.vmiddle[<br><br><br><br><br><br>]
.small.top[

```r
kt <- xtabs(~sex+mstat, data = a16) 
prop.table(kt, margin = 1) %>% round(.,3)
```
]
.small.top[
.small.vmiddle[<br><br><br><br>]

```r
kt2 <- prop.table(kt, margin = 1) 
tab_df2 <- data.frame( kt2 )
head(tab_df2)
```
]]

.column.bg-main2[
.small.vmiddle[<br><br><br><br><br>]
.content.top.center[

```
     mstat
  sex     1     2     3     4     5     6     9
    1 0.572 0.019 0.032 0.075 0.302 0.001 0.000
    2 0.528 0.016 0.105 0.106 0.241 0.003 0.001
```
]
.content.top.center[
.small.vmiddle[<br><br>]
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> sex </th>
   <th style="text-align:left;"> mstat </th>
   <th style="text-align:right;"> Freq </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 0.5715093 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 0.5282141 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:right;"> 0.0186546 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:right;"> 0.0157068 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:right;"> 0.0316563 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:right;"> 0.1047120 </td>
  </tr>
</tbody>
</table>
]
]
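
The difference between row and column percentages comes down to the `margin` argument. A small sketch with a hand-made table; the counts are hypothetical and chosen so the results are easy to check by hand.

```r
# Small hand-made contingency table (hypothetical counts)
kt_toy <- as.table(matrix(c(30, 10,
                            20, 40),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(sex = c("1", "2"),
                                          mstat = c("1", "5"))))

# Row percentages: each row (each sex) sums to 1
row_pct <- prop.table(kt_toy, margin = 1)

# Column percentages: each column (each marital status) sums to 1
col_pct <- prop.table(kt_toy, margin = 2)

round(row_pct, 3)
round(col_pct, 3)
```
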
---
class: split-40

# Bar chart with row percentages

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = tab_df2) + 
  aes(x = mstat, y = Freq, fill = sex ) + 
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("#F7B449","#132157"), 
                    breaks = c(1,2), 
                    labels = c("Männer", "Frauen")) +
  theme_minimal()  +
  labs(title = "Familienstand",
       subtitle = "Anteile pro Geschlecht",
       caption = "Quelle: Allbus 2016", 
       x = "Familienstand",
       y = "Relative Häufigkeit",
       fill = "Geschlecht" ) 
```
]]

.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/allbus_p02-1.png" width="80%" />
]]
---
class: split-40

# Adjusting the axis format

```r
*install.packages("scales")
ggplot(data = tab_df2) + 
  aes(x = mstat, y = Freq, fill = sex ) + 
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("#F7B449","#132157"), 
                    breaks = c(1,2), 
                    labels = c("Männer", "Frauen")) +
  theme_minimal()  +
  labs(title = "Familienstand",
       subtitle = "Anteile pro Geschlecht",
       caption = "Quelle: Allbus 2016", 
       x = "Familienstand",
       y = "Relative Häufigkeit",
       fill = "Geschlecht" ) +
* scale_y_continuous(labels=scales::percent)
```
]]

.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/allbus_per_scale-1.png" width="80%" />
]]

---
class: split-40

# Adjusting the axis format

```r
ggplot(data = tab_df2) + 
  aes(x = mstat, y = Freq, fill = sex ) + 
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("#F7B449","#132157"), 
                    breaks = c(1,2), labels = c("Männer", "Frauen")) +
  theme_minimal()  +
  labs(title = "Familienstand",
       subtitle = "Anteile pro Geschlecht",
       caption = "Quelle: Allbus 2016", 
       x = "Familienstand",
       y = "Relative Häufigkeit",
       fill = "Geschlecht" ) +
  scale_y_continuous(labels=scales::percent) +
  scale_x_discrete(  
*   breaks = c(1:6,9),
*   labels = c("verh. zus.","verh. get.",
*              "verw.", "gesch.","ledig",
*              "lp. zus.","lp. get."))
```
]]

.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/allbus_per_scale_x-1.png" width="80%" />
]]

---
class: split-40

# Adjusting the axis format

```r
ggplot(data = tab_df2) + 
  aes(x = mstat, y = Freq, fill = sex ) + 
  geom_col(position=position_dodge()) +
  scale_fill_manual(values = c("#F7B449","#132157"), 
                    breaks = c(1,2), labels = c("Männer", "Frauen")) +
  theme_minimal()  +
  labs(title = "Familienstand",
       subtitle = "Anteile pro Geschlecht",
       caption = "Quelle: Allbus 2016", 
       x = "Familienstand",
       y = "Relative Häufigkeit",
       fill = "Geschlecht" ) +
  scale_y_continuous(labels=scales::percent) +
  scale_x_discrete(  
*   breaks = c(1:6,9),
*   labels = c("verh.\nzus.","verh.\nget.",
*              "verw.", "gesch.","ledig",
*              "lebensp.\nzus.","lebensp.\nget."))
```
]]

.column.bg-main2[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/allbus_per_scale_x2-1.png" width="80%" />
]]

---
# Axis format

+ `scale_*_continuous` (numeric axes) vs. `scale_*_discrete` (categorical axes), where `*` stands for `x` or `y`
+ Numeric axes can also be transformed, e.g. log-transformed
+ More [here](https://tinyurl.com/mndbvku)
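
A log-transformed axis could look like the following sketch. The data frame `toy` is made up for illustration, and `scale_y_log10()` is one of several ways ggplot2 offers for this.

```r
library(ggplot2)

# Toy data (hypothetical values) spanning several orders of magnitude
toy <- data.frame(x = 1:5, y = c(1, 10, 100, 1000, 10000))

# Continuous y-axis on a log10 scale: equal distances mean equal ratios
p_log <- ggplot(data = toy) +
  aes(x = x, y = y) +
  geom_point(size = 2) +
  scale_y_log10() +
  theme_minimal()
```
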

---
# Exporting/saving ggplots

To export ggplots we can use `ggsave()`; by default it saves the last plot that was displayed. It is advisable, however, to store the plot as an object and pass it to `ggsave()` explicitly.

```r
plot1 <- ggplot(data = ...) + geom_...

ggsave(plot = plot1,
       filename = "C:/Users/Filser/Dokumente/plot1.png")
```

+ The size of the exported image can be set with `height = 3, width = 5, units = "cm"`, here 3 cm high and 5 cm wide. 
+ The background can be made transparent with the option `bg = "transparent"` (helpful when embedding the plot in PowerPoint)
+ `dpi = ` changes the resolution of the image
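
Put together, these options might be used as in the sketch below. The toy plot and the temporary output path `out_file` are assumptions for illustration; replace the path with your own.

```r
library(ggplot2)

# Toy plot (hypothetical data) stored as an object
plot1 <- ggplot(data = data.frame(x = 1:3, y = c(2, 4, 3))) +
  aes(x = x, y = y) +
  geom_point() +
  theme_minimal()

# Temporary file path, an assumption for this sketch
out_file <- file.path(tempdir(), "plot1.png")

ggsave(plot = plot1,
       filename = out_file,
       width = 5, height = 3, units = "cm",  # 5 cm wide, 3 cm high
       dpi = 300,                            # resolution in dots per inch
       bg = "transparent")                   # transparent background
```
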

---
# Further options

+ `ylim()` and `xlim()` control the range of the y- and x-axis, e.g. `+ ylim(c(0,35))` sets the y-axis to 0-35. *Caution!* All values outside this range are then dropped entirely!  
+ `+ theme()` offers countless further options. `theme()` is an additional layer and should be added *after* `theme_minimal()` etc., because a complete theme overrides earlier `theme()` settings. A useful option is `+ theme(aspect.ratio = 1)`, which sets the aspect ratio to 1:1
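
The caution about `ylim()` can be illustrated with a small sketch. The data are made up, and `coord_cartesian()` is not mentioned on this slide: it is added here as the ggplot2 function that zooms into a range without discarding data.

```r
library(ggplot2)

# Toy data (hypothetical values); one value lies above 35
toy <- data.frame(group = c("a", "b", "c"), value = c(10, 20, 50))

p_base <- ggplot(data = toy) +
  aes(x = group, y = value) +
  geom_point(size = 2) +
  theme_minimal()

# ylim() sets scale limits: the point at 50 is removed (with a warning)
p_ylim <- p_base + ylim(c(0, 35))

# coord_cartesian() only zooms the view: the data stay untouched
p_zoom <- p_base + coord_cartesian(ylim = c(0, 35))

# theme() comes after the complete theme, so the setting is kept
p_square <- p_zoom + theme(aspect.ratio = 1)
```
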

---

# Exercises 2

Read in the cumulative Allbus dataset (see StudIP)
+ Create a bar chart for `gkpol` (size classes of the municipality of residence)  
+ Label the axes and change the position of the legend  
+ Use colors from [colorbrewer](http://colorbrewer2.org/)

---

# Part 2: Coefficient plots

---
class: split-40
# Gender Pay Gap

.pull-left.small.vmiddle[
<br><br><br><br><br><br><br>
<br><br><br>

```r
m1 <- lm(inc ~ sex , data = a16_ft)
```
]
.pull-right.small[

```
  
  ========================
               m1         
  ------------------------
  (Intercept)  2475.52 ***
                (41.54)   
  sexw         -640.82 ***
                (69.28)   
  ------------------------
  R^2             0.06    
  Num. obs.    1391       
  ========================
  *** p < 0.001, ** p < 0.01, * p < 0.05
```
]

---
class: split-40
# Gender Pay Gap
.pull-left.small.vmiddle[
<br><br><br><br><br><br><br>
<br><br><br>

```r
m1 <- lm(inc ~ sex , data = a16_ft)
m2 <- lm(inc ~ sex + age , data = a16_ft)
```
]
.pull-right.small[

```
  
  =====================================
               m1           m2         
  -------------------------------------
  (Intercept)  2475.52 ***  1220.83 ***
                (41.54)     (122.58)   
  sexw         -640.82 ***  -607.37 ***
                (69.28)      (66.63)   
  age                         28.68 ***
                              (2.65)   
  -------------------------------------
  R^2             0.06         0.13    
  Num. obs.    1391         1391       
  =====================================
  *** p < 0.001, ** p < 0.01, * p < 0.05
```
]

---
class: split-40
# Gender Pay Gap

.pull-left.small.vmiddle[
<br><br><br><br><br><br><br>
<br><br><br>

```r
m1 <- lm(inc ~ sex, data = a16_ft)
m2 <- lm(inc ~ sex + age , data = a16_ft)
m3 <- lm(inc ~ sex + age + educ + eastwest + gkpol, 
         data = a16_ft)
```
]
.pull-right.xsmall[

```
  
  ==================================================
               m1           m2           m3         
  --------------------------------------------------
  (Intercept)  2475.52 ***  1220.83 ***   407.32    
                (41.54)     (122.58)     (411.48)   
  sexw         -640.82 ***  -607.37 ***  -692.88 ***
                (69.28)      (66.63)      (60.12)   
  age                         28.68 ***    32.22 ***
                              (2.65)       (2.39)   
  educ2                                   341.06    
                                         (384.68)   
  educ3                                   806.75 *  
                                         (381.12)   
  educ4                                  1199.90 ** 
                                         (388.52)   
  educ5                                  1645.26 ***
                                         (380.86)   
  eastwest2                              -559.92 ***
                                          (66.21)   
  gkpol2                                 -101.70    
                                         (153.71)   
  gkpol3                                 -118.18    
                                         (146.50)   
  gkpol4                                 -277.51    
                                         (151.11)   
  gkpol5                                 -373.70 *  
                                         (168.52)   
  gkpol6                                 -273.38    
                                         (160.41)   
  gkpol7                                 -145.13    
                                         (156.89)   
  --------------------------------------------------
  R^2             0.06         0.13         0.31    
  Num. obs.    1391         1391         1391       
  ==================================================
  *** p < 0.001, ** p < 0.01, * p < 0.05
```
]

---

# Coefficient plot

.content[
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-42-1.png" width="60%" height="60%" style="display: block; margin: auto;" />
]

---
class: split-40

# Preparing the model output

```r
summary(m3)
```
]
.xsmall.pull-right80[

```
  
  Call:
  lm(formula = inc ~ sex + age + educ + eastwest + gkpol, data = a16_ft)
  
  Residuals:
      Min      1Q  Median      3Q     Max 
  -3180.9  -614.1  -115.2   436.7  6320.7 
  
  Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
  (Intercept)  407.323    411.478   0.990  0.32240    
  sexw        -692.883     60.124 -11.524  < 2e-16 ***
  age           32.221      2.393  13.466  < 2e-16 ***
  educ2        341.059    384.682   0.887  0.37545    
  educ3        806.750    381.125   2.117  0.03446 *  
  educ4       1199.897    388.522   3.088  0.00205 ** 
  educ5       1645.261    380.860   4.320 1.67e-05 ***
  eastwest2   -559.917     66.212  -8.456  < 2e-16 ***
  gkpol2      -101.702    153.713  -0.662  0.50832    
  gkpol3      -118.183    146.496  -0.807  0.41996    
  gkpol4      -277.513    151.110  -1.836  0.06650 .  
  gkpol5      -373.696    168.524  -2.217  0.02675 *  
  gkpol6      -273.377    160.405  -1.704  0.08855 .  
  gkpol7      -145.132    156.891  -0.925  0.35510    
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  Residual standard error: 1065 on 1377 degrees of freedom
    (98 observations deleted due to missingness)
  Multiple R-squared:  0.3114,	Adjusted R-squared:  0.3049 
  F-statistic: 47.91 on 13 and 1377 DF,  p-value: < 2.2e-16
```
]

---
class: split-40

# Preparing the model output

```r
install.packages("broom")
```

```r
m3 <- lm(inc ~ sex + age + educ + eastwest + gkpol, 
         data = a16_ft)
library(broom)
tidy(m3) 
```
]
.smaller[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 407.323 </td>
   <td style="text-align:right;"> 411.478 </td>
   <td style="text-align:right;"> 0.990 </td>
   <td style="text-align:right;"> 0.322 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sexw </td>
   <td style="text-align:right;"> -692.883 </td>
   <td style="text-align:right;"> 60.124 </td>
   <td style="text-align:right;"> -11.524 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:right;"> 32.221 </td>
   <td style="text-align:right;"> 2.393 </td>
   <td style="text-align:right;"> 13.466 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ2 </td>
   <td style="text-align:right;"> 341.059 </td>
   <td style="text-align:right;"> 384.682 </td>
   <td style="text-align:right;"> 0.887 </td>
   <td style="text-align:right;"> 0.375 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ3 </td>
   <td style="text-align:right;"> 806.750 </td>
   <td style="text-align:right;"> 381.125 </td>
   <td style="text-align:right;"> 2.117 </td>
   <td style="text-align:right;"> 0.034 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ4 </td>
   <td style="text-align:right;"> 1199.897 </td>
   <td style="text-align:right;"> 388.522 </td>
   <td style="text-align:right;"> 3.088 </td>
   <td style="text-align:right;"> 0.002 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ5 </td>
   <td style="text-align:right;"> 1645.261 </td>
   <td style="text-align:right;"> 380.860 </td>
   <td style="text-align:right;"> 4.320 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> eastwest2 </td>
   <td style="text-align:right;"> -559.917 </td>
   <td style="text-align:right;"> 66.212 </td>
   <td style="text-align:right;"> -8.456 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
</tbody>
</table>

]

---
class: split-40

# Preparing the model output

```r
install.packages("broom")
```

```r
m3 <- lm(inc ~ sex + age + educ + eastwest + gkpol, 
         data = a16_ft)
library(broom)
tidy(m3, 
*    conf.int = TRUE, conf.level = 0.95)
```
]
.smaller[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 407.323 </td>
   <td style="text-align:right;"> 411.478 </td>
   <td style="text-align:right;"> 0.990 </td>
   <td style="text-align:right;"> 0.322 </td>
   <td style="text-align:right;"> -399.869 </td>
   <td style="text-align:right;"> 1214.515 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sexw </td>
   <td style="text-align:right;"> -692.883 </td>
   <td style="text-align:right;"> 60.124 </td>
   <td style="text-align:right;"> -11.524 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> -810.827 </td>
   <td style="text-align:right;"> -574.938 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:right;"> 32.221 </td>
   <td style="text-align:right;"> 2.393 </td>
   <td style="text-align:right;"> 13.466 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> 27.527 </td>
   <td style="text-align:right;"> 36.914 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ2 </td>
   <td style="text-align:right;"> 341.059 </td>
   <td style="text-align:right;"> 384.682 </td>
   <td style="text-align:right;"> 0.887 </td>
   <td style="text-align:right;"> 0.375 </td>
   <td style="text-align:right;"> -413.566 </td>
   <td style="text-align:right;"> 1095.684 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ3 </td>
   <td style="text-align:right;"> 806.750 </td>
   <td style="text-align:right;"> 381.125 </td>
   <td style="text-align:right;"> 2.117 </td>
   <td style="text-align:right;"> 0.034 </td>
   <td style="text-align:right;"> 59.101 </td>
   <td style="text-align:right;"> 1554.398 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ4 </td>
   <td style="text-align:right;"> 1199.897 </td>
   <td style="text-align:right;"> 388.522 </td>
   <td style="text-align:right;"> 3.088 </td>
   <td style="text-align:right;"> 0.002 </td>
   <td style="text-align:right;"> 437.737 </td>
   <td style="text-align:right;"> 1962.057 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ5 </td>
   <td style="text-align:right;"> 1645.261 </td>
   <td style="text-align:right;"> 380.860 </td>
   <td style="text-align:right;"> 4.320 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> 898.133 </td>
   <td style="text-align:right;"> 2392.390 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> eastwest2 </td>
   <td style="text-align:right;"> -559.917 </td>
   <td style="text-align:right;"> 66.212 </td>
   <td style="text-align:right;"> -8.456 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> -689.805 </td>
   <td style="text-align:right;"> -430.029 </td>
  </tr>
</tbody>
</table>

]
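
For orientation, `conf.low` and `conf.high` are the usual t-based confidence limits of a linear model. The sketch below reproduces such limits by hand and with `confint()` from base R. It uses the built-in `mtcars` data as a stand-in, because `a16_ft` is not available here; the model `m_toy` is purely illustrative.

```r
# Built-in example data stand in for a16_ft (assumption for this sketch)
m_toy <- lm(mpg ~ wt + hp, data = mtcars)

# Estimates and standard errors from the coefficient table
est <- coef(summary(m_toy))[, "Estimate"]
se  <- coef(summary(m_toy))[, "Std. Error"]

# 95% interval: estimate +/- t quantile * standard error
t_crit <- qt(0.975, df = df.residual(m_toy))
ci_manual <- cbind(conf.low  = est - t_crit * se,
                   conf.high = est + t_crit * se)

# confint() returns the same limits
ci_base <- confint(m_toy, level = 0.95)
```
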

---
class: split-40

# Preparing the model output

```r
m3_df <- tidy(m3, conf.int = TRUE, conf.level = 0.95)

ggplot(data = m3_df , aes(....)) 
```

.smaller[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;font-weight: bold;"> 407.323 </td>
   <td style="text-align:right;"> 411.478 </td>
   <td style="text-align:right;"> 0.990 </td>
   <td style="text-align:right;"> 0.322 </td>
   <td style="text-align:right;font-weight: bold;"> -399.869 </td>
   <td style="text-align:right;font-weight: bold;"> 1214.515 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sexw </td>
   <td style="text-align:right;font-weight: bold;"> -692.883 </td>
   <td style="text-align:right;"> 60.124 </td>
   <td style="text-align:right;"> -11.524 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;font-weight: bold;"> -810.827 </td>
   <td style="text-align:right;font-weight: bold;"> -574.938 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> age </td>
   <td style="text-align:right;font-weight: bold;"> 32.221 </td>
   <td style="text-align:right;"> 2.393 </td>
   <td style="text-align:right;"> 13.466 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;font-weight: bold;"> 27.527 </td>
   <td style="text-align:right;font-weight: bold;"> 36.914 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ2 </td>
   <td style="text-align:right;font-weight: bold;"> 341.059 </td>
   <td style="text-align:right;"> 384.682 </td>
   <td style="text-align:right;"> 0.887 </td>
   <td style="text-align:right;"> 0.375 </td>
   <td style="text-align:right;font-weight: bold;"> -413.566 </td>
   <td style="text-align:right;font-weight: bold;"> 1095.684 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ3 </td>
   <td style="text-align:right;font-weight: bold;"> 806.750 </td>
   <td style="text-align:right;"> 381.125 </td>
   <td style="text-align:right;"> 2.117 </td>
   <td style="text-align:right;"> 0.034 </td>
   <td style="text-align:right;font-weight: bold;"> 59.101 </td>
   <td style="text-align:right;font-weight: bold;"> 1554.398 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ4 </td>
   <td style="text-align:right;font-weight: bold;"> 1199.897 </td>
   <td style="text-align:right;"> 388.522 </td>
   <td style="text-align:right;"> 3.088 </td>
   <td style="text-align:right;"> 0.002 </td>
   <td style="text-align:right;font-weight: bold;"> 437.737 </td>
   <td style="text-align:right;font-weight: bold;"> 1962.057 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> educ5 </td>
   <td style="text-align:right;font-weight: bold;"> 1645.261 </td>
   <td style="text-align:right;"> 380.860 </td>
   <td style="text-align:right;"> 4.320 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;font-weight: bold;"> 898.133 </td>
   <td style="text-align:right;font-weight: bold;"> 2392.390 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> eastwest2 </td>
   <td style="text-align:right;font-weight: bold;"> -559.917 </td>
   <td style="text-align:right;"> 66.212 </td>
   <td style="text-align:right;"> -8.456 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;font-weight: bold;"> -689.805 </td>
   <td style="text-align:right;font-weight: bold;"> -430.029 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol2 </td>
   <td style="text-align:right;font-weight: bold;"> -101.702 </td>
   <td style="text-align:right;"> 153.713 </td>
   <td style="text-align:right;"> -0.662 </td>
   <td style="text-align:right;"> 0.508 </td>
   <td style="text-align:right;font-weight: bold;"> -403.239 </td>
   <td style="text-align:right;font-weight: bold;"> 199.836 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol3 </td>
   <td style="text-align:right;font-weight: bold;"> -118.183 </td>
   <td style="text-align:right;"> 146.496 </td>
   <td style="text-align:right;"> -0.807 </td>
   <td style="text-align:right;"> 0.420 </td>
   <td style="text-align:right;font-weight: bold;"> -405.562 </td>
   <td style="text-align:right;font-weight: bold;"> 169.197 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol4 </td>
   <td style="text-align:right;font-weight: bold;"> -277.513 </td>
   <td style="text-align:right;"> 151.110 </td>
   <td style="text-align:right;"> -1.836 </td>
   <td style="text-align:right;"> 0.067 </td>
   <td style="text-align:right;font-weight: bold;"> -573.944 </td>
   <td style="text-align:right;font-weight: bold;"> 18.919 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol5 </td>
   <td style="text-align:right;font-weight: bold;"> -373.696 </td>
   <td style="text-align:right;"> 168.524 </td>
   <td style="text-align:right;"> -2.217 </td>
   <td style="text-align:right;"> 0.027 </td>
   <td style="text-align:right;font-weight: bold;"> -704.286 </td>
   <td style="text-align:right;font-weight: bold;"> -43.105 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol6 </td>
   <td style="text-align:right;font-weight: bold;"> -273.377 </td>
   <td style="text-align:right;"> 160.405 </td>
   <td style="text-align:right;"> -1.704 </td>
   <td style="text-align:right;"> 0.089 </td>
   <td style="text-align:right;font-weight: bold;"> -588.042 </td>
   <td style="text-align:right;font-weight: bold;"> 41.288 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> gkpol7 </td>
   <td style="text-align:right;font-weight: bold;"> -145.132 </td>
   <td style="text-align:right;"> 156.891 </td>
   <td style="text-align:right;"> -0.925 </td>
   <td style="text-align:right;"> 0.355 </td>
   <td style="text-align:right;font-weight: bold;"> -452.904 </td>
   <td style="text-align:right;font-weight: bold;"> 162.640 </td>
  </tr>
</tbody>
</table>

]

---

```r
*ggplot(data = m3_df,aes(y = term, x = estimate))
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_1-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
* geom_vline(aes(xintercept = 0),linetype = 2)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_2-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
* geom_point(size = 2)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_3-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
* geom_errorbarh(aes(xmin=conf.low, xmax = conf.high))
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_4-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
* aes(height = .25)
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_5-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
* theme_minimal()
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_6-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
* labs(y = "" )
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_7-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
* labs(subtitle = "Vollzeitbeschäftigte")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_10-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftigte") +
* labs(caption= "Quelle: Allbus 2016" )
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_11-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
* aes(color = sub("\\d$", "", term)  )
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_12-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
  aes(color = sub("\\d$", "", term)  ) +
* scale_color_brewer(palette = "Paired")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_13-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
  aes(color = sub("\\d$", "", term)  ) +
  scale_color_brewer(palette = "Paired") +
* guides(color = "none")
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_14-1.png" width="80%" />
]]
---
class: split-40
count: false

```r
ggplot(data = m3_df,aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
  aes(color = sub("\\d$", "", term)  ) +
  scale_color_brewer(palette = "Paired") +
  guides(color = "none") +
* scale_y_discrete(limits = rev(factor(m3_df$term)) )
```
]]
.column[.content.vmiddle.center[
<img src="ggplot2_intro_files/figure-html/coefplot_15-1.png" width="80%" />
]]

---
class: split-40

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = df, 
       aes(y = term, x = estimate)) +
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
  aes(color = sub("\\d$", "", term)  ) +
  scale_color_brewer(palette = "Paired") +
  guides(color = "none") +
* scale_y_discrete(limits = rev(factor(df$term)),
*                  labels = c("Einw: >500k","Einw: 100k - <500k",
*                             "Einw: 50k - <100k","Einw: 20k - <50k",
*                             "Einw: 5k - <20k","Einw. 2k - < 5k",
*                             "neue Länder",
*                             "Abi","Fachabi","Mittlere Reife",
*                             "Hauptschule",
*                             "Alter","Frau",
*                             "Intercept") )
```
]
]
.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/coefyaxis-1.png)
]]
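
---

# Coefficient plot: grouping terms by variable

A minimal sketch of what `sub("\\d$", "", term)` does in the colour mapping above: it strips one trailing digit, so all dummy terms of the same variable end up in the same colour group. The term names below are hypothetical examples and not necessarily the actual terms of `m3`.

```r
# Hypothetical term names, as tidy() might return them for factor dummies
terms <- c("(Intercept)", "sexw", "age", "educ2", "educ3", "gkpol2", "gkpol7")

# Remove a single digit at the end of each string
groups <- sub("\\d$", "", terms)
groups

# rev() reverses the order; used as axis limits, this puts the first term on top
rev(terms)
```

Only the last digit is removed, so a term such as `gkpol10` would become `gkpol1`. The pattern `"\\d+$"` would strip all trailing digits instead.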

---
class: split-40

# Model comparison

.blade1[Often we are interested in just one variable: <br> .smallish[How does the gender pay gap change after controlling for different variables?]]

.column.bg-main1[.small.vmiddle[
<br>
<br><br>
+ Model 1

```r
m1 <- lm(inc ~ sex , data = a16_ft)
m1_df <- tidy(m1, conf.int = TRUE)
filter(m1_df, term == "sexw")
```

+ Model 2

```r
m2 <- lm(inc ~ sex + age, data = a16_ft)
m2_df <- tidy(m2, conf.int = TRUE)
filter(m2_df, term == "sexw")
```
+ Model 3

```r
m3 <- lm(inc ~ sex + age + educ + eastwest + gkpol, data = a16_ft)
m3_df <- tidy(m3, conf.int = TRUE)
filter(m3_df, term == "sexw")
```
]]

.column.bg-main2[.smaller.vmiddle[
<br>
<br>
<br>

```
    term estimate std.error statistic p.value conf.low conf.high
  1 sexw -640.816    69.281    -9.249       0 -776.724  -504.909
```
<br>
<br>
<br>
<br>

```
    term estimate std.error statistic p.value conf.low conf.high
  1 sexw -607.369    66.626    -9.116       0 -738.067  -476.672
```
<br>
<br>

<br>

```
    term estimate std.error statistic p.value conf.low conf.high
  1 sexw -692.883    60.124   -11.524       0 -810.827  -574.938
```
]]

---

# Model comparison

.blade1[.smallish[With `bind_rows` we can combine the selected rows into a new `data.frame`. The `.id` argument adds an ID variable that identifies the models]]
.small[

```r
sex_coef <-  bind_rows(filter(m1_df, term == "sexw"),  
                       filter(m2_df, term == "sexw"),
                       filter(m3_df, term == "sexw"),
*                      .id = "model")
```

```r
sex_coef
```

```
    model term estimate std.error statistic p.value conf.low conf.high
  1     1 sexw  -640.82     69.28     -9.25       0  -776.72   -504.91
  2     2 sexw  -607.37     66.63     -9.12       0  -738.07   -476.67
  3     3 sexw  -692.88     60.12    -11.52       0  -810.83   -574.94
```

]
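
---

# Model comparison

A possible variation, offered as a sketch rather than as part of the original workflow: if the data frames are passed to `bind_rows` as named arguments, the names become the values of the `.id` column instead of 1, 2, 3. The toy data frames and the names `none`, `age`, `full` are assumptions that only stand in for the filtered `tidy()` output.

```r
library(dplyr)

# Toy stand-ins for the filtered tidy() rows (estimates rounded from the output above)
m1_row <- data.frame(term = "sexw", estimate = -640.8)
m2_row <- data.frame(term = "sexw", estimate = -607.4)
m3_row <- data.frame(term = "sexw", estimate = -692.9)

# Named arguments: the names are written into the "model" column
sex_coef_named <- bind_rows(none = m1_row, age = m2_row, full = m3_row,
                            .id = "model")
sex_coef_named
```

A character ID column is sorted alphabetically on a discrete axis; converting it to a `factor` with explicit `levels` keeps the models in the intended order.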

---
class: split-40

# Model comparison

.pull-left40[.smallish[We can now (gg)plot this `data.frame` again]]
.column.bg-main1[.small.vmiddle[

```r
ggplot(data = sex_coef) +
  aes(y = model)+ 
  aes(x = estimate) + 
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "Kontrollvariablen" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) 
```
]]
.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/modelvergl-1.png)
]]

---
class: split-40

# Model comparison

.column.bg-main1[.small.vmiddle[

```r
ggplot(data = sex_coef) +
  aes(y = model)+ 
  aes(x = estimate) + 
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "Kontrollvariablen" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" )  +
* scale_y_discrete(breaks = c(1:3),
*                  labels = c("keine",
*                             "Alter",
*                             "Alter, Bildung, Ost/West, Wohnortgr."))
```
]]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/modelvergl2-1.png)

]]

---
class: split-40

# Model comparison

.column.bg-main1[.smaller.vmiddle[

```r
ggplot(data = sex_coef) +
  aes(y = model)+ 
  aes(x = estimate) + 
  geom_vline(aes(xintercept = 0),linetype = 2) +
  geom_point(size = 2) + 
  geom_errorbarh(aes(xmin=conf.low, xmax = conf.high)) +
  aes(height = .25)  +
  theme_minimal() +
  labs(y = "Kontrollvariablen" ) +
  labs(x = expression(hat(beta)) ) +
  labs(title = "Einkommen") +
  labs(subtitle = "Vollzeitbeschäftige") +
  labs(caption= "Quelle: Allbus 2016" ) +
  scale_y_discrete(breaks = c(1:3), 
                   labels = c("keine", 
                              "Alter", 
*                             "Alter, \nBildung, \nOst/West, \nWohnortgr."))
```
]]

.column.bg-main2[.rightplot.vmiddle[
![](ggplot2_intro_files/figure-html/modelvergl3-1.png)

]]

---

# Further model specifications

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
m5 <- lm(inc ~ sex * age + I(age^2) , data = a16_ft)
```
]

```r
tidy(m4, conf.int = TRUE)
```
.center[

```
           term estimate std.error statistic p.value  conf.low conf.high
  1 (Intercept) -455.755   389.059    -1.171   0.242 -1218.964   307.453
  2        sexw -601.476    66.173    -9.089   0.000  -731.286  -471.665
  3         age  116.748    19.588     5.960   0.000    78.324   155.173
  4    I(age^2)   -1.059     0.233    -4.537   0.000    -1.516    -0.601
```
]

```r
tidy(m5, conf.int = TRUE)
```
.center[

```
           term estimate std.error statistic p.value  conf.low conf.high
  1 (Intercept) -945.584   402.923    -2.347   0.019 -1735.990  -155.179
  2        sexw  404.371   242.144     1.670   0.095   -70.638   879.379
  3         age  131.130    19.747     6.640   0.000    92.392   169.868
  4    I(age^2)   -1.126     0.232    -4.847   0.000    -1.582    -0.670
  5    sexw:age  -23.375     5.416    -4.316   0.000   -33.999   -12.751
```
]
]

---

# Coefficient plot

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
m5 <- lm(inc ~ sex * age + I(age^2) , data = a16_ft)
```
]

.small.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-70-1.png" width="80%" style="display: block; margin: auto;" />
]
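
---

# Coefficient plot

The slides do not show the code behind the figure above. The following is one way such a two-model coefficient plot could be built, not necessarily how the original was produced. The built-in `mtcars` data stand in for the Allbus data, and `coef_df()` is a hypothetical helper written for this sketch.

```r
library(ggplot2)

# Stand-in data: mtcars replaces the Allbus data used in the slides
cars <- transform(mtcars, am = factor(am, labels = c("a", "m")))
fit_a <- lm(mpg ~ am + wt + I(wt^2), data = cars)
fit_b <- lm(mpg ~ am * wt + I(wt^2), data = cars)

# Hypothetical helper: collect estimates and 95% confidence intervals
coef_df <- function(model, label) {
  ci <- confint(model)
  data.frame(model = label,
             term = names(coef(model)),
             estimate = unname(coef(model)),
             conf.low = ci[, 1],
             conf.high = ci[, 2],
             row.names = NULL)
}

# Stack both models; the "model" column distinguishes them in the plot
both <- rbind(coef_df(fit_a, "additive"), coef_df(fit_b, "interaction"))

p <- ggplot(both, aes(y = term, x = estimate, colour = model)) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
                  position = position_dodge(width = .5)) +
  theme_minimal()
p
```

`tidy(model, conf.int = TRUE)` from `broom` returns the same columns (`term`, `estimate`, `conf.low`, `conf.high`), so the helper could be swapped for it.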

---

# Predicted values with ggeffects

```r
install.packages("ggeffects")
library(ggeffects)
```
]
.small[

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
lm_4 <- ggpredict(m4, terms = c("age[20,30,40,50,60]","sex"))
```

```r
lm_4
```

]
.smaller[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> x </th>
   <th style="text-align:right;"> predicted </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
   <th style="text-align:left;"> group </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 20 </td>
   <td style="text-align:right;font-weight: bold;"> 1455.802 </td>
   <td style="text-align:right;"> 105.095 </td>
   <td style="text-align:right;"> 1249.819 </td>
   <td style="text-align:right;"> 1661.786 </td>
   <td style="text-align:left;font-weight: bold;"> m </td>
  </tr>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 20 </td>
   <td style="text-align:right;font-weight: bold;"> 854.327 </td>
   <td style="text-align:right;"> 108.197 </td>
   <td style="text-align:right;"> 642.264 </td>
   <td style="text-align:right;"> 1066.389 </td>
   <td style="text-align:left;font-weight: bold;"> w </td>
  </tr>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 30 </td>
   <td style="text-align:right;font-weight: bold;"> 2094.028 </td>
   <td style="text-align:right;"> 53.718 </td>
   <td style="text-align:right;"> 1988.742 </td>
   <td style="text-align:right;"> 2199.313 </td>
   <td style="text-align:left;font-weight: bold;"> m </td>
  </tr>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 30 </td>
   <td style="text-align:right;font-weight: bold;"> 1492.552 </td>
   <td style="text-align:right;"> 62.529 </td>
   <td style="text-align:right;"> 1369.997 </td>
   <td style="text-align:right;"> 1615.106 </td>
   <td style="text-align:left;font-weight: bold;"> w </td>
  </tr>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 40 </td>
   <td style="text-align:right;font-weight: bold;"> 2520.550 </td>
   <td style="text-align:right;"> 52.874 </td>
   <td style="text-align:right;"> 2416.919 </td>
   <td style="text-align:right;"> 2624.182 </td>
   <td style="text-align:left;font-weight: bold;"> m </td>
  </tr>
  <tr>
   <td style="text-align:right;font-weight: bold;"> 40 </td>
   <td style="text-align:right;font-weight: bold;"> 1919.075 </td>
   <td style="text-align:right;"> 63.729 </td>
   <td style="text-align:right;"> 1794.168 </td>
   <td style="text-align:right;"> 2043.981 </td>
   <td style="text-align:left;font-weight: bold;"> w </td>
  </tr>
</tbody>
</table>

]
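
---

# Predicted values: the idea in base R

A rough base-R sketch of the idea behind such predicted values: the fitted model is evaluated on a grid of chosen covariate values. This is an illustration under assumptions (the built-in `mtcars` data stand in for the Allbus data) and not a description of how `ggeffects` works internally.

```r
# Stand-in data: transmission type as a two-level factor
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit <- lm(mpg ~ am + wt + I(wt^2), data = cars)

# Grid of values at which to predict: three weights for each transmission type
grid <- expand.grid(wt = c(2, 3, 4), am = levels(cars$am))

# Predictions with 95% confidence limits (columns fit, lwr, upr)
pred <- cbind(grid, predict(fit, newdata = grid, interval = "confidence"))
pred
```

Each row of `pred` corresponds to one combination of the grid values, similar in layout to the `x` / `group` table above.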

---
class: split-40

# Quadratic term
.column.bg-main1[.small.vmiddle[

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
lm_4 <- ggpredict(m4, terms = c("age[20,30,40,50,60]","sex"))
ggplot(lm_4) +
* aes(x=x, y=predicted, colour = group) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  geom_errorbar(
*   aes(ymin= conf.low, ymax = conf.high),
                width = .35)
```
]
]
.column.bg-main2[.small.vmiddle.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-75-1.png" width="80%" />
]]

---
class: split-40

# Quadratic term
.column.bg-main1[.small.vmiddle[

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
lm_4 <- ggpredict(m4, terms = c("age[20,30,40,50,60]","sex"))
ggplot(lm_4) +
  aes(x=x, y=predicted, colour = group) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  geom_errorbar(
    aes(ymin= conf.low, ymax = conf.high), 
                width = .35) +
* labs(title = "Modell m4",
*      y = "vorhergesagte Werte",
*      x = "Alter",
*      color = "Geschlecht")
```
]
]
.column.bg-main2[.small.vmiddle.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-76-1.png" width="80%" />
]]

---
class: split-40

# Quadratic term
.column.bg-main1[.small.vmiddle[

```r
m4 <- lm(inc ~ sex + age + I(age^2) , data = a16_ft)
lm_4 <- ggpredict(m4, terms = c("age[20,30,40,50,60]","sex"))
ggplot(lm_4) +
  aes(x=x, y=predicted, colour = group) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  geom_errorbar(
    aes(ymin= conf.low, ymax = conf.high), 
                width = .35) +
  labs(title = "Modell m4",
       y = "vorhergesagte Werte",
       x = "Alter", 
       color = "Geschlecht") +
* scale_color_brewer(palette = "Accent")
```
]
]
.column.bg-main2[.small.vmiddle.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-77-1.png" width="80%" />
]]

---
class: split-40

# Interaction term
.column.bg-main1[.small.vmiddle[

```r
m5 <- lm(inc ~ sex * age + I(age^2)  , data = a16_ft)
lm_5 <- ggpredict(m5, terms = c("age[20,30,40,50,60]","sex"))
ggplot(lm_5) +
  aes(x=x, y=predicted, colour = group) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  geom_errorbar(aes(ymin= conf.low, ymax = conf.high),width = .35)+
  labs(title = "Modell m5",
       y = "vorhergesagte Werte",
       x = "Alter",
       color = "Geschlecht")
```
]
]
.column.bg-main2[.small.vmiddle.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-79-1.png" width="80%" />
]]
---
class: split-40

# Interaction term
.column.bg-main1[.small.vmiddle[

```r
m5 <- lm(inc ~ sex * age + I(age^2)  , data = a16_ft)
*lm_5 <- ggpredict(m5, terms = c("age[20,25,30,35,40,45,50,55,60]","sex"))
ggplot(lm_5) +
  aes(x=x, y=predicted, colour = group) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  geom_errorbar(aes(ymin= conf.low, ymax = conf.high),width = .35)+
  labs(title = "Modell m5",
       y = "vorhergesagte Werte",
       x = "Alter",
       color = "Geschlecht")
```
]
]
.column.bg-main2[.small.vmiddle.center[

<img src="ggplot2_intro_files/figure-html/unnamed-chunk-81-1.png" width="80%" />
]]

---
class: split-40

# Interaction term

.blade1[.smallish[Of course we can include control variables here as well, e.g. `eastwest`. We can then use them as `facets` in the plot]]
.smallish[

```r
m6 <- lm(inc ~ sex * age + I(age^2) + eastwest , data = a16_ft)
lm_6 <- ggpredict(m6, terms = c("age[20,30,40,50,60]","sex","eastwest"))
```
]

.smallish[
<table class="table" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> x </th>
   <th style="text-align:right;"> predicted </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
   <th style="text-align:left;"> group </th>
   <th style="text-align:left;"> facet </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 1357.506 </td>
   <td style="text-align:right;"> 115.503 </td>
   <td style="text-align:right;"> 1131.124 </td>
   <td style="text-align:right;"> 1583.888 </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 812.957 </td>
   <td style="text-align:right;"> 124.842 </td>
   <td style="text-align:right;"> 568.272 </td>
   <td style="text-align:right;"> 1057.642 </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 1261.362 </td>
   <td style="text-align:right;"> 126.693 </td>
   <td style="text-align:right;"> 1013.047 </td>
   <td style="text-align:right;"> 1509.676 </td>
   <td style="text-align:left;"> w </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 716.813 </td>
   <td style="text-align:right;"> 137.186 </td>
   <td style="text-align:right;"> 447.933 </td>
   <td style="text-align:right;"> 985.693 </td>
   <td style="text-align:left;"> w </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 2142.783 </td>
   <td style="text-align:right;"> 62.117 </td>
   <td style="text-align:right;"> 2021.037 </td>
   <td style="text-align:right;"> 2264.529 </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 30 </td>
   <td style="text-align:right;"> 1598.234 </td>
   <td style="text-align:right;"> 74.251 </td>
   <td style="text-align:right;"> 1452.705 </td>
   <td style="text-align:right;"> 1743.764 </td>
   <td style="text-align:left;"> m </td>
   <td style="text-align:left;font-weight: bold;color: #10102c !important;"> 2 </td>
  </tr>
</tbody>
</table>
]

---
class: split-40

# Interaction term

.pull-left40[.smallish[Of course we can include control variables here as well, e.g. `eastwest`. We can then use them as `facets` in the plot]]
.column.bg-main1[.small.vmiddle[

```r
m6 <- lm(inc ~ sex * age + I(age^2) + eastwest , data = a16_ft)
lm_6 <- ggpredict(m6, terms = c("age[20,30,40,50,60]","sex","eastwest"))
ggplot(lm_6) +
  aes(x=x, y=predicted, colour = group) + 
  geom_line() +
  geom_point() +
* facet_wrap(~facet) +
  theme_minimal() +
  geom_errorbar(aes(ymin= conf.low, ymax = conf.high),width = .35)+
  labs(title = "Modell m6",
       y = "vorhergesagte Werte",
       x = "Alter",
       color = "Geschlecht")
```
]
]
.column.bg-main2[.small.vmiddle.center[
<br><br><br>
<img src="ggplot2_intro_files/figure-html/unnamed-chunk-85-1.png" width="80%" />
]]
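
---

# Interaction term: readable facet labels

The facet panels carry the raw codes `1` and `2`. A small sketch of how they could be relabelled with `as_labeller()`. The rows are copied from the `ggpredict()` table shown earlier; the meaning of the codes (1 = West, 2 = East) is an assumption that the slides do not confirm.

```r
library(ggplot2)

# A few rows copied from the ggpredict() table shown earlier
pred <- data.frame(
  x = c(20, 20, 20, 20, 30, 30),
  predicted = c(1357.506, 812.957, 1261.362, 716.813, 2142.783, 1598.234),
  group = c("m", "m", "w", "w", "m", "m"),
  facet = c("1", "2", "1", "2", "1", "2")
)

# Assumed coding (not confirmed by the slides): 1 = West, 2 = East
ew_labels <- c("1" = "West (assumed)", "2" = "East (assumed)")

p <- ggplot(pred, aes(x = x, y = predicted, colour = group)) +
  geom_point() +
  facet_wrap(~facet, labeller = as_labeller(ew_labels)) +
  theme_minimal()
p
```

Alternatively, defining `eastwest` as a labelled `factor` before fitting the model should carry the labels through to the facets without a labeller.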

---
# Concluding remarks

+ all commands and strategies work best when categorical variables have been defined as `factor` **beforehand**
+ the existing values (numeric codes) can either be kept or be overwritten with labels
+ e.g. for `sex`:
.smallish[

```r
a16_ft$sex <- factor(a16_ft$sex, levels = c(1,2), labels = c("m","w") )
```
]
  Here the codes 1 and 2 are overwritten with `m` and `w`

+ The strategy for logistic regression (`glm`) or Cox regression models (`coxph`) follows the same approach as for OLS regression models
  + for coefficient plots, however, you need to decide in each case whether logit coefficients, odds/hazard ratios, or average marginal effects should be displayed
  + more information on `ggeffects` is available e.g. [here](https://cran.r-project.org/web/packages/ggeffects/vignettes/marginaleffects.html) 
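
---

# Concluding remarks: logit coefficients vs. odds ratios

A minimal sketch of the distinction mentioned above, assuming a logistic regression on the built-in `mtcars` data as a stand-in example. Exponentiating the logit coefficients and their interval limits yields odds ratios.

```r
# Stand-in example: transmission type (am) explained by weight (wt)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Logit coefficients with Wald confidence intervals
logit_tab <- cbind(estimate = coef(fit), confint.default(fit))

# Exponentiating gives odds ratios and the corresponding interval limits
or_tab <- exp(logit_tab)
or_tab
```

In a coefficient plot of odds ratios, the reference line belongs at 1 rather than 0. With `broom`, `tidy(fit, conf.int = TRUE, exponentiate = TRUE)` returns a comparable table, although the interval limits may differ slightly.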
  
---
# Exercises 3

.smallish[
+ Load the cumulative Allbus dataset (see the notes at the end)
+ Filter the respondents in full-time employment from 2014:

```r
a14_ft <- filter(ak, year == 2014, work == 1)
```
+ Estimate regression models with `inc` as the dependent variable, adding the following independent variables one model at a time:
  + `sex` (respondent's sex)
  + `age` (respondent's age)
  + `hs18` (respondent's height)
  + `educ` (educational attainment) 
  + `gkpol` (size of place of residence)
+ Define the categorical variables (`sex`, `educ`, `gkpol`) as `factor`! 
+ Create coefficient plots for the models!
+ How does the coefficient for sex change as control variables are added? Create a corresponding plot!
  
.smallish[...continued on the next slide]
]

---

# Exercises 3 (continued)

```r
lm_q <- lm(inc ~ sex * age + I(age^2), data = a14_ft)
```
+ Use `ggpredict` to compute the predicted values
+ Create a suitable plot of them!

]
---

# Highlighting cases

```r
install.packages("ggrepel")
library(ggrepel)
```
.smallish[
Examples can be found e.g. [here](https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html)
]
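
---

# Highlighting cases

A short sketch of how individual cases might be highlighted and labelled with `geom_text_repel()`. The built-in `mtcars` data and the selection rule (`mpg > 30`) are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggrepel)

# Row names become a regular column so they can be used as labels
cars <- transform(mtcars, name = rownames(mtcars))

# Cases to highlight: here, arbitrarily, the cars with the best mileage
top <- subset(cars, mpg > 30)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(colour = "grey60") +                    # all cases in grey
  geom_point(data = top, colour = "red") +           # highlighted cases
  geom_text_repel(data = top, aes(label = name)) +   # non-overlapping labels
  theme_minimal()
p
```

Passing a subset as `data` to a single layer labels only those cases, while all other layers keep using the full data.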

---
# Link collection

+ the [R Graphics Cookbook](http://www.cookbook-r.com/Graphs/) offers an extensive collection of example plots including the corresponding syntax 
+ the [ggplot2 Cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf) gives an overview of the most important functions and commands
+ [50 examples](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html) of ggplot2 graphics
+ [R Graph Gallery](https://www.r-graph-gallery.com) with a wide variety of chart types
+ [ggplot flipbook](https://evamaerey.github.io/ggplot_flipbook/ggplot_flipbook_xaringan.html#1) with examples
+ [List](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf) of all colors in R 
+ [Hex code finder](https://www.color-hex.com/)
+ [ColorBrewer](http://colorbrewer2.org): color palettes, directly available via `scale_..._brewer()`
+ Further packages:
  + [`scico`](https://github.com/thomasp85/scico): additional color palettes via `scale_..._scico()`
  + [`ggthemes`](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/): additional themes
  + [`patchwork`](https://github.com/thomasp85/patchwork): combine ggplots
  + [`gganimate`](https://github.com/thomasp85/gganimate): turn ggplots into GIFs

---
# Reading in the example datasets
.smaller[

```r
install.packages(c("readxl","dplyr"))
```

+ Christmas trees

```r
library(dplyr)
christmas_trees <- readxl::read_xlsx("...../cristmas_trees.xlsx") %>% 
  mutate(jahr = rep(2004:2017,each=2)) %>% 
  rename(anz_baeume = `anzahl verkaufte Bäume`)
```

+ Incarceration figures

```r
library(dplyr)
df <- readr::read_csv("..../prison_summary.csv")
df2 <- df %>% 
  filter(!(pop_category %in% c("Female", "Male", "Total", "Other"))) 
```

+ Allbus 2016

```r
a16 <- read.csv("...../ZA5250_v2-0-0.csv", sep = ";",
                header = TRUE, stringsAsFactors = FALSE) 
```

+ Cumulative Allbus

```r
library(dplyr)
ak <- readr::read_delim(".../ZA4586_v1-0-0.csv", delim = ";")
# recode all negative values (missing codes) in numeric columns to NA
ak <- mutate_if(ak, is.numeric, ~ ifelse(. < 0, NA, .))
```
]
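
---

# Reading in the example datasets: recoding missing codes

A small sketch of what the recoding step for the cumulative Allbus does, assuming that negative values are missing codes. It uses `across()`, which newer `dplyr` versions offer as a replacement for `mutate_if()`. The toy data frame is made up for illustration.

```r
library(dplyr)

# Toy data (made up): negative codes stand for missing values
toy <- data.frame(id  = c("a", "b", "c"),
                  inc = c(1500, -9, 2300),
                  age = c(-1, 34, 51))

# Replace negative values with NA in all numeric columns; other columns are untouched
toy_clean <- mutate(toy, across(where(is.numeric), ~ ifelse(.x < 0, NA, .x)))
toy_clean
```

Character columns such as `id` pass through unchanged because the recoding is restricted to numeric columns.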

---