class: center, title-slide, middle

# Introduction to the ggplot Grammar
## Taming & Visualizing Data
### B. Philipp Kleer
### Methodentage 2021
### October 11, 2021

.social[
[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#EB811B;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M294.75 188.19h-45.92V342h47.47c67.62 0 83.12-51.34 83.12-76.91 0-41.64-26.54-76.9-84.67-76.9zM256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8zm-80.79 360.76h-29.84v-207.5h29.84zm-14.92-231.14a19.57 19.57 0 1 1 19.57-19.57 19.64 19.64 0 0 1-19.57 19.57zM300 369h-81V161.26h80.6c76.73 0 110.44 54.83 110.44 103.85C410 318.39 368.38 369 300 369z"></path></svg>](https://orcid.org/0000-0003-1935-387X)
[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#EB811B;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M105.2 24.9c-3.1-8.9-15.7-8.9-18.9 0L29.8 199.7h132c-.1 0-56.6-174.8-56.6-174.8zM.9 287.7c-2.6 8 .3 16.9 7.1 22l247.9 184-226.2-294zm160.8-88l94.3 294 94.3-294zm349.4 88l-28.8-88-226.3 294 247.9-184c6.9-5.1 9.7-14 7.2-22zM425.7 24.9c-3.1-8.9-15.7-8.9-18.9 0l-56.6 174.8h132z"></path></svg>](https://gitlab.com/bpkleer)
[<svg viewBox="0 0 512 512" style="position:relative;display:inline-block;top:.1em;fill:#EB811B;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M496 128v16a8 8 0 0 1-8 8h-24v12c0 6.627-5.373 12-12 12H60c-6.627 0-12-5.373-12-12v-12H24a8 8 0 0 1-8-8v-16a8 8 0 0 1 4.941-7.392l232-88a7.996 7.996 0 0 1 6.118 0l232 88A8 8 0 0 1 496 128zm-24 304H40c-13.255 0-24 10.745-24 24v16a8 8 0 0 0 8 8h464a8 8 0 0 0 8-8v-16c0-13.255-10.745-24-24-24zM96 192v192H60c-6.627 0-12 5.373-12 12v20h416v-20c0-6.627-5.373-12-12-12h-36V192h-64v192h-64V192h-64v192h-64V192H96z"></path></svg>](https://www.uni-giessen.de/faculties/f03/departments/dps/staff/researchers/kleer?set_language=en)
[<svg viewBox="0 0 448 512" 
style="position:relative;display:inline-block;top:.1em;fill:#EB811B;height:1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M0 32v448h448V32H0zm262.2 334.4c-6.6 3-33.2 6-50-14.2-9.2-10.6-25.3-33.3-42.2-63.6-8.9 0-14.7 0-21.4-.6v46.4c0 23.5 6 21.2 25.8 23.9v8.1c-6.9-.3-23.1-.8-35.6-.8-13.1 0-26.1.6-33.6.8v-8.1c15.5-2.9 22-1.3 22-23.9V225c0-22.6-6.4-21-22-23.9V193c25.8 1 53.1-.6 70.9-.6 31.7 0 55.9 14.4 55.9 45.6 0 21.1-16.7 42.2-39.2 47.5 13.6 24.2 30 45.6 42.2 58.9 7.2 7.8 17.2 14.7 27.2 14.7v7.3zm22.9-135c-23.3 0-32.2-15.7-32.2-32.2V167c0-12.2 8.8-30.4 34-30.4s30.4 17.9 30.4 17.9l-10.7 7.2s-5.5-12.5-19.7-12.5c-7.9 0-19.7 7.3-19.7 19.7v26.8c0 13.4 6.6 23.3 17.9 23.3 14.1 0 21.5-10.9 21.5-26.8h-17.9v-10.7h30.4c0 20.5 4.7 49.9-34 49.9zm-116.5 44.7c-9.4 0-13.6-.3-20-.8v-69.7c6.4-.6 15-.6 22.5-.6 23.3 0 37.2 12.2 37.2 34.5 0 21.9-15 36.6-39.7 36.6z"></path></svg>](https://www.researchgate.net/profile/Benedikt_Kleer)
]

---

# What does the following code do?

.pull-left-40[
```r
df %>%
  slice(seq(1, 1000, 100)) %>%
  filter(os == "iOS") %>%
  group_by(device) %>%
  summarize(mean(timeOfUse))
```
]

--

.pull-right-60[
**Solution:**

- the dataset `df` is subset: `seq(1, 1000, 100)` picks every 100th case among the first 1,000 (rows 1, 101, ..., 901)
- from these we keep only the cases whose variable `os` equals `"iOS"`
- then we group by the variable `device` (`group_by()` groups the cases, it does not sort them)
- finally we output the mean of the variable `timeOfUse` for each `device` group
]

---

# And what does this code do?

```r
df %>%
  select(gndr, income, residence, inhabitants) %>%
  bind_rows(df2) %>%
  mutate(sizeOfTown = case_when(inhabitants < 10000 ~ "small town",
                                inhabitants >= 10000 & inhabitants < 50000 ~ "city",
                                inhabitants >= 50000 ~ "large city"))
```

--

**Solution:**

- from the dataset `df`, the variables `gndr`, `income`, `residence` and `inhabitants` are selected
- then we append the cases from the dataset `df2`
- finally we create a new variable `sizeOfTown` that maps the number of inhabitants to a descriptive label (conceptually ordinal, although `case_when()` returns a character vector here)

---

# And what does this code do?

```r
df %>%
  bind_rows(df3) %>%
  filter(income >> 1000 && gndr = "female") %>%
  mutate(sizeOfTown = case_when(inhabitants < 10000 ~ "small town",
                                inhabitants >= 10000 & inhabitants < 50000 ~ "city",
                                inhabitants >= 50000 ~ "large city"))
```

--

**Solution:**

- faulty code!
- the conditions used in `filter()` are wrong: `>>` is not an R operator, `=` does not test equality (that is `==`), and `&&` is not vectorized (the element-wise "and" is `&`)
- correct is: `filter(income > 1000 & gndr == "female")`

---

class: inverse2, mline, center, middle

# That's it! On to `ggplot`!
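
---

# Appendix: running the quiz pipelines

The quiz pipelines can be tried out on toy data. The following is only a minimal sketch under stated assumptions: the deck does not show the real data, so the data frames `df` and `towns`, all of their values, and the result names `per_device`, `labelled` and `meanTime` are invented for illustration. The boundary conditions in `case_when()` are written with `>=` here so that a town with exactly 10,000 inhabitants also gets a label; that is a choice made for this sketch.

```r
library(dplyr)

# Hypothetical toy data: 1,000 cases, column names as on the slides
df <- tibble(
  device    = rep(c("phone", "tablet", "laptop"), length.out = 1000),
  os        = ifelse(seq_len(1000) <= 500, "iOS", "Android"),
  timeOfUse = seq_len(1000)  # invented: usage time equals the row number
)

# Quiz 1: every 100th case, iOS only, mean usage time per device
per_device <- df %>%
  slice(seq(1, 1000, 100)) %>%   # rows 1, 101, ..., 901
  filter(os == "iOS") %>%        # keeps rows 1, 101, 201, 301, 401
  group_by(device) %>%
  summarize(meanTime = mean(timeOfUse))
# laptop: 201, phone: 151, tablet: 251

# Hypothetical second data set for the corrected filter() from quiz 3
towns <- tibble(
  gndr        = c("female", "male", "female", "female"),
  income      = c(1500, 2000, 900, 3000),
  inhabitants = c(5000, 20000, 60000, 10000)
)

labelled <- towns %>%
  filter(income > 1000 & gndr == "female") %>%  # `&` and `==`, both vectorized
  mutate(sizeOfTown = case_when(
    inhabitants < 10000 ~ "small town",
    inhabitants >= 10000 & inhabitants < 50000 ~ "city",
    inhabitants >= 50000 ~ "large city"
  ))
# two cases remain: "small town" (5000) and "city" (10000)
```

Printing `per_device` and `labelled` shows the results; swapping in `>` for `>=` in the sketch makes the case with exactly 10,000 inhabitants come out as `NA`, which is a quick way to see why the boundaries matter.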