Part 2: Advanced Analysis - Welcome to Day 2!

Yesterday you built your first species accumulation curve. Today we’ll take it to the next level with real scientific analysis!

What you’ll learn today:

Spatial sampling - Walk through plots like a real field ecologist
Measuring uncertainty - Run from multiple starting points
Comparing natives vs aliens - Do they accumulate differently?
Polished figures - Make nice-looking plots

How to use these exercises:

Replace each ____ with the correct code
Run each chunk to see if it works
Read the hints if you get stuck
The exercises build on each other - do them in order!

Exercise 1: Setup - Load Data and Functions

Concept: Before we start, we need to load our data and define some helper functions. These functions will make our analysis easier.

1a: Load packages and data

# Load packages
library(tidyverse)

# Load data
header <- read_csv("../data/austria_header.csv")
species <- read_csv("../data/austria_species.csv")

# Check data loaded correctly
print(paste("Plots loaded:", nrow(header)))
print(paste("Species records:", nrow(species)))
print(paste("Unique species:", length(unique(species$WFO_TAXON))))

1b: New R Concepts for Today

Before we define our helper functions, let’s learn some new R concepts that we’ll use today. Yesterday you learned variables, vectors, and unique(). Today we add:

Writing your own functions - Fibonacci example:

The Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13… (each number = sum of previous two)

# VERSION 1: Using a vector (stores ALL numbers)
fibonacci_vector <- function(n) {
  fib <- c(1, 1)                      # Start with first two numbers
  if (n <= 2) return(fib[seq_len(n)]) # Guard: for n < 3, 3:n would count DOWN (3:1 = 3 2 1)
  for (i in 3:n) {
    fib[i] <- fib[i-1] + fib[i-2]     # Add: previous + one before that
  }
  return(fib)
}

fibonacci_vector(7)  # Returns: 1 1 2 3 5 8 13

# VERSION 2: Using 2 variables (memory efficient - only tracks last 2)
fibonacci_two_vars <- function(n) {
  if (n <= 2) return(1)               # 1st and 2nd numbers are both 1
  prev <- 1                           # The number before current
  curr <- 1                           # Current number
  for (i in 3:n) {
    new <- prev + curr                # Calculate next
    prev <- curr                      # Shift: current becomes previous
    curr <- new                       # Shift: new becomes current
  }
  return(curr)
}

fibonacci_two_vars(7)  # Returns: 13 (the 7th Fibonacci number)

Default values in functions:

# If user doesn't provide 'n', use 10 as default
fibonacci <- function(n = 10) {
  fib <- c(1, 1)
  if (n <= 2) return(fib[seq_len(n)])  # same guard as above
  for (i in 3:n) fib[i] <- fib[i-1] + fib[i-2]
  return(fib)
}

fibonacci()      # Returns first 10 numbers (uses default)
fibonacci(5)     # Returns first 5 numbers

Useful functions we’ll use:

Function	What it does	Example
`paste()`	Combine text and values	`paste("Found:", 5, "species")`
`sample()`	Pick random items	`sample(1:100, 5)` picks 5 random numbers
`set.seed()`	Make random results reproducible	`set.seed(42)` before `sample()` gives the same “random” picks every time
`rep()`	Repeat values	`rep(FALSE, 5)` gives `FALSE FALSE FALSE FALSE FALSE`
`which.min()`	Position of smallest value	`which.min(c(5,2,8))` returns `2`
`setdiff()`	What’s in A but not B	`setdiff(c(1,2,3), c(2,3,4))` returns `1`
`pull()`	Extract column as vector	`data %>% pull(column)`
`is.null()`	Check if NULL	`is.null(NULL)` returns `TRUE`

Special values:

NULL = “nothing” / empty / not specified
Inf = infinity (bigger than any number)
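To try the table entries yourself, here is a small base-R sketch that needs nothing but R. `pull()` comes from dplyr, so the sketch uses `df[["column"]]`, which returns the same vector for a plain data frame; `df` and its contents are made up for illustration.

```r
# paste(): glue text and values together
msg <- paste("Found:", 5, "species")
print(msg)                                # "Found: 5 species"

# set.seed() + sample(): the same seed gives the same "random" picks
set.seed(42)
a <- sample(1:100, 5)
set.seed(42)
b <- sample(1:100, 5)
print(identical(a, b))                    # TRUE

# rep(), which.min(), setdiff()
flags  <- rep(FALSE, 5)                   # FALSE FALSE FALSE FALSE FALSE
pos    <- which.min(c(5, 2, 8))           # 2 (the POSITION of the smallest value)
only_a <- setdiff(c(1, 2, 3), c(2, 3, 4)) # 1

# pull() is from dplyr; for a plain data frame, df[["column"]] gives the same vector
df  <- data.frame(column = c("Oak", "Beech"), stringsAsFactors = FALSE)
vec <- df[["column"]]                     # "Oak" "Beech"

# Special values: NULL and Inf
print(is.null(NULL))                      # TRUE
x <- c(3, Inf)
print(which.min(x))                       # 1: Inf is never the smallest value
```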

1c: Define helper functions (just copy & run!)

Just run the code block with the complete functions (it follows the step-by-step walkthrough)! You don’t need to understand every line - these are tools we’ll use later. Read the walkthrough only if you’re curious.

🔨 Building the Functions Step-by-Step

Each step below builds up from simple to complete:

Step 1 - calc_distance: Start with the math formula

Goal: Calculate distance between two points.

Remember Pythagoras? For two points (x1,y1) and (x2,y2):

distance = sqrt((x2-x1)² + (y2-y1)²)

In R, we just wrap this in a function:

calc_distance <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

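✨ Bonus: This automatically works with vectors! If x2 and y2 are vectors of 100 points, you get 100 distances back, because R does arithmetic element by element and reuses the single x1, y1 for every point. Note: with longitude/latitude this treats degrees as flat x/y units, which is an approximation (away from the equator, a degree of longitude covers less ground than a degree of latitude).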
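A quick check of the vectorized behaviour, using made-up coordinates: one starting point (0, 0) is compared with three target points in a single call. The points form 3-4-5 style right triangles, so the results are easy to verify by hand.

```r
calc_distance <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# One start point (0, 0) against three made-up target points at once
d <- calc_distance(0, 0, c(3, 6, 0), c(4, 8, 1))
print(d)   # 5 10 1
```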

Step 2 - nn_walk: First, think about what we need to track

Goal: Visit all plots, always going to the nearest unvisited one.

What do we need to keep track of?

Which plots have we already visited? → visited (TRUE/FALSE for each)
In what order did we visit them? → visit_order (vector of plot IDs)
Where are we now? → current (index of current plot)

# Setup our tracking variables
n <- nrow(header_data)           # total number of plots
visited <- rep(FALSE, n)         # all start as unvisited
visit_order <- numeric(n)        # empty vector to fill
current <- 1                     # start at plot 1

Step 3 - nn_walk: Build the main loop

Now we loop through, visiting one plot at a time:

for (i in 1:n) {
  # 1. Mark current plot as visited
  visited[current] <- TRUE

  # 2. Record which plot we visited
  visit_order[i] <- header_data$PlotObservationID[current]

  # 3. Find the next plot (closest unvisited)
  # ... but how?
}

🤔 Problem: How do we find the closest UNVISITED plot?

Step 4 - nn_walk: The "Infinity Trick" for excluding visited plots

We calculate distance to ALL plots, then use a clever trick:

# Calculate distance from current to ALL plots
distances <- calc_distance(
  header_data$Longitude[current], header_data$Latitude[current],
  header_data$Longitude, header_data$Latitude
)

# THE TRICK: Set visited plots to Infinity!
distances[visited] <- Inf

# Now which.min() will never pick a visited plot
current <- which.min(distances)

✨ Why Inf? Because which.min() finds the smallest value. Infinity can never be the smallest, so visited plots are automatically excluded! This matters because the current plot is already marked as visited and its distance to itself is 0, so without the trick which.min() would keep picking it.
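Here is the trick on its own, with four made-up distances; plot 2 plays the role of the current plot.

```r
# Made-up distances from the current plot (plot 2) to all 4 plots
distances <- c(2.5, 0, 1.2, 3.1)
visited   <- c(FALSE, TRUE, FALSE, FALSE)   # plot 2 = where we are now

which.min(distances)        # 2: without the trick we would "walk" to ourselves

distances[visited] <- Inf   # visited plots can never be the minimum
next_plot <- which.min(distances)
next_plot                   # 3: the closest UNVISITED plot
```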

Step 5 - nn_walk: Add flexibility with an optional start point

Sometimes we want to control where to start, sometimes random is fine:

nn_walk <- function(header_data, start_idx = NULL) {
  # If user didn't specify, pick random
  if (is.null(start_idx)) {
    start_idx <- sample(1:nrow(header_data), 1)
  }
  current <- start_idx
  # ... rest of function
}

✨ Default = NULL means "not specified". We check with is.null() and provide our own default behavior.
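To see the `NULL` default in isolation, here is a hypothetical helper, `pick_start()` (not part of the tutorial), that contains only the start-point logic of `nn_walk()`: a given `start_idx` is used as-is, and `NULL` triggers a random pick.

```r
# Hypothetical helper: copies only the start-point logic of nn_walk()
pick_start <- function(n_plots, start_idx = NULL) {
  if (is.null(start_idx)) {
    start_idx <- sample(1:n_plots, 1)   # nothing given: pick a random plot
  }
  start_idx
}

chosen <- pick_start(150, start_idx = 7)   # 7: the caller's choice wins
set.seed(1)
random_start <- pick_start(150)            # some row between 1 and 150
```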

Step 6 - build_accumulation: Think about what "accumulation" means

Goal: Count how many UNIQUE species we've found after each plot.

Let's think through an example:

# Plot 1: Oak, Beech         → found = {Oak, Beech}           → count = 2
# Plot 2: Beech, Pine        → found = {Oak, Beech, Pine}     → count = 3
# Plot 3: Oak, Oak, Maple    → found = {Oak, Beech, Pine, Maple} → count = 4

Key insight: We only count NEW species (ones we haven't seen before)!

Step 7 - build_accumulation: Use setdiff() to find new species

setdiff(A, B) = "what's in A but NOT in B"

found <- c("Oak", "Beech")           # species we already have
plot_species <- c("Beech", "Pine")   # species in new plot

setdiff(plot_species, found)         # Returns: "Pine"
# Only Pine is NEW!

This becomes our loop:

found <- c()                          # start empty
accum <- numeric(length(plot_order))  # one count per plot

for (i in seq_along(plot_order)) {
  # Get species in this plot
  # (get_species_for_plot() is a placeholder; the full function uses filter() + pull())
  plot_spp <- get_species_for_plot(plot_order[i])

  # Find what's NEW
  new_spp <- setdiff(plot_spp, found)

  # Add new species to our collection
  found <- c(found, new_spp)

  # Record the count
  accum[i] <- length(found)
}
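The same loop can be run on the Oak/Beech/Pine/Maple example above. In this sketch a named list stands in for the placeholder `get_species_for_plot()`; the plots and species are made up.

```r
# Made-up plots; the list replaces the placeholder get_species_for_plot()
plot_species_list <- list(
  "1" = c("Oak", "Beech"),
  "2" = c("Beech", "Pine"),
  "3" = c("Oak", "Oak", "Maple")
)
plot_order <- c(1, 2, 3)

found <- c()
accum <- numeric(length(plot_order))
for (i in seq_along(plot_order)) {
  plot_spp <- unique(plot_species_list[[as.character(plot_order[i])]])
  new_spp  <- setdiff(plot_spp, found)   # species not seen before
  found    <- c(found, new_spp)
  accum[i] <- length(found)
}
accum   # 2 3 4
```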

Step 8 - build_accumulation: Add optional filtering

We want ONE function that works for native, alien, or all species:

build_accumulation <- function(species_data, plot_order, status_filter = NULL) {
  # If filter provided, apply it FIRST
  if (!is.null(status_filter)) {
    species_data <- species_data %>% filter(STATUS == status_filter)
  }
  # ... rest of function works on filtered data
}

Now we can call it three ways:

build_accumulation(species, plot_order)           # all species
build_accumulation(species, plot_order, "nat")    # only native
build_accumulation(species, plot_order, "neo")    # only alien
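Here is the filtering idea on a tiny made-up dataset that uses the tutorial's column names. `accumulate()` is a hypothetical base-R stand-in for `build_accumulation()` (same logic, no dplyr); it uses `%in%` so that rows with a missing STATUS are dropped, as `filter()` would do.

```r
# Made-up long-format data with the tutorial's column names
species_toy <- data.frame(
  PlotObservationID = c(1, 1, 2, 2, 3),
  WFO_TAXON = c("Oak", "Robinia", "Beech", "Robinia", "Ailanthus"),
  STATUS    = c("nat", "neo", "nat", "neo", "neo"),
  stringsAsFactors = FALSE
)

# Hypothetical base-R version of build_accumulation() for this sketch
accumulate <- function(species_data, plot_order, status_filter = NULL) {
  if (!is.null(status_filter)) {
    species_data <- species_data[species_data$STATUS %in% status_filter, ]
  }
  found <- c()
  accum <- numeric(length(plot_order))
  for (i in seq_along(plot_order)) {
    in_plot  <- species_data$PlotObservationID == plot_order[i]
    plot_spp <- unique(species_data$WFO_TAXON[in_plot])
    found    <- c(found, setdiff(plot_spp, found))
    accum[i] <- length(found)
  }
  accum
}

accumulate(species_toy, c(1, 2, 3))          # all:    2 3 4
accumulate(species_toy, c(1, 2, 3), "nat")   # native: 1 2 2
accumulate(species_toy, c(1, 2, 3), "neo")   # alien:  1 1 2
```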

Step 9 - find_saturation: A simple but useful helper

Goal: Find when we've discovered 80% of all species.

# If final count is 100 species, target is 80
target <- max(curve) * 0.8

# Find FIRST position where we reach target
# curve >= target gives: FALSE FALSE FALSE TRUE TRUE TRUE ...
# which() gives positions of TRUEs: 4, 5, 6, ...
# [1] gives first one: 4
which(curve >= target)[1]

✨ Pattern: which(condition)[1] = "first position where condition is TRUE" (it gives NA if the condition is never TRUE)
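The pattern on a made-up curve whose final count is 10, so the 80% target is 8:

```r
curve  <- c(3, 6, 8, 9, 10, 10)   # made-up accumulation curve
target <- max(curve) * 0.8        # 8
curve >= target                   # FALSE FALSE TRUE TRUE TRUE TRUE
sat <- which(curve >= target)[1]  # 3: first plot where 80% is reached
sat
```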

Now here are the complete functions:

# ==== FUNCTION 1: calc_distance ====
# Calculates Euclidean (straight-line) distance. VECTORIZED: x2, y2 can be vectors!
# Note: longitude/latitude degrees are treated as flat x/y units. This is an
# approximation: in Austria a degree of longitude is shorter on the ground
# than a degree of latitude.

calc_distance <- function(x1, y1, x2, y2) {
  # Pythagorean theorem: sqrt(dx² + dy²). Works with vectors!
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}


# ==== FUNCTION 2: nn_walk ====
# Nearest-neighbour walk: always go to closest unvisited plot

nn_walk <- function(header_data, start_idx = NULL) {
  # How many plots total?
  n <- nrow(header_data)

  # If no start given, pick random
  if (is.null(start_idx)) start_idx <- sample(1:n, 1)

  # Track which plots we've visited
  visited <- rep(FALSE, n)
  # Store the order we visit them
  visit_order <- numeric(n)
  # Start at this plot
  current <- start_idx

  # Loop through all plots
  for (i in 1:n) {
    # Mark current as visited
    visited[current] <- TRUE
    # Save the plot ID
    visit_order[i] <- header_data$PlotObservationID[current]

    # If not done yet
    if (i < n) {
      # Calculate distance from current to ALL others (vectorized!)
      distances <- calc_distance(
        header_data$Longitude[current],
        header_data$Latitude[current],
        header_data$Longitude,
        header_data$Latitude
      )
      # Inf trick: visited plots can never be "minimum"
      distances[visited] <- Inf
      # Go to closest unvisited
      current <- which.min(distances)
    }
  }
  # Return the order of plot IDs
  return(visit_order)
}


# ==== FUNCTION 3: build_accumulation ====
# Count cumulative species as we visit each plot in order

build_accumulation <- function(species_data, plot_order, status_filter = NULL) {
  # If filter provided (e.g., "nat"), keep only matching species
  if (!is.null(status_filter)) {
    species_data <- species_data %>% filter(STATUS == status_filter)
  }

  # Empty vector to collect all species found
  found <- c()
  # Pre-allocate result vector
  accum <- numeric(length(plot_order))

  # For each plot in our walking order
  for (i in seq_along(plot_order)) {
    # Get unique species in this plot
    plot_spp <- species_data %>%
      # Filter to current plot
      filter(PlotObservationID == plot_order[i]) %>%
      # Extract species names
      pull(WFO_TAXON) %>%
      # Remove duplicates within plot
      unique()

    # setdiff: what's NEW? (in plot but not yet found)
    new_spp <- setdiff(plot_spp, found)
    # Add new species to our collection
    found <- c(found, new_spp)
    # Count total species so far
    accum[i] <- length(found)
  }
  # Return the accumulation curve
  return(accum)
}


# ==== FUNCTION 4: find_saturation ====
# Find when we reach X% of total species (default 80%)

find_saturation <- function(curve, threshold = 0.8) {
  # Calculate target: e.g. 80% of final count
  target <- max(curve) * threshold
  # which()[1] = first position where TRUE
  which(curve >= target)[1]
}

print("All functions loaded!")

Hint: Run both code blocks above. If you get errors, check that the data files exist in ../data/.

⚡ Rcpp Turbo Functions (if R is too slow)

If analyses take too long, these C++ versions can be roughly 10-50x faster, depending on your data size and computer. Run this ONCE at the start to replace the R functions:

# Install Rcpp if needed
# install.packages("Rcpp")
library(Rcpp)

# Compile the C++ functions
cppFunction('
NumericVector nn_walk_cpp(NumericVector lon, NumericVector lat, IntegerVector plot_ids, int start_idx) {
  int n = lon.size();
  NumericVector visit_order(n);
  LogicalVector visited(n, false);

  int current = start_idx - 1;  // Convert to 0-indexed

  for (int i = 0; i < n; i++) {
    visited[current] = true;
    visit_order[i] = plot_ids[current];

    if (i < n - 1) {
      double min_dist = R_PosInf;
      int next_idx = -1;

      for (int j = 0; j < n; j++) {
        if (!visited[j]) {
          double dx = lon[j] - lon[current];
          double dy = lat[j] - lat[current];
          double dist = sqrt(dx*dx + dy*dy);
          if (dist < min_dist) {
            min_dist = dist;
            next_idx = j;
          }
        }
      }
      current = next_idx;
    }
  }
  return visit_order;
}
')

cppFunction('
IntegerVector build_accum_cpp(IntegerVector plot_obs_id, IntegerVector taxon_id,
                               IntegerVector plot_order) {
  int n_plots = plot_order.size();
  IntegerVector accum(n_plots);
  std::set<int> found_species;

  for (int i = 0; i < n_plots; i++) {
    int target_plot = plot_order[i];

    for (int j = 0; j < plot_obs_id.size(); j++) {
      if (plot_obs_id[j] == target_plot) {
        found_species.insert(taxon_id[j]);
      }
    }
    accum[i] = found_species.size();
  }
  return accum;
}
', includes = "#include <set>")

# Wrapper function for nn_walk (drop-in replacement)
nn_walk_fast <- function(header_data, start_idx = NULL) {
  if (is.null(start_idx)) start_idx <- sample(1:nrow(header_data), 1)
  nn_walk_cpp(header_data$Longitude, header_data$Latitude,
              header_data$PlotObservationID, start_idx)
}

# Wrapper for build_accumulation (requires pre-processing)
build_accumulation_fast <- function(species_data, plot_order, status_filter = NULL) {
  if (!is.null(status_filter)) {
    # %in% (rather than ==) drops rows with a missing STATUS, like dplyr's filter()
    species_data <- species_data[species_data$STATUS %in% status_filter, ]
  }
  # Convert taxon names to integers for speed
  taxon_factor <- as.integer(factor(species_data$WFO_TAXON))
  build_accum_cpp(species_data$PlotObservationID, taxon_factor, as.integer(plot_order))
}

# Replace the R functions
nn_walk <- nn_walk_fast
build_accumulation <- build_accumulation_fast

print("⚡ Rcpp turbo mode activated! nn_walk() and build_accumulation() now use the C++ code.")

Note: Requires the Rcpp package and a C++ compiler (Rtools on Windows).

1d: How fast is my code? (Complexity)

Why does speed matter? With 100 plots, slow code takes seconds. With 10,000 plots, it could take hours!

What counts as “work”? To keep things simple, we only count multiplications (and divisions) and ignore additions and assignments. This simplification is fine because what matters is how the count grows as N gets bigger, not the exact number.

The problem - Nested loops: If you put a loop INSIDE a loop, multiplications multiply!

N (input size)	Multiplications (N×N)	Time
10	100	instant
100	10,000	instant
1,000	1,000,000	slow
10,000	100,000,000	very slow!

# O(N²) - gets slow fast!
for (i in 1:N) {
  for (j in 1:N) {
    # This runs N × N = N² times!
  }
}
# N=100 → 10,000 multiplications
# N=1000 → 1,000,000 multiplications (10× more plots, 100× more work!)

Our nn_walk function has this O(N²) behavior: for each of the N plots it visits, it computes the distance to all N plots. This is exactly when it makes sense to use a compiled language (like C++) instead of an interpreted language (like R). Compiled code usually runs loops like this many times faster, which is why we offer the Rcpp version!
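To see the N² growth directly, this sketch counts how often the innermost line of a double loop runs. The counter stands in for one distance calculation; it illustrates the growth, it is not a timing of `nn_walk()`.

```r
# Count how often the innermost line of a double loop runs
count_inner_steps <- function(N) {
  steps <- 0
  for (i in 1:N) {
    for (j in 1:N) {
      steps <- steps + 1   # stands in for one distance calculation
    }
  }
  steps
}

counts <- sapply(c(10, 100, 1000), count_inner_steps)  # 100, 10,000 and 1,000,000 steps
ratio  <- counts[3] / counts[2]                        # 100: 10x more plots -> 100x more work
```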

Exercise 2: Understanding Spatial Sampling

Concept: When we sample plots randomly, we might jump all over the map. But a nearest-neighbour walk samples like a real ecologist would - always going to the closest unvisited plot. This creates a realistic spatial accumulation curve.
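To see the walk on something small enough to check by hand, this sketch repeats `calc_distance()` and `nn_walk()` from Exercise 1c (comments trimmed) and applies them to four made-up plots on a straight line. `path_length()` is a hypothetical helper that adds up the distances between consecutive plots, so the walk can be compared with simply visiting plots in ID order. Running it in your session redefines `nn_walk()` as the plain R version.

```r
calc_distance <- function(x1, y1, x2, y2) sqrt((x2 - x1)^2 + (y2 - y1)^2)

# nn_walk() as defined in Exercise 1c (comments trimmed)
nn_walk <- function(header_data, start_idx = NULL) {
  n <- nrow(header_data)
  if (is.null(start_idx)) start_idx <- sample(1:n, 1)
  visited <- rep(FALSE, n)
  visit_order <- numeric(n)
  current <- start_idx
  for (i in 1:n) {
    visited[current] <- TRUE
    visit_order[i] <- header_data$PlotObservationID[current]
    if (i < n) {
      distances <- calc_distance(header_data$Longitude[current], header_data$Latitude[current],
                                 header_data$Longitude, header_data$Latitude)
      distances[visited] <- Inf
      current <- which.min(distances)
    }
  }
  visit_order
}

# Four made-up plots on a straight east-west line
toy_header <- data.frame(
  PlotObservationID = c(101, 102, 103, 104),
  Longitude = c(0, 10, 1, 3),
  Latitude  = c(0, 0, 0, 0)
)

# Hypothetical helper: total distance walked when visiting plots in the given order
path_length <- function(ids, header_data) {
  idx <- match(ids, header_data$PlotObservationID)
  x <- header_data$Longitude[idx]
  y <- header_data$Latitude[idx]
  sum(calc_distance(head(x, -1), head(y, -1), tail(x, -1), tail(y, -1)))
}

walk <- nn_walk(toy_header, start_idx = 1)
walk                                            # 101 103 104 102
path_length(walk, toy_header)                   # 10
path_length(c(101, 102, 103, 104), toy_header)  # 21: ID order jumps back and forth
```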

New concepts used here:

Concept	What it does	Example
`%in%`	Check if values are in a list	`filter(ID %in% my_ids)`
`scale_color_manual()`	Set custom colors for categories	`values = c("A" = "red", "B" = "blue")`
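A small base-R sketch of both concepts, with made-up IDs and regions. `%in%` returns one TRUE/FALSE per value on its left; a named vector, like the one passed to `scale_color_manual()`, is looked up by name rather than by position.

```r
print(c(1, 5, 9) %in% c(5, 9, 12))   # FALSE TRUE TRUE

plots  <- data.frame(ID = c(1, 5, 9), region = c("Tyrol", "Styria", "Vienna"),
                     stringsAsFactors = FALSE)
my_ids <- c(5, 9, 12)
kept   <- plots[plots$ID %in% my_ids, ]   # base-R version of filter(ID %in% my_ids)
print(kept$ID)                           # 5 9

colors <- c("Native" = "darkgreen", "Alien" = "red")
print(colors[["Alien"]])                 # "red": looked up by name
```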

2a: Create a sample dataset

# Take a random sample of 150 plots (for speed)
set.seed(42)  # For reproducibility
sample_size <- 150

sample_ids <- sample(unique(header$PlotObservationID), sample_size)
sample_header <- header %>% filter(PlotObservationID %in% sample_ids)
sample_species <- species %>% filter(PlotObservationID %in% sample_ids)

print(paste("Sample created:", nrow(sample_header), "plots"))

2b: Run a nearest-neighbour walk

# Start from plot 1 and walk to nearest neighbours
nn_order <- nn_walk(sample_header, start_idx = 1)

# Build curves for native and alien species
native_curve <- build_accumulation(sample_species, nn_order, "____")
alien_curve <- build_accumulation(sample_species, nn_order, "____")

# Check the results
print(paste("Native species found:", max(native_curve)))
print(paste("Alien species found:", max(alien_curve)))

2c: Plot the comparison

# Create data frame for plotting
curve_data <- data.frame(
  plots = rep(1:sample_size, 2),
  species = c(native_curve, alien_curve),
  status = rep(c("Native", "Alien"), each = sample_size)
)

ggplot(curve_data, aes(x = plots, y = species, color = status)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = c("Native" = "darkgreen", "Alien" = "____")) +
  labs(
    title = "Native vs Alien Species Accumulation",
    x = "Plots Sampled (Nearest-Neighbour Order)",
    y = "Cumulative Species",
    color = ""
  ) +
  theme_minimal()

Hint: Native species use STATUS = “nat”, alien species use STATUS = “neo”.
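The data frame in 2c is in “long” format: one row per plot and status, with `rep()` building the columns, so ggplot can draw one line per status via `color = status`. A sketch with made-up curves for 3 plots shows what `rep(1:n, 2)` and `rep(..., each = n)` produce:

```r
native_toy <- c(2, 3, 4)   # made-up curves for 3 plots
alien_toy  <- c(0, 1, 1)
n <- length(native_toy)

curve_toy <- data.frame(
  plots   = rep(1:n, 2),                           # 1 2 3 1 2 3
  species = c(native_toy, alien_toy),
  status  = rep(c("Native", "Alien"), each = n),   # Native x3, then Alien x3
  stringsAsFactors = FALSE
)
curve_toy
```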

Exercise 3: Measuring Uncertainty

Concept: The curve we get depends on WHERE we start! If we start in a native-rich area, natives accumulate fast. If we start near an alien hotspot, aliens accumulate fast. We need to run from MANY starting points to see the true pattern.

New concepts in this exercise:

Concept	What it does	Example
`matrix(NA, nrow, ncol)`	Create empty table with rows & columns	`matrix(NA, nrow=20, ncol=100)`
`colMeans(mat)`	Mean of each column	`colMeans(mat)` returns one mean per column
`sd()`	Standard deviation (spread of values)	`sd(c(1,2,3,4,5))` returns 1.58
`quantile(x, p)`	Value at percentile p	`quantile(x, 0.975)` = 97.5th percentile
`geom_ribbon()`	Shaded band in ggplot	`geom_ribbon(aes(ymin, ymax))`
`alpha = 0.3`	Transparency (0=invisible, 1=solid)	Makes bands see-through

Why matrix() instead of data.frame()?

matrix = ALL values have the same type (here, only numbers). Faster for math!
data.frame = columns can have different types (text, numbers, TRUE/FALSE)

We use matrix here because we only store numbers and need to do fast calculations.
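A minimal sketch of the matrix workflow used in 3a and 3b, with made-up numbers: create an empty matrix, fill one row per run, then read a column or take `colMeans()`.

```r
# Empty storage: 3 runs (rows) x 4 plots (columns), like native_runs
runs <- matrix(NA, nrow = 3, ncol = 4)

# Fill one whole row per run, like native_runs[seed, ] <- ...
runs[1, ] <- c(1, 2, 3, 3)   # made-up curves
runs[2, ] <- c(2, 3, 3, 4)
runs[3, ] <- c(1, 1, 2, 4)

runs[, 4]                    # plot 4 in every run: 3 4 4
plot_means <- colMeans(runs) # one mean per plot (column)
plot_means
```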

3a: Run from multiple starting points

# Run from 20 different starting points
n_seeds <- 20

# Storage for results (matrix: rows = seeds, columns = plots)
native_runs <- matrix(NA, nrow = n_seeds, ncol = sample_size)
alien_runs <- matrix(NA, nrow = n_seeds, ncol = sample_size)

# Run from each starting point
# (the loop variable 'seed' is the starting row: rows 1 to n_seeds of sample_header)
print(paste("Running", n_seeds, "starting points..."))
for (seed in 1:n_seeds) {
  nn_order <- nn_walk(sample_header, start_idx = seed)
  native_runs[seed, ] <- build_accumulation(sample_species, nn_order, "____")
  alien_runs[seed, ] <- build_accumulation(sample_species, nn_order, "____")

  if (seed %% 5 == 0) print(paste("  Completed", seed))
}
print("Done!")

3b: Calculate mean and confidence intervals

# Mean across all runs (colMeans = mean of each column)
native_mean <- colMeans(____)
alien_mean <- colMeans(alien_runs)

# 95% intervals (2.5th to 97.5th percentile across the runs) using a loop
native_lower <- numeric(sample_size)
native_upper <- numeric(sample_size)
alien_lower <- numeric(sample_size)
alien_upper <- numeric(sample_size)

for (i in 1:sample_size) {
  native_lower[i] <- quantile(native_runs[, i], 0.025)
  native_upper[i] <- quantile(native_runs[, i], ____)
  alien_lower[i] <- quantile(alien_runs[, i], ____)
  alien_upper[i] <- quantile(alien_runs[, i], 0.975)
}

# How much variation is there? CV = coefficient of variation = sd / mean, in %
midpoint <- sample_size %/% 2   # %/% = integer division: 150 %/% 2 = 75
native_cv <- round(100 * sd(native_runs[, midpoint]) / mean(native_runs[, midpoint]), 1)
alien_cv <- round(100 * sd(alien_runs[, midpoint]) / mean(alien_runs[, midpoint]), 1)

print("Variation at midpoint:")
print(paste("  Native CV:", native_cv, "%"))
print(paste("  Alien CV:", alien_cv, "%"))
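The per-column loop in 3b can also be written with `apply(..., 2, ...)`, which calls a function on each column. Here is a sketch with made-up numbers that includes the CV calculation. With only 20 runs, the 0.025 and 0.975 quantiles fall between the two smallest and the two largest values, so the band ends up close to the full range of the runs.

```r
# Made-up results: 3 runs (rows) x 3 plots (columns)
runs <- matrix(c(10, 20, 30,
                 12, 22, 33,
                 14, 24, 36), nrow = 3, byrow = TRUE)

# apply(runs, 2, ...) runs the function on each column, like the for-loop in 3b
lower <- apply(runs, 2, quantile, probs = 0.025)
upper <- apply(runs, 2, quantile, probs = 0.975)

# Coefficient of variation for plot 1: sd as a percentage of the mean
cv <- round(100 * sd(runs[, 1]) / mean(runs[, 1]), 1)   # 16.7 (sd 2, mean 12)
```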

3c: Plot with uncertainty bands

# Create summary data frame
summary_data <- data.frame(
  plots = rep(1:sample_size, 2),
  mean = c(native_mean, alien_mean),
  lower = c(native_lower, alien_lower),
  upper = c(native_upper, alien_upper),
  status = rep(c("Native", "Alien"), each = sample_size)
)

# Plot with shaded confidence bands
ggplot(summary_data, aes(x = plots)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = status), alpha = ____) +
  geom_line(aes(y = mean, color = status), linewidth = 1.2) +
  scale_color_manual(values = c("Native" = "darkgreen", "Alien" = "red")) +
  scale_fill_manual(values = c("Native" = "darkgreen", "Alien" = "red")) +
  labs(
    title = "Species Accumulation with Uncertainty",
    subtitle = paste(n_seeds, "starting points - shaded = 95% CI"),
    x = "Plots Sampled",
    y = "Cumulative Species",
    color = "", fill = ""
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

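Hint: Pass native_runs to colMeans() to get the average curve. For the 95% CI, the upper bound is the 0.975 quantile and the lower bound is the 0.025 quantile. For alpha, choose a value between 0 and 1 (e.g. 0.25) so the bands are see-through. Hinweis: Übergib native_runs an colMeans(), um die Durchschnittskurve zu erhalten. Für das 95%-KI ist die obere Grenze das 0.975-Quantil, die untere das 0.025-Quantil. Wähle für alpha einen Wert zwischen 0 und 1 (z. B. 0.25), damit die Bänder durchscheinend sind.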
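The `summary_data` data frame stacks the native and alien curves into one long table so that ggplot can colour them by `status`. Here is a minimal sketch of the `rep()` pattern, using a made-up size of 3 plots per curve:

```r
# Hypothetical small example: 3 plots per curve
n <- 3
plots  <- rep(1:n, 2)                           # plot numbers 1..n, repeated once per group
status <- rep(c("Native", "Alien"), each = n)  # each label repeated n times in a row
long <- data.frame(plots = plots, status = status)
print(long)
```

The first `n` rows are labelled "Native" and the next `n` are labelled "Alien". Because `c(native_mean, alien_mean)` also puts the native values first, every mean, lower and upper value lines up with the correct label.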

Exercise 4: Comparing Saturation Speed / Übung 4: Sättigungsgeschwindigkeit vergleichen

Concept: Saturation is when we’ve found most of the species and the curve flattens. If natives saturate FASTER than aliens, we encounter most native species quickly as we move across the landscape. If aliens saturate SLOWER, this suggests that different alien species occur in different regions (high spatial turnover).

New concept - Boxplots:

geom_boxplot() shows the distribution of values:
Box = middle 50% of data (25th to 75th percentile)
Line inside = median (middle value)
Whiskers = extend from the box to the most extreme value that lies within 1.5× the box height (the interquartile range, IQR)
Dots = outliers (values beyond the whiskers)

Konzept: Sättigung ist, wenn wir die meisten Arten gefunden haben und die Kurve abflacht. Wenn Heimische SCHNELLER saturieren als Aliens, finden wir die meisten heimischen Arten schnell, wenn wir durch die Landschaft gehen. Wenn Aliens LANGSAMER saturieren, deutet das darauf hin, dass verschiedene Alien-Arten in verschiedenen Regionen vorkommen (hoher räumlicher Turnover).

Neues Konzept - Boxplots:

geom_boxplot() zeigt die Verteilung von Werten:
Box = mittlere 50% der Daten (25. bis 75. Perzentil)
Linie innen = Median (mittlerer Wert)
Whiskers = reichen von der Box bis zum extremsten Wert, der innerhalb von 1,5× der Boxhöhe liegt (Interquartilsabstand, IQR)
Punkte = Ausreißer (Werte jenseits der Whiskers)
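You can check the box-and-whisker rule by hand. This sketch uses a made-up vector `x` and computes the box edges as the 25th and 75th percentiles with `quantile()`'s default method, which is how ggplot2 describes its hinges. The box height is the interquartile range, also available as `IQR()`.

```r
# Made-up values: nine "ordinary" numbers and one extreme one
x <- c(1:9, 30)

q1  <- unname(quantile(x, 0.25))   # lower edge of the box (25th percentile)
q3  <- unname(quantile(x, 0.75))   # upper edge of the box (75th percentile)
med <- median(x)                   # line inside the box
box_height <- q3 - q1              # interquartile range, same as IQR(x)

# Whiskers reach the most extreme values that lie within 1.5 box heights of the box
lower_fence <- q1 - 1.5 * box_height
upper_fence <- q3 + 1.5 * box_height
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Anything beyond the whiskers is drawn as a dot (outlier)
outliers <- x[x < lower_fence | x > upper_fence]

print(c(q1 = q1, median = med, q3 = q3,
        lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(outliers)   # 30
```

Here the box runs from 3.25 to 7.75. The upper whisker stops at 9, the largest value inside the fence of 14.5, and 30 is shown as an outlier.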

4a: Find saturation points / 4a: Finde Sättigungspunkte

# At what point do we reach 80% of species?
# Bei welchem Punkt erreichen wir 80% der Arten?
native_sat <- numeric(n_seeds)
alien_sat <- numeric(n_seeds)

for (i in 1:n_seeds) {
  native_sat[i] <- find_saturation(native_runs[i, ], threshold = 0.8)
  alien_sat[i] <- find_saturation(alien_runs[i, ], threshold = ____)
}

# Convert to percentage of total plots
# In Prozent der Gesamtplots umwandeln
native_sat_pct <- 100 * native_sat / sample_size
alien_sat_pct <- 100 * alien_sat / sample_size

print("Plots needed to reach 80% of species:")
print(paste("  Native:", round(mean(native_sat_pct)), "% (SD:", round(sd(native_sat_pct), 1), ")"))
print(paste("  Alien:", round(mean(alien_sat_pct)), "% (SD:", round(sd(alien_sat_pct), 1), ")"))
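`find_saturation()` comes from the helper functions in Exercise 1c. To see the idea in isolation, here is a hypothetical stand-in, `find_saturation_sketch`. It is not the course helper, which may differ in details such as how the total is defined. It returns the first plot at which a curve reaches `threshold` times its final (maximum) value.

```r
# Hypothetical stand-in for find_saturation(); the course version may differ.
# Returns the first position where the curve reaches threshold * its maximum.
find_saturation_sketch <- function(curve, threshold = 0.8) {
  target <- threshold * max(curve)
  which(curve >= target)[1]
}

# Made-up accumulation curve: 10 plots, 10 species in total
toy_curve <- c(2, 4, 6, 7, 8, 8, 8, 9, 9, 10)

find_saturation_sketch(toy_curve, threshold = 0.8)   # 5 (80% of 10 species = 8)
100 * find_saturation_sketch(toy_curve, 0.8) / length(toy_curve)   # 50 (% of plots)
```

The second line repeats the conversion used above: 5 of 10 plots means 50% of the plots were needed.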

4b: Visualize with boxplot / 4b: Visualisiere mit Boxplot

# Create data for boxplot
# Erstelle Daten für Boxplot
sat_data <- data.frame(
  saturation_pct = c(native_sat_pct, alien_sat_pct),
  status = rep(c("Native", "Alien"), each = n_seeds)
)

ggplot(sat_data, aes(x = status, y = saturation_pct, fill = status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("Native" = "____", "Alien" = "____")) +
  labs(
    title = "How Fast Do Species Saturate?",
    subtitle = "% of plots needed to find 80% of species",
    y = "% of Plots Needed",
    x = ""
  ) +
  theme_minimal() +
  theme(legend.position = "none")

4c: Calculate slope ratio / 4c: Berechne Steigungsverhältnis

# Compare early vs late slopes
# Vergleiche frühe vs späte Steigungen
# High ratio = fast saturation (steep early, flat late)
# Hohes Verhältnis = schnelle Sättigung (steil früh, flach spät)

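calc_slope_ratio <- function(curve) {
  n <- length(curve)
  early_end <- round(n * 0.2)   # last plot of the first 20%
  late_start <- round(n * 0.8)  # first plot of the last 20%

  # Species gained per plot step in each window
  # Neue Arten pro Plot-Schritt in jedem Fenster
  early_slope <- (curve[early_end] - curve[1]) / (early_end - 1)
  late_slope <- (curve[n] - curve[late_start]) / (n - late_start)

  # Flat end (late_slope = 0) would divide by zero -> NA
  # Flaches Ende (late_slope = 0) würde durch null teilen -> NA
  if (late_slope > 0) {
    return(early_slope / late_slope)
  } else {
    return(NA)
  }
}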

native_ratios <- numeric(n_seeds)
alien_ratios <- numeric(n_seeds)

for (i in 1:n_seeds) {
  native_ratios[i] <- calc_slope_ratio(native_runs[i, ])
  alien_ratios[i] <- calc_slope_ratio(alien_runs[i, ])
}

print("Slope ratio (higher = faster saturation):")
print(paste("  Native:", round(mean(native_ratios, na.rm = TRUE), 1)))
print(paste("  Alien:", round(mean(alien_ratios, na.rm = TRUE), 1)))
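To get a feel for what the slope ratio measures, this standalone sketch applies the early-20% / late-20% idea to three made-up curves. It divides each gain by the number of plot-to-plot steps in its window, so a perfectly straight line scores exactly 1 and a curve that flattens out scores well above 1. The name `slope_ratio_sketch` is hypothetical.

```r
# Standalone sketch of the early-vs-late slope idea (20% windows at each end).
# Assumes at least 8 plots, so that early_end >= 2.
slope_ratio_sketch <- function(curve) {
  n <- length(curve)
  early_end  <- round(n * 0.2)
  late_start <- round(n * 0.8)
  early_slope <- (curve[early_end] - curve[1]) / (early_end - 1)    # species per plot, early
  late_slope  <- (curve[n] - curve[late_start]) / (n - late_start)  # species per plot, late
  if (late_slope > 0) early_slope / late_slope else NA              # NA if the end is flat
}

straight   <- 1:10                                      # keeps rising at the same rate
saturating <- c(10, 20, 26, 30, 33, 35, 36, 37, 37, 38)  # steep start, nearly flat end
flat_end   <- c(10, 20, 26, 30, 33, 35, 36, 38, 38, 38)  # no new species at the end

slope_ratio_sketch(straight)     # 1
slope_ratio_sketch(saturating)   # 20: early slope 10 per plot vs late slope 0.5 per plot
slope_ratio_sketch(flat_end)     # NA
```

An NA means the curve had fully levelled off in the last 20% of plots. This is why the means above use `na.rm = TRUE`.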

Hint: Use threshold = 0.8 for aliens too, and fill in the colors “darkgreen” and “red”. Hinweis: Verwende auch für Aliens threshold = 0.8 und fülle die Farben “darkgreen” und “red” ein.

Exercise 5: Geographic Patterns / Übung 5: Geografische Muster

Concept: Does it matter WHERE in Austria we start? Some areas might have more aliens (near cities, roads). Let’s map how starting location affects what we find!

New ggplot concepts for mapping:

Concept	What it does	Example
`coord_quickmap()`	Correct map proportions (so Austria doesn’t look stretched!)	Add at end of ggplot
`scale_color_gradient2()`	Color scale with 3 colors (low→mid→high)	`low="green", mid="yellow", high="red"`
`midpoint = value`	Where the middle color appears	`midpoint = 0.5` for 50%

Konzept: Ist es wichtig, WO in Österreich wir starten? Manche Gebiete haben vielleicht mehr Aliens (nahe Städten, Straßen). Kartieren wir, wie der Startort beeinflusst, was wir finden!

Neue ggplot-Konzepte für Karten:

Konzept	Was es macht	Beispiel
`coord_quickmap()`	Korrekte Kartenproportionen (damit Österreich nicht gestreckt aussieht!)	Am Ende von ggplot hinzufügen
`scale_color_gradient2()`	Farbskala mit 3 Farben (niedrig→mittel→hoch)	`low="green", mid="yellow", high="red"`
`midpoint = wert`	Wo die mittlere Farbe erscheint	`midpoint = 0.5` für 50%
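Why does Austria look stretched without `coord_quickmap()`? Away from the equator, one degree of longitude covers less ground than one degree of latitude. The east-west distance shrinks roughly with the cosine of the latitude. This back-of-the-envelope sketch estimates the correction factor that `coord_quickmap()` approximates. The central latitude of about 47.5°N is an assumed, approximate value for Austria.

```r
# Approximate central latitude of Austria (assumed value for illustration)
lat_deg <- 47.5

# East-west length of one degree of longitude, relative to one degree of latitude
lon_shrink <- cos(lat_deg * pi / 180)

# A degree of latitude should therefore be drawn about this many times taller
# than a degree of longitude is wide
aspect <- 1 / lon_shrink
round(aspect, 2)
```

Without the correction, ggplot draws both degrees the same size, so the country appears squashed north-south (stretched east-west) by roughly this factor.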

5a: Map alien proportion by starting location / 5a: Kartiere Alien-Anteil nach Startort

# Test from 30 different starting locations
# Teste von 30 verschiedenen Startorten
n_map_seeds <- 30
checkpoint <- 75  # Check after this many plots

seed_results <- data.frame(
  seed_idx = 1:n_map_seeds,
  seed_lon = numeric(n_map_seeds),
  seed_lat = numeric(n_map_seeds),
  native_count = numeric(n_map_seeds),
  alien_count = numeric(n_map_seeds)
)

for (i in 1:n_map_seeds) {
  # Record starting location
  # Erfasse Startort
  seed_results$seed_lon[i] <- sample_header$Longitude[i]
  seed_results$seed_lat[i] <- sample_header$Latitude[i]

  # Run from this starting point
  # Starte von diesem Startpunkt
  nn_order <- nn_walk(sample_header, start_idx = i)

  # Get counts at checkpoint
  # Hole Zählungen am Checkpoint
  native_curve <- build_accumulation(sample_species, nn_order, "nat")
  alien_curve <- build_accumulation(sample_species, nn_order, "neo")

  seed_results$native_count[i] <- native_curve[min(checkpoint, length(native_curve))]
  seed_results$alien_count[i] <- alien_curve[min(checkpoint, length(alien_curve))]
}

# Calculate alien proportion
# Berechne Alien-Anteil
seed_results$alien_prop <- seed_results$alien_count /
  (seed_results$native_count + seed_results$____)

# Map it!
# Kartiere es!
ggplot(seed_results, aes(x = seed_lon, y = seed_lat, color = alien_prop)) +
  geom_point(size = 4) +
  scale_color_gradient2(
    low = "darkgreen", mid = "yellow", high = "red",
    midpoint = mean(seed_results$alien_prop),
    name = "Alien\nproportion"
  ) +
  coord_quickmap() +
  labs(
    title = "How Starting Location Affects Alien Detection",
    subtitle = paste("Alien proportion after", checkpoint, "plots"),
    x = "Longitude", y = "Latitude"
  ) +
  theme_minimal()

Hint: Fill in alien_count to complete the proportion calculation. Hinweis: Fülle alien_count ein, um die Anteilsberechnung zu vervollständigen.
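The starting point matters because a nearest-neighbour walk always steps to the closest plot not yet visited, so different starts produce different visiting orders. Below is a hypothetical, simplified stand-in, `nn_walk_sketch`, which uses plain Euclidean distance on made-up coordinates. The course's `nn_walk()` from Exercise 1c may differ, for example in how it measures distance.

```r
# Hypothetical simplified nearest-neighbour walk; not the course's nn_walk().
# x, y: coordinates of each plot; start: index of the first plot.
nn_walk_sketch <- function(x, y, start) {
  n <- length(x)
  visited <- logical(n)
  path <- integer(n)
  current <- as.integer(start)
  for (step in 1:n) {
    path[step] <- current
    visited[current] <- TRUE
    if (step == n) break
    # Euclidean distance from the current plot to every plot
    d <- sqrt((x - x[current])^2 + (y - y[current])^2)
    d[visited] <- Inf            # never revisit a plot
    current <- which.min(d)      # step to the closest unvisited plot
  }
  path
}

# Made-up plots on a line: one cluster at 0-2 and one at 10-11
px <- c(0, 1, 2, 10, 11)
py <- rep(0, 5)

nn_walk_sketch(px, py, start = 1)   # 1 2 3 4 5
nn_walk_sketch(px, py, start = 4)   # 4 5 3 2 1
```

Starting in the other cluster visits the same plots in a completely different order. With real plots, this is why the species found after 75 plots, and so the alien proportion, changes with the starting location.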

Exercise 6: Putting It All Together / Übung 6: Alles zusammenfügen

Concept: Let’s put it all together and create a polished figure that shows our main finding: how native vs alien species accumulate differently across Austrian plots.

Konzept: Fassen wir alles zusammen und erstellen eine schöne Abbildung, die unser Hauptergebnis zeigt: wie heimische vs. Alien-Arten unterschiedlich über österreichische Plots akkumulieren.

6a: Run full analysis / 6a: Führe vollständige Analyse aus

# Larger sample for final analysis
# Größere Stichprobe für finale Analyse
n_seeds <- 30
sample_size <- 200

set.seed(2024)
sample_ids <- sample(unique(header$PlotObservationID), sample_size)
sample_header <- header %>% filter(PlotObservationID %in% sample_ids)
sample_species <- species %>% filter(PlotObservationID %in% sample_ids)

# Run from all seeds
# Starte von allen Seeds
native_runs <- matrix(NA, nrow = n_seeds, ncol = sample_size)
alien_runs <- matrix(NA, nrow = n_seeds, ncol = sample_size)

print("Running full analysis...")
for (seed in 1:n_seeds) {
  nn_order <- nn_walk(sample_header, start_idx = seed)
  native_runs[seed, ] <- build_accumulation(sample_species, nn_order, "nat")
  alien_runs[seed, ] <- build_accumulation(sample_species, nn_order, "neo")
  if (seed %% 10 == 0) print(paste("  Completed", seed, "of", n_seeds))
}
print("Done!")

# Calculate summaries using colMeans and loops
# Berechne Zusammenfassungen mit colMeans und Schleifen
native_mean <- colMeans(native_runs)
alien_mean <- colMeans(alien_runs)

native_lower <- numeric(sample_size)
native_upper <- numeric(sample_size)
alien_lower <- numeric(sample_size)
alien_upper <- numeric(sample_size)

for (i in 1:sample_size) {
  native_lower[i] <- quantile(native_runs[, i], 0.025)
  native_upper[i] <- quantile(native_runs[, i], 0.975)
  alien_lower[i] <- quantile(alien_runs[, i], 0.025)
  alien_upper[i] <- quantile(alien_runs[, i], 0.975)
}

results <- data.frame(
  plots = rep(1:sample_size, 2),
  mean = c(native_mean, alien_mean),
  lower = c(native_lower, alien_lower),
  upper = c(native_upper, alien_upper),
  status = rep(c("Native", "Alien"), each = sample_size)
)
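`set.seed(2024)` is what makes the 200-plot subsample the same every time the chunk is run. Here is a minimal sketch with made-up IDs (`toy_ids` is hypothetical):

```r
toy_ids <- 1:1000   # hypothetical plot IDs

set.seed(2024)
first_draw <- sample(toy_ids, 5)

set.seed(2024)
second_draw <- sample(toy_ids, 5)   # same seed -> same random numbers -> same plots

third_draw <- sample(toy_ids, 5)    # no reset: the random sequence just continues

identical(first_draw, second_draw)  # TRUE
```

`sample()` draws without replacement by default, so no plot ID is picked twice. Without resetting the seed, `third_draw` will generally be a different set of plots.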

6b: Create polished figure / 6b: Erstelle schöne Abbildung

# Polished figure
# Schöne Abbildung
final_plot <- ggplot(results, aes(x = plots)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = status), alpha = 0.25) +
  geom_line(aes(y = mean, color = status), linewidth = 1.3) +
  scale_color_manual(values = c("Native" = "#228B22", "Alien" = "#DC143C")) +
  scale_fill_manual(values = c("Native" = "#228B22", "Alien" = "#DC143C")) +
  labs(
    title = "Species Accumulation: Native vs Alien Plants in Austria",
    subtitle = paste0(n_seeds, " starting points, 95% confidence intervals"),
    x = "Number of Plots Sampled",
    y = "Cumulative Species Count",
    color = "", fill = ""
  ) +
  theme_minimal(base_size = 13) +
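  theme(
    legend.position = "inside",             # ggplot2 >= 3.5.0: legend inside the panel
    legend.position.inside = c(0.85, 0.2),  # x, y in relative coordinates (0-1)
    legend.background = element_rect(fill = "white", color = "gray80"),
    plot.title = element_text(face = "bold")
  )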

print(final_plot)

# Save it (uncomment to run)
# ggsave("austria_accumulation.png", final_plot, width = 10, height = 7, dpi = 300)

Exercise 7: Interpret Your Results / Übung 7: Interpretiere deine Ergebnisse

Concept: Science isn’t just about making plots - it’s about understanding what they mean! Let’s summarize our findings.

Konzept: Wissenschaft bedeutet nicht nur, Plots zu erstellen - es geht darum zu verstehen, was sie bedeuten! Fassen wir unsere Ergebnisse zusammen.

7a: Generate results summary / 7a: Generiere Ergebniszusammenfassung

print("========== RESULTS SUMMARY ==========")

# Total species found
print("Species found:")
print(paste("  Native:", round(mean(native_runs[, sample_size])), "species"))
print(paste("  Alien:", round(mean(alien_runs[, sample_size])), "species"))

# Saturation comparison (using a loop)
native_sat <- numeric(n_seeds)
alien_sat <- numeric(n_seeds)
for (i in 1:n_seeds) {
  native_sat[i] <- find_saturation(native_runs[i, ], threshold = 0.8)
  alien_sat[i] <- find_saturation(alien_runs[i, ], threshold = 0.8)
}

print("Plots to reach 80% of species:")
print(paste0("  Native: ", round(mean(native_sat)), " plots (",
             round(100 * mean(native_sat) / sample_size), "%)"))
print(paste0("  Alien: ", round(mean(alien_sat)), " plots (",
             round(100 * mean(alien_sat) / sample_size), "%)"))

# Draw conclusion
print("========== CONCLUSION ==========")
if (mean(native_sat) < mean(alien_sat)) {
  print("Native species saturate FASTER than aliens!")
  print("-> We find most native species quickly across the landscape")
  print("-> This suggests different alien species occur in different regions (high spatial turnover)")
} else {
  print("Aliens saturate as fast as or FASTER than natives!")
  print("-> This suggests aliens are more widespread than expected")
}

Discussion Questions:
1. Do natives really saturate faster? What does this mean ecologically?
2. Why might different alien species occur in different regions? (Think about how they arrived.)
3. What would you investigate next? (Urban areas? Elevation? Climate?)

Diskussionsfragen:
1. Saturieren Heimische wirklich schneller? Was bedeutet das ökologisch?
2. Warum könnten verschiedene Alien-Arten in verschiedenen Regionen vorkommen? (Denke daran, wie sie angekommen sind.)
3. Was würdest du als Nächstes untersuchen? (Städtische Gebiete? Höhenlage? Klima?)

Quick Reference / Schnellreferenz

Function	Purpose	Example
`colMeans()`	Mean of each column	`colMeans(mat)`
`quantile()`	Get percentiles	`quantile(x, 0.975)`
`geom_ribbon()`	Add shaded band	`+ geom_ribbon(aes(ymin = lower, ymax = upper))`
`scale_color_manual()`	Custom colors	`+ scale_color_manual(values = c(...))`
`scale_color_gradient2()`	Diverging color scale	`+ scale_color_gradient2(...)`
`ggsave()`	Save plot to file	`ggsave("plot.png", width = 10)`
`theme()`	Customize plot appearance	`+ theme(legend.position = "bottom")`
`element_rect()`	Rectangle element for themes	`element_rect(fill = "white")`
`element_text()`	Text element for themes	`element_text(face = "bold")`

Funktion	Zweck	Beispiel
`colMeans()`	Mittelwert jeder Spalte	`colMeans(mat)`
`quantile()`	Perzentile berechnen	`quantile(x, 0.975)`
`geom_ribbon()`	Schattiertes Band	`+ geom_ribbon(aes(ymin = lower, ymax = upper))`
`scale_color_manual()`	Eigene Farben	`+ scale_color_manual(values = c(...))`
`scale_color_gradient2()`	Divergierende Farbskala	`+ scale_color_gradient2(...)`
`ggsave()`	Plot in Datei speichern	`ggsave("plot.png", width = 10)`
`theme()`	Plot-Erscheinung anpassen	`+ theme(legend.position = "bottom")`
`element_rect()`	Rechteck-Element für Themes	`element_rect(fill = "white")`
`element_text()`	Text-Element für Themes	`element_text(face = "bold")`

Day 2: Exercises / Tag 2: Übungen

Advanced Analysis & Research Project / Fortgeschrittene Analyse & Forschungsprojekt

Fill in the blanks! / Fülle die Lücken aus!

2026-01-28
