Motivation
Implementation (Backend)
Motivation
Implementation (Backend)
Implementation (Frontend)
Motivation
Implementation (Backend)
Implementation (Frontend)
Outlook & sources
Which parties will pass the 5% threshold and enter the "Bundestag" (German parliament)?
and
Which parties will form the governing coalition (currently Union - SPD, so called grand coalition)?
For the 2017 election also of special interest
Which party will have the 3rd largest share of votes?
To answer these questions, pundits and writers usually focus on raw voting intention polls:
"Which party would you vote for if election was today?"
Union | SPD | Greens | FDP | The Left | Pirates | AfD | Others |
---|---|---|---|---|---|---|---|
40% | 26% | 10% | 5% | 9% | 2% | 4% | 4% |
Sample uncertainty is ignored (even if the sample is representative, we would expect individual polls to deviate from the true shares).
Redistribution of votes is ignored (all votes for parties that do not pass the 5% threshold are redistributed proportionally to parties that pass the threshold).
Overreaction to individual polls (Some polls can be "off" or only depict the voting intention in a very short time-period; different weighting methods used by different pollsters)
Union | SPD | Greens | FDP | The Left | Pirates | AfD | Others |
---|---|---|---|---|---|---|---|
40% | 26% | 10% | 5% | 9% | 2% | 4% | 4% |
Union | SPD | Greens | FDP | The Left |
---|---|---|---|---|
44.44% | 28.89% | 11.11% | 5.56% | 10% |
forsa_13$percentround(gtools::rdirichlet(3, 1995*forsa_13$percent+0.5), 4)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]## [1,] 0.4022 0.2634 0.1014 0.0439 0.0894 0.0157 0.0425 0.0414## [2,] 0.4206 0.2526 0.0915 0.0446 0.0949 0.0210 0.0336 0.0413## [3,] 0.4152 0.2629 0.0965 0.0518 0.0872 0.0154 0.0356 0.0353
round(gtools::rdirichlet(3, 20*forsa_13$percent+0.5), 4)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]## [1,] 0.4861 0.2415 0.0551 0.0123 0.0663 0.0148 0.0969 0.0270## [2,] 0.2443 0.1557 0.2510 0.1100 0.0548 0.0205 0.0898 0.0738## [3,] 0.4347 0.2295 0.0750 0.1335 0.0505 0.0016 0.0732 0.0021
n=10000
, FDP would not pass the 5% threshold in 50% of the casesx>50
P(event|sample)=#simulations with event#simulations
aggregate polls from different pollsters within a 14-day time-window (pooled survey)
Calculate the Posterior Dirichlet distribution (based on Multinomial Likelihood and flat/uninformative Dirichlet Prior)
Calculate "secondary" properties (e.g. probability that Union-FDP would have simple majority) via Monte-Carlo sampling
Simulate election outcomes from known Posterior (based on current pooled survey)
P(event|sample)=#eventnumber of simulations
Backend implemented in the R-package coalitions
(see Workflow vignette)
scrapes wahlrecht.de for (new) polls
(calculates pooled sample)
calculate and sample from Posteriori
Redistribute votes below 5% threshold and calculate Seats based on method by Sainte-Lague-Scheppers (German Law)
Calculate coalition probabilities
devtools::install_github("adibender/coalitions")
tibble
)surveys <- get_surveys()surveys
## # A tibble: 7 x 2## pollster surveys ## <chr> <list> ## 1 allensbach <tibble [41 × 5]> ## 2 emnid <tibble [222 × 5]>## 3 forsa <tibble [231 × 5]>## 4 fgw <tibble [82 × 5]> ## 5 gms <tibble [96 × 5]> ## 6 infratest <tibble [107 × 5]>## 7 insa <tibble [301 × 5]>
surveys %>% unnest() %>% select(-start, -end)
## # A tibble: 1,080 x 4## pollster date respondents survey ## <chr> <date> <dbl> <list> ## 1 allensbach 2018-01-25 1221 <tibble [7 × 3]>## 2 allensbach 2017-12-21 1443 <tibble [7 × 3]>## 3 allensbach 2017-11-30 1299 <tibble [7 × 3]>## 4 allensbach 2017-10-25 1454 <tibble [7 × 3]>## 5 allensbach 2017-09-22 1074 <tibble [7 × 3]>## 6 allensbach 2017-09-19 1083 <tibble [7 × 3]>## 7 allensbach 2017-09-06 1043 <tibble [7 × 3]>## 8 allensbach 2017-08-22 1421 <tibble [7 × 3]>## 9 allensbach 2017-07-18 1403 <tibble [7 × 3]>## 10 allensbach 2017-06-20 1437 <tibble [7 × 3]>## # ... with 1,070 more rows
surveys %>% unnest() %>% slice(1) %>% unnest() %>% select(-start, -end)
## # A tibble: 7 x 6## pollster date respondents party percent votes## <chr> <date> <dbl> <chr> <dbl> <dbl>## 1 allensbach 2018-01-25 1221 cdu 34.0 415 ## 2 allensbach 2018-01-25 1221 spd 21.0 256 ## 3 allensbach 2018-01-25 1221 greens 10.5 128 ## 4 allensbach 2018-01-25 1221 fdp 10.0 122 ## 5 allensbach 2018-01-25 1221 left 8.50 104 ## 6 allensbach 2018-01-25 1221 afd 12.0 147 ## 7 allensbach 2018-01-25 1221 others 4.00 48.8
pooled_survey <- surveys %>% pool_surveys()pooled_survey %>% select(-start, -end)
## # A tibble: 7 x 6## pollster date respondents party percent votes## <chr> <date> <dbl> <chr> <dbl> <dbl>## 1 pooled 2018-02-13 2533 afd 13.9 352## 2 pooled 2018-02-13 2533 cdu 31.4 796## 3 pooled 2018-02-13 2533 fdp 9.44 239## 4 pooled 2018-02-13 2533 greens 12.5 317## 5 pooled 2018-02-13 2533 left 10.5 266## 6 pooled 2018-02-13 2533 others 4.20 106## 7 pooled 2018-02-13 2533 spd 18.0 455
draws <- pooled_survey %>% draw_from_posterior(seed=123)draws[1:6, ]
## # A tibble: 6 x 7## afd cdu fdp greens left others spd## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 0.135 0.327 0.0841 0.126 0.116 0.0438 0.169## 2 0.125 0.323 0.0951 0.126 0.104 0.0392 0.188## 3 0.148 0.320 0.0934 0.126 0.101 0.0425 0.169## 4 0.133 0.307 0.0940 0.125 0.106 0.0462 0.189## 5 0.142 0.312 0.0957 0.116 0.103 0.0411 0.189## 6 0.132 0.322 0.0900 0.126 0.103 0.0423 0.184
## calculate probabilities to pass 5% thresholddraws %>% summarize_all(funs(mean(.>0.05)))
## # A tibble: 1 x 7## afd cdu fdp greens left others spd## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>## 1 1.00 1.00 1.00 1.00 1.00 0.0274 1.00
seats <- get_seats(draws, pooled_survey, distrib.fun=sls, hurdle=0.05)seats
## # A tibble: 60,000 x 3## sim party seats## <int> <chr> <int>## 1 1 afd 84## 2 1 cdu 204## 3 1 fdp 53## 4 1 greens 79## 5 1 left 73## 6 1 spd 105## 7 2 afd 78## 8 2 cdu 201## 9 2 fdp 59## 10 2 greens 78## # ... with 59,990 more rows
probs <- seats %>% have_majority() %>% calculate_probs(coalitions=list(c("cdu", "fdp"), c("cdu", "fdp", "greens")))probs
## # A tibble: 2 x 2## coalition probability## <chr> <dbl>## 1 cdu_fdp 0## 2 cdu_fdp_greens 100
set.seed(123)pooled_survey %>% nest(party:votes, .key="survey") %>% get_probabilities(seed=123, nsim=1e4) %>% unnest()
## # A tibble: 6 x 4## pollster date coalition probability## <chr> <date> <chr> <dbl>## 1 pooled 2018-02-13 cdu 0## 2 pooled 2018-02-13 cdu_fdp 0## 3 pooled 2018-02-13 cdu_fdp_greens 100## 4 pooled 2018-02-13 spd 0## 5 pooled 2018-02-13 left_spd 0## 6 pooled 2018-02-13 greens_left_spd 0
We visualize the posterior via "ridgeline plots" (formerly Joy plots)
Uses ggplot
, ggridges
, gganimate
gg_distrb <- ggplot(schw_gelb, aes(x = percent, y = date2, group=date2, frame=date, cumulative=TRUE, fill=..x..>50)) + geom_density_ridges_gradient(scale = 10, size = 0.25) + geom_vline(xintercept = 50, lty=1, lwd=1.2, col="grey90") + scale_fill_manual( name = "Rechnerische Sitzemehrheit", breaks = c("FALSE", "TRUE"), labels = c("Nein", "Ja"), values = c("#d1d0ce", "steelblue") ) + theme(legend.position = "bottom") + scale_x_continuous(labels = function(x) paste0(x, "%")) + scale_y_continuous(trans = rev_date) + guides(fill=guide_legend(override.aes=list(alpha=1))) + theme( axis.text = element_text(size = rel(1.3)), axis.title.y = element_blank(), axis.title.x = element_blank(), legend.text = element_text(size = rel(1.2)), legend.title = element_text(size = rel(1.3))) + labs( title = "Wahrscheinlichkeitsverteilung der Stimmanteile für Union - FDP", subtitle = "(seit Oktober 2016)")
gg_distrb <- ggplot(schw_gelb, aes(x = percent, y = date, group=date, frame=date, cumulative=TRUE, fill=..x..>50)) + geom_density_ridges_gradient(scale = 10, size = 0.25) + # geom_vline(xintercept = 50, lty=1, lwd=1.2, col="grey90") + # scale_fill_manual( # name = "Rechnerische Sitzemehrheit", # breaks = c("FALSE", "TRUE"), # labels = c("Nein", "Ja"), # values = c("#d1d0ce", "steelblue") ) + # theme(legend.position = "bottom") + # scale_x_continuous(labels = function(x) paste0(x, "%")) + # scale_y_continuous(trans = rev_date) + # guides(fill=guide_legend(override.aes=list(alpha=1))) + # theme( # axis.text = element_text(size = rel(1.3)), # axis.title.y = element_blank(), # axis.title.x = element_blank(), # legend.text = element_text(size = rel(1.2)), # legend.title = element_text(size = rel(1.3))) + # labs( # title = "Wahrscheinlichkeitsverteilung der Stimmanteile für Union - FDP", # subtitle = "(seit Oktober 2016)")
ggplot
, you are able to gganimate
!gganimate
package currently not on CRAN, install via:devtools::install_github("drgtwo/gganimate")
library(ggplot2)library(ggridges)library(gganimate)gg_distrb <- ggplot(schw_gelb, aes(x = percent, y = date, group=date, frame=date, cumulative=TRUE, fill=..x..>50)) + geom_density_ridges_gradient(scale = 10, size = 0.25)gganimate(gg_distrb, "output.gif", interval=.2, ani.width=600)
ani.options
)interval
argument →
higher speed)frame
argument in the call to ggplot
(this is
the variable over which the animation will iterate)cumulative=TRUE
current frame also contains previous framesalpha
argument does not work with *_gradient
geoms1) Creating a homepage with Shiny
2) Setting up the server with Shiny Server
3) APIs and stuff: tweetR and googlesheets
4) Keep the website running
Shiny in a nutshell:
R package by RStudio
Web application framework
Creation of interactive dashboards, running R in the background
Resources: Applied R Shiny Meetup, shiny.rstudio.com
Tip: Use shinydashboard for a more appealing dashboard UI
Shiny in a nutshell:
R package by RStudio
Web application framework
Creation of interactive dashboards, running R in the background
Resources: Applied R Shiny Meetup, shiny.rstudio.com
Tip: Use shinydashboard for a more appealing dashboard UI
Why Shiny?
Easy integration of interactive R output and calculations
No need for learning another language, Shiny creates the HTML, CSS and JavaScript for you!
library(shiny)ui <- fluidPage( selectInput("plotType_picker", "Pick plot type", choices = c("boxplot","histogram")), plotOutput("my_plot"))server <- function(input, output) { x <- rnorm(100) output$my_plot <- renderPlot({ if (input$plotType_picker == "boxplot") { boxplot(x) } else hist(x) })}shinyApp(ui = ui, server = server)
Shiny Server in a nutshell:
Linux-based open source web server by RStudio
Access to the homepage starts an R process with a Shiny app on the server
Resources:
Setting up Shiny Server (on 64-bit Ubuntu 12.04+): (administrator's guide)
1) Install R and all needed R packages on the server
2) Install Shiny Server (install guide)
$ sudo apt-get install gdebi-core$ sudo wget https://download3.rstudio.org/ubuntu-12.04/x86_64/shiny-server-1.5.3.838-amd64.deb$ sudo gdebi shiny-server-1.5.3.838-amd64.deb
3) Customize the shiny-server.conf file to your needs
'sanitize_errors false;'
gets you clearer error messages'app_idle_timeout 0;'
saves startup time of the R process4) Put the Shiny files inside the path specified in shiny.server.conf
Capabilities of the free version (Shiny Server Open Source):
For non-commercial projects
Up to 20 users simultaneously
No multiple R processes!
→ code efficiently and precalculate results where possible
Capabilities of the free version (Shiny Server Open Source):
For non-commercial projects
Up to 20 users simultaneously
No multiple R processes!
→ code efficiently and precalculate results where possible
Shiny Server Pro: Commercial use, multiple R processes etc.
The tweetR package in a nutshell:
Our use case: Send tweets with new results
Resources: user vignette, tweetR on GitHub
Alternative (more modern) package: rtweet
Sending a Tweet with tweetR:
1) Register a new Twitter app on apps.twitter.com
2) Use the credentials to do the authorization with R
setup_twitter_oauth(consumer_key = "your_consumer_key", consumer_secret = "your_consumer_secret", access_token = "your_access_token", access_secret = "your_access_secret")
3) Start tweeting!
tweet(message = "Tweet tweet", mediaPath = "my_picture.png")
The googlesheets package in a nutshell:
Our use case: Offering an API for our results
Resources: googlesheets on GitHub
Exporting a table to Google Sheets with googlesheets:
1) Extract your credentials:
auth_info <- gs_auth()saveRDS(auth_info, file = "auth_info.rds")
2) Use the credentials to do the authorization with R
gs_auth(token = "auth_info.rds")
3) Start uploading!
my_table <- data.frame("person" = c("Sepp","Uli","Franz"), "likes_koala" = c("yes", "yes", "yes"))write.csv(my_table, file = "my_table.csv")gs_upload("my_table.csv", sheet_title = "my_googleSheet", overwrite = TRUE)
Automation of the server:
We check hourly if new surveys are available and update the results
Implementation: see next slide
Tip: Automatic error notification using Pushbullet
Supports notifications to all major (Desktop and mobile) systems
R package: RPushbullet on GitHub
1) Set up the R script
while (1 < 2) { # do something eternally # Step 1: check for new surveys and perform calculations update_results() # Step 2: Update services with new results if (new_results) { # if new results are available send_tweet() export_googleSheets() # restart the server to fetch the new results (on Ubuntu 15.04+) system("sudo systemctl restart shiny-server") } # Step 3: Rest for an hour Sys.sleep(60*10)}
2) Start the R script on the server
R CMD BATCH update_results_everyHour.R &
1) Future plans
2) Sources
Make frontend R package (shiny app) open source
Simplify data handling, Create online data base (with API), etc.
Extend the framework to:
General sources:
Raw voting intention polls: www.wahlrecht.de
Our slides are powered by Xaringan
tidyverse (previously hadleyverse)
General sources:
Raw voting intention polls: www.wahlrecht.de
Our slides are powered by Xaringan
tidyverse (previously hadleyverse)
How to reach us?
Contact us directly: koala@stat.uni-muenchen.de
Feel free to contribute on GitHub
Keep in mind:
We do not make predictions!
Many voters (~40%) still undecided
Keep in mind:
We do not make predictions!
Many voters (~40%) still undecided
... and follow us on Twitter :-)
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |