Shapefiles in R using sf package
We will leave theory for some later posts and concentrate on a practical aspect for now. Our goal is to be able to create maps. To do that, we need to know the boundaries of areas i.e. countries, provinces or districts. That kind of data is usually stored in a special type of file, shapefile. This is clearly and oversimplification, but, as it is useful, lets stick to it for a moment.
There are a number of R packages to handle shapefiles. Most important are rgdal, sp; however we will focus on a relatively new addition - sf package that follows tidy data philosophy. Please look into the tutorial linked to get more info on how sf package works and learn a bit more about spatial data.
In our example we will use data on districts in Wroclaw. As mentioned before, data is in .shp file. Let’s read it into R and check what kind of object is it.
districts_wroclaw <- sf::read_sf('../../static/data/districts_wroclaw/districts_wroclaw.shp',
options = "ENCODING=UTF-8")
class(districts_wroclaw)
## [1] "sf" "tbl_df" "tbl" "data.frame"
head(districts_wroclaw)
## Simple feature collection with 6 features and 9 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 16.89252 ymin: 51.04515 xmax: 17.10723 ymax: 51.16599
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 10
## OBJECTI NROSIED NAZWAOS DATA SHAPE_A SHAPE_L old_dst dstrct_ inhbtnt
## <chr> <int> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 340 21 KrzykiP… 2016/… 5254965 9995 Krzyki Krzyki-… 14.0
## 2 341 24 GądówPo… 2016/… 3134569 7589 Fabryc… Gądów-P… 26.2
## 3 348 42 Sołtyso… 2016/… 4547041 9004 Psie P… Sołtyso… 2.90
## 4 349 18 Bieńkow… 2016/… 1433161 4760 Krzyki Bieńkow… 0.500
## 5 351 32 Żerniki 2016/… 3908726 9133 Fabryc… Żerniki 3.40
## 6 352 29 Muchobó… 2016/… 6879545 13228 Fabryc… Muchobó… 8.30
## # ... with 1 more variable: geometry <POLYGON [°]>
Interesting! Apart from being special sf object, it is also a regular data.frame. Thanks to that, one can use packages like dplyr and tidyr directly with spatial data. This might not sounds that exciting, but it is a huge improvement compared to what was available in R year ago.
It is always nice, to be able to use old toolbox with new data. Luckily for us, plotting can be done using ggplot. Just make sure you use the new version from Github, and you can enjoy new geom_sf!
devtools::install_github('tidyverse/ggplot2')
Then everything becomes super easy.
ggplot() +
geom_sf(data = districts_wroclaw, aes(fill = inhbtnt), color='white') +
scale_fill_gradient(low = "white", high = "red")
Fill of districts represent number of inhabitants.
Some useful functions from sf
How to show distribution of spatial data? The following is based on a nice web tutorial. let’s say that our goal is to visualize the difference between where younger and older people live. We could calculate median and create choropleth, but that way we loose a significant amount of information.
We will use data from polish office for cartography.
wroclaw_demo <- sf::read_sf('../../static/data/dem-rejurb-rejstat-shp/REJURB_20171231.shp')
We don’t know where exactly each person lives and drawing 600K points makes no sense. Therefore, for each region, we shall draw randomly a number of points proportional to the number of inhabitants.
First, we preprocesss data to get number of inhabitants younger and older 44 (which is as close to median as we can get)
wroclaw_demo_splited <- wroclaw_demo %>%
mutate(below_44 = W0_2 +W3_6 +W7_12+ W13_15+ W16_18+ W19_24+ W25_34+ W35_44,
above_44 = SUMA - below_44) %>%
gather(group, value, below_44, above_44) %>%
split(.$group)
To generate points we need one more packages. Original code used purr imap, but it can be substituted by lapply. Result is somewhat more cumbersome, but in case you don’t know purrr, you can limit number of new things for the day.
install.packages(c('lwgeom'))
generate_samples <- function(data)
suppressMessages(st_sample(data, size = round(data$value / 100)))
points <- lapply(wroclaw_demo_splited, generate_samples)
points <- lapply(names(points), function(name){
st_sf(data_frame(age = rep(name, length(points[[name]]))),
geometry = points[[name]])
})
points <- do.call(rbind, points)
Once points are generated everything becomes easy.
More posts on how to create maps, including interactive, are coming! You can visit my workshop materials as well.