This post is a simple analysis of electoral data from Brazil. I’ll explore some aspects of the data and draw beautiful plots using the ggplot2 package. I hope you enjoy seeing the potential of R to make informative and clean graphs.
A brief explanation of the packages:
dplyr and data.table are the indispensable tools for every analysis.
ggplot2, scales and extrafont are the best tools for static plots.
RCurl is a good workaround package for solving download issues.
Before going further, let’s get a single file and take an overall look at it. The candidates file for each electoral race can be downloaded from the Superior Electoral Court website.
R’s system function runs a shell command, so let’s use it to take a look at the files. By default, system prints the command’s output to the console instead of returning it. This is a problem that can easily be overcome with a little hacking.
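One way to get the command’s output back into R is the intern = TRUE argument of system, which returns the output as a character vector. This is only a sketch of that option, not necessarily the workaround used in this post, and the echo command is a stand-in for a real listing such as ls -lh on the download folder.

```r
# Capture a shell command's output as a character vector (one element per line).
# `echo` stands in for a real listing command such as `ls -lh`.
listing <- system("echo 'file_a.txt'; echo 'file_b.txt'", intern = TRUE)

# Each output line is now an ordinary R string that can be filtered or parsed.
print(listing)
n_files <- length(listing)
```

With intern = FALSE (the default), the same call would print the two lines and return only the exit status.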
The file that is likely the biggest is only 46 MB.
The file does not have a header, the NULL entries are written as “#NULO#”, and the separator is “;”. Each year’s state files come with a README PDF, where you can see the header.
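Those three properties map directly onto arguments of a reader function. The sketch below uses base R’s read.table on two made-up rows (not real records) so that it runs without any download; data.table’s fread accepts sep, header and na.strings arguments as well.

```r
# Two made-up rows in the layout described above: no header, ';' separator,
# '#NULO#' for missing values. These are not real TSE records.
sample_file <- tempfile(fileext = ".txt")
writeLines(c(
  "2016;SP;PREFEITO;FEMININO",
  "2016;RJ;VEREADOR;#NULO#"
), sample_file)

candidates_raw <- read.table(
  sample_file,
  sep = ";",
  header = FALSE,
  na.strings = "#NULO#",
  comment.char = "",        # otherwise '#' would be read as the start of a comment
  stringsAsFactors = FALSE
)

# Columns get the default names V1, V2, ... until a header is assigned.
str(candidates_raw)
```

Setting comment.char = "" matters here because read.table treats “#” as a comment character by default, which would silently cut off every “#NULO#” entry.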
The next step is to do this in a loop, so I can get the candidates files for the election years from 2004 to 2016. As the files come separated by state, we have to bind them into one. It can be done in R, but it’s easier and faster in bash using cat.
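The loop and the cat step could look roughly like the sketch below. The helper name bind_state_files, the file naming pattern and the fake one-row state files are all assumptions made for illustration; in the real workflow the state files would come from the downloaded archives, and the years assume one election every four years.

```r
# Hypothetical helper: concatenate all state files of one year with `cat`.
bind_state_files <- function(files, out_file) {
  cmd <- paste("cat", paste(shQuote(files), collapse = " "), ">", shQuote(out_file))
  status <- system(cmd)
  if (status != 0) stop("cat failed for ", out_file)
  out_file
}

years <- seq(2004, 2016, by = 4)   # assumed election years: 2004, 2008, 2012, 2016
work_dir <- tempfile("candidates_")
dir.create(work_dir)

combined <- character(0)
for (year in years) {
  # Stand-in for the downloaded state files (two fake states, one row each).
  state_files <- file.path(work_dir, sprintf("cand_%d_%s.txt", year, c("SP", "RJ")))
  for (f in state_files) writeLines(sprintf("%d;%s", year, basename(f)), f)

  out_file <- file.path(work_dir, sprintf("cand_%d.txt", year))
  combined[as.character(year)] <- bind_state_files(state_files, out_file)
}

print(combined)
```

Because the files have no header line, plain concatenation is safe: there is no repeated header row to strip out afterwards.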
Fortunately, I found a GitHub repository that has mapped the header: GitHub silvadenisson. I saved the naming part of the file in my GitHub. Now, let’s use it.
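Applying a mapped header comes down to assigning a vector of names to the unnamed columns. In this minimal sketch the three column names and the two rows are placeholders chosen for illustration (SIGLA_UF in particular is an assumption); the real header is much longer and comes from the README or the mapped file.

```r
# Hypothetical header for illustration; the real one lists many more columns.
header_names <- c("ANO_ELEICAO", "SIGLA_UF", "DESCRICAO_CARGO")

# Stand-in for a file read without a header (default names V1, V2, V3).
candidates_raw <- data.frame(
  V1 = c(2016L, 2016L),
  V2 = c("SP", "RJ"),
  V3 = c("PREFEITO", "VEREADOR"),
  stringsAsFactors = FALSE
)

# Fail early if the header does not match the file layout.
stopifnot(length(header_names) == ncol(candidates_raw))
candidates <- setNames(candidates_raw, header_names)

str(candidates)
```

The length check is worth keeping if the column layout differs between years, since a mismatched header would otherwise mislabel columns without any error.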
Downloading and transforming the raw data
I’ll select the variables listed below to continue my analysis:
ANO_ELEICAO: Election year
DESCRICAO_CARGO: Description of the position that the candidate runs for.
First, I want to see the percentage of women among the candidates in Brazilian politics. So, let’s use dplyr to manipulate the data.
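A summary of this kind can be sketched with group_by and summarise. The toy data frame below is invented, and the column DESCRICAO_SEXO with the value "FEMININO" is an assumption about how the candidate’s sex is encoded; the xtralabs column simply flags the lowest and highest years for labelling a plot.

```r
library(dplyr)

# Toy data, not real records. DESCRICAO_SEXO is an assumed column name.
candidates <- data.frame(
  ANO_ELEICAO = c(2004, 2004, 2004, 2004, 2016, 2016, 2016, 2016),
  DESCRICAO_SEXO = c("FEMININO", "MASCULINO", "MASCULINO", "MASCULINO",
                     "FEMININO", "FEMININO", "MASCULINO", "MASCULINO"),
  stringsAsFactors = FALSE
)

fem_by_year <- candidates %>%
  group_by(ANO_ELEICAO) %>%
  # The mean of a logical vector is the share of TRUE values.
  summarise(fem_percent = mean(DESCRICAO_SEXO == "FEMININO")) %>%
  # Flag the extremes so they can be annotated later.
  mutate(xtralabs = case_when(
    fem_percent == min(fem_percent) ~ "Lowest:",
    fem_percent == max(fem_percent) ~ "Highest:",
    TRUE ~ ""
  ))

print(fem_by_year)
```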
| ANO_ELEICAO | fem_percent | xtralabs |
|---|---|---|
| 2004 | 0.2125 | Lowest: |
| 2008 | 0.2129 | |
| 2012 | 0.3152 | |
| 2016 | 0.3188 | Highest: |
Analysis - drawing a portrait of electoral candidates
Before you run the next command, I advise you to install the extrafont package, which is indispensable if you use Linux. The font you use in a plot completely changes how it looks. This package helps you manage your fonts and even install new ones.
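As a rough sketch of the kind of plot this leads to, the code below draws the yearly shares from the table above with ggplot2 and scales. The colors, labels and the dashed parity line are my own choices, and the font family is left at the generic "sans"; after registering fonts with extrafont (font_import() once, then loadfonts()), another family name could be used instead.

```r
library(ggplot2)
library(scales)

# Yearly shares taken from the summary table above.
fem_by_year <- data.frame(
  ANO_ELEICAO = c(2004, 2008, 2012, 2016),
  fem_percent = c(0.2125, 0.2129, 0.3152, 0.3188)
)

p <- ggplot(fem_by_year, aes(x = factor(ANO_ELEICAO), y = fem_percent)) +
  geom_col(fill = "steelblue") +
  # Dashed reference line at parity (half of the candidates).
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  geom_text(aes(label = percent(fem_percent, accuracy = 0.1)), vjust = -0.5) +
  scale_y_continuous(labels = percent, limits = c(0, 0.6)) +
  labs(x = "Election year", y = "Share of women among candidates") +
  # "sans" is a placeholder; swap in a font registered via extrafont.
  theme_minimal(base_family = "sans")

print(p)
```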
The situation is getting better, but it is still far from good. There is still a gap of 18.1 percentage points to the ideal scenario in which women are half of the candidates.
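The gap follows directly from the 2016 value in the summary table, taking parity to mean a share of 0.5:

```r
fem_2016 <- 0.3188   # share of women among candidates in 2016
parity   <- 0.5      # half of the candidates

# Gap to parity, expressed in percentage points.
gap_pp <- (parity - fem_2016) * 100
round(gap_pp, 1)
```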