<h1 id="reinvesting-dividends">Quantifying the value of reinvesting paid dividends</h1>
<p><em>Matt Stevans (matt@stevans.org), From Galaxy Science to Data Science, 2020-09-11</em></p>
<p>I’ve always been fascinated by finance and investing. This might have something to do with being raised by an accountant who would often preach the virtues of saving for retirement, but I’m not sure. My interest even led me to give a talk to my friends in the UT astronomy department titled <a href="https://github.com/OttoStruve/gsps/blob/gh-pages/slides/stevans_retirement.pdf">Do You Want to Buy A Yacht?: Saving for retirement 101</a> for fun.</p>
<p>Anyway, a few years ago I discovered “robo advisors,” computer applications that leverage automation and algorithms to manage your investments with increased efficiency. One feature that stood out to me was the ability to automatically reinvest earned dividends. Well, I was surprised to see that some roboadvisor brokerage sites like <a href="https://www.m1finance.com/">M1 Finance</a> did not display the historical performance of stocks or funds assuming dividends were automatically reinvested.</p>
<p>This made me wonder: how would these historical performance plots look if dividend reinvestment were included? What is the potential value of reinvesting dividends with a stock or market fund with an average dividend? And how would reinvesting dividends affect the returns of other popular investments like bonds or funds of high-dividend stocks? I aim to answer these questions in this blog post.</p>
<p>To answer these questions I first had to gain access to stock data. After a bit of searching, I found a service called <a href="https://www.tiingo.com/">Tiingo</a> that provides daily stock data via the Python package <a href="https://pandas-datareader.readthedocs.io/en/latest/index.html">pandas-datareader</a>. <input type="checkbox" id="version" /><label for="version"><sup></sup></label><span><br /><br />Version 0.7.0<br /><br /></span></p>
<p class="notice--primary">If you want to see the code I wrote to wrangle, analyze, and visualize the data, check out <a href="https://github.com/stevans/reinvesting-dividends/blob/master/notebooks/1.0-mls-reinvesting-dividends_v2.ipynb"><strong>this Python Jupyter Notebook</strong></a> on my GitHub page.</p>
<p>With Tiingo (and after signing up for an account and API token) I pulled the date, stock price, and dividend payment data.</p>
<p>Then I did the arithmetic to simulate dividend reinvestment. I designated the dividends to be reinvested at the opening price on the day after they were paid. For each dividend payment, I calculated the relative price increase (or decrease) at the end of each day after the dividend was paid; this array of price changes was then multiplied by the value of the dividend paid to get the daily value of that dividend. I then summed the daily values of the dividends to get the daily total value of all dividends.</p>
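<p>A minimal sketch of this simulation, using made-up prices and dividends rather than real Tiingo data. It tracks shares directly, buying more with each dividend at the next day’s price, which should be arithmetically equivalent to the dividend-value accounting described above:</p>

```python
# A simplified sketch of the simulation described above, with made-up
# prices and dividends (not real Tiingo data). Dividends are reinvested
# at the next day's price, approximating the next day's open.

def simulate(prices, dividends, initial=100.0):
    """prices: list of daily prices.
    dividends: dict mapping day index -> dividend paid per share that day.
    Returns the final investment value under the three dividend treatments."""
    shares = initial / prices[0]

    # Treatment 1: exclude dividends; value tracks the price only.
    excluded = shares * prices[-1]

    # Treatment 2: keep dividends as cash (share count never changes).
    dividends_as_cash = excluded + sum(shares * d for d in dividends.values())

    # Treatment 3: reinvest each dividend at the next day's price.
    r_shares = shares
    for day, div in sorted(dividends.items()):
        payout = r_shares * div
        r_shares += payout / prices[day + 1]
    reinvested = r_shares * prices[-1]

    return excluded, dividends_as_cash, reinvested
```

<p>For example, <code class="language-plaintext highlighter-rouge">simulate([100.0, 100.0, 110.0], {0: 1.0})</code> models a $100 investment that pays a single $1-per-share dividend on day 0 and returns about (110.0, 111.0, 111.1) for the three treatments.</p>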
<p>With this script I can access the daily value of an investment in any stock or Exchange Traded Fund (ETF; an ETF is a collection of stocks similar to a mutual fund) assuming different treatment of the dividends: 1. exclude dividends, 2. add the value of dividends without reinvesting them, and 3. add the value of reinvested dividends. I can now remake the performance plots for any investment. The following figure shows what a $100 investment in Vanguard’s all market ETF (ticker symbol: VTI) would be worth over time if invested on Jan 4, 2010 with the three previously mentioned treatments of dividends.</p>
<iframe title="Reinvesting dividends increases returns significantly" aria-label="Interactive line chart" id="datawrapper-chart-htBjx" src="https://datawrapper.dwcdn.net/htBjx/3/" scrolling="no" frameborder="0" style="background: #FFFFFF; width: 0; min-width: 100% !important; border: none;" height="400"></iframe>
<script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(a){if(void 0!==a.data["datawrapper-height"])for(var e in a.data["datawrapper-height"]){var t=document.getElementById("datawrapper-chart-"+e)||document.querySelector("iframe[src*='"+e+"']");t&&(t.style.height=a.data["datawrapper-height"][e]+"px")}}))}();
</script>
<p>This figure shows that automatic dividend reinvesting increases total returns or gains. In fact, for this particular ETF, the total gains from dividend reinvesting are 20% larger than the total gains when dividends were excluded and 7% larger than if dividends were included but not reinvested (assuming no inflation). So, roboadvisors are leaving a significant benefit of their service, automatic dividend reinvesting, out of their stock performance plots. <input type="checkbox" id="think" /><label for="think"><sup></sup></label><span><br /><br />You might be thinking, but past dividends are not a guarantee of future dividends. And you would be correct, but it’s also true that past capital gains of stocks are not a guarantee of future capital returns. So, as long as historical performance plots are clearly labeled, I don’t see an ethical problem with including dividends in historical performance plots. Better yet, these sites could add a toggle switch for the user to toggle between capital gains only and gains assuming reinvested dividends.<br /><br /></span></p>
<p>What is the value of reinvesting dividends for other types of popular investments?</p>
<p>Although VTI is a good representative dividend investment because it is an all market ETF and therefore has a dividend that is equivalent to the volume-weighted average dividend, there are many ETFs with different goals, risks, and underlying assets or stocks. To see the impact of automatic dividend reinvesting on funds from a range of asset classes, I simulated the historical returns for a diverse set of ETFs from Vanguard, the Vanguard Select ETFs. I included two of Vanguard’s dividend ETFs and a large-cap growth ETF for a greater range of funds.</p>
<p>For a simplified comparison, I calculated the average annual total return (AATR) over a 5 year period ending Aug 31, 2020. I used <a href="https://personal.vanguard.com/us/glossary/a/GlossaryAverageAnnualTotalReturnContent.jsp">the definition of AATR</a> from Vanguard’s website. The following figure shows the 5-year AATR for the Vanguard ETFs broken down into three components: gains from capital only, gains from dividends only, and gains from reinvesting dividends. The ETFs are ordered by the AATR calculated excluding dividends.</p>
<iframe title="Reinvesting dividends increases annual returns of ETFs by about 0.3%" aria-label="Split Bars" id="datawrapper-chart-QXEci" src="https://datawrapper.dwcdn.net/QXEci/1/" scrolling="no" frameborder="0" style=" background: #FFFFFF; width: 0; min-width: 100% !important; border: none;" height="806"></iframe>
<script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(a){if(void 0!==a.data["datawrapper-height"])for(var e in a.data["datawrapper-height"]){var t=document.getElementById("datawrapper-chart-"+e)||document.querySelector("iframe[src*='"+e+"']");t&&(t.style.height=a.data["datawrapper-height"][e]+"px")}}))}();
</script>
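<p>For reference, the AATR is a geometric (compound) average: the annual rate that, applied uniformly over the period, turns the beginning value into the ending value with dividends reinvested. A minimal sketch of the calculation, based on my reading of Vanguard’s definition rather than their exact code:</p>

```python
# Average annual total return (AATR): the geometric average annual growth
# rate of an investment's value (with dividends reinvested) over a period.

def average_annual_total_return(begin_value, end_value, years):
    return (end_value / begin_value) ** (1.0 / years) - 1.0
```

<p>For example, an investment that grows from $100 to about $127.63 over 5 years has an AATR of 5%.</p>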
<p>The right column of this figure shows that the ETFs’ gains from reinvesting dividends range from 0.1% to 0.6%, with an average of 0.3%, and are small compared to their respective capital-only and dividends-only components. <input type="checkbox" id="vug" /><label for="vug"><sup></sup></label><span><br /><br />Except for the Growth ETF (VUG), which has a relatively high reinvesting-dividends component (0.6%) and a relatively small dividends-only component (0.8%).<br /><br /></span> This is consistent with the rationale that the gains from reinvesting dividends are fundamentally a multiplication of the capital-only gains and the dividends-only gains (i.e., two fractions multiplied together produce a smaller fraction). So, reinvesting dividends increases gains by a small amount for all asset classes, which can lead to significant gains when compounded over many years, as illustrated in the first figure. Coincidentally, the average benefit of dividend reinvesting (0.3% a year) is about equal to the typical yearly fee for roboadvisors listed on <a href="https://www.investopedia.com/best-robo-advisors-4693125">Investopedia</a> and <a href="https://www.nerdwallet.com/best/investing/robo-advisors">Nerdwallet</a>, so we may have stumbled upon the reason some roboadvisor sites don’t display performance plots including the gains from reinvesting dividends. <input type="checkbox" id="free" /><label for="free"><sup></sup></label><span><br /><br />Interestingly, M1 Finance, the site that inspired this post, does not charge a yearly fee for its basic account. However, it reinvests your dividends <a href="https://support.m1finance.com/hc/en-us/articles/360000641687-Dividends-in-your-M1-Portfolio">across your portfolio</a>, so unless your portfolio is a single stock or fund, a fund performance plot with reinvested dividends included will still be misleading.<br /><br /></span> They don’t want to sell you on a benefit that you won’t actually receive.</p>
<p>The figure also shows that the size of the reinvested-dividends component seems to be correlated more with the size of the capital-only component than with the size of the dividends-only component. <input type="checkbox" id="cor" /><label for="cor"><sup></sup></label><span><br /><br />In fact, the reinvested-dividends component is strongly correlated with the capital-only component (with an r correlation coefficient of 0.89) and is moderately anti-correlated with the dividends-only component (with an r correlation coefficient of -0.55).<br /><br /></span> At first, I was a little surprised by this since at face value I thought funds with larger dividends would be more impacted by dividend reinvesting. But now it makes sense to me given that (again) gains from reinvesting dividends are fundamentally a multiplication of the capital-only gains and the dividends-only gains, and given that the ETFs with the largest capital-only gains (~12%) have decent dividends-only gains (~1.5%) while the ETFs with the largest dividends-only gains (~2-3%) have relatively small capital-only gains (~2-5%) (i.e., <code class="language-plaintext highlighter-rouge">0.12*0.015=0.0018</code> is greater than <code class="language-plaintext highlighter-rouge">0.025*0.04=0.001</code>). This means that people invested in growth or all market funds have more to gain by reinvesting dividends than those invested in bond or dividend funds.</p>
<p>Here are the main takeaways from this post:</p>
<ol>
<li>Reinvesting the dividends of an all market ETF like VTI can grow your initial investment by 20% over ten years, which is 7% more than if you kept your earned dividends in cash (assuming no inflation).</li>
<li>People invested in a typical Vanguard ETF can grow their investment by an additional 0.3% annually by reinvesting their dividends.</li>
<li>If you use a roboadvisor, the average benefit of automatic dividend reinvesting will likely be cancelled out by the roboadvisor fee.</li>
<li>Reinvesting dividends is more profitable with more risky funds (i.e., growth or all market) over less risky funds (i.e., bonds or dividend stocks).</li>
</ol>
<h1 id="sanders-tx-2020">Using regression to learn how Bernie Sanders’s voter coalition changed between 2016 and 2020 in Texas</h1>
<p><em>2020-09-03</em></p>
<p>In this post I present a multiple regression analysis on county-level demographics from the US Census with the aim of understanding what types of voters drove the change in Bernie Sanders’s primary vote share between 2016 and 2020 in Texas.</p>
<p>I find that the demographics of Texas at the county level are too correlated to permit a detailed and quantified understanding of how individual demographics relate to the change in Sanders’s vote share using multiple regression.</p>
<p>By building only simple models, I find Sanders’s Texas coalition lost non-Hispanic White voters and gained Hispanic voters between 2016 and 2020, which is consistent with trends in national opinion polling and exit polling.</p>
<p class="notice--primary">If you want to see the code I wrote to wrangle, analyze, and visualize the data, check out <a href="https://github.com/stevans/bernie-texas-2020/tree/master/notebooks"><strong>this pair of Python Jupyter Notebooks</strong></a> on my GitHub page.</p>
<h2 id="introduction">Introduction</h2>
<p>This project idea came to me one day while I was reading a <a href="https://fivethirtyeight.com/features/historic-turnout-in-2020-not-so-far/">FiveThirtyEight article</a> assessing whether the 2020 Democratic primaries were having record-breaking turnout. <input type="checkbox" id="intro" /><label for="intro"><sup></sup></label><span><br /><br />They weren’t.<br /><br /></span></p>
<p>In that article they mentioned performing a regression analysis on demographic data to understand the type of voter driving increases in voter turnout in the 15 states that had held primaries up to that point.</p>
<p>This got me thinking that the data must exist for me to do a similar regression analysis for my blog. This seemed like a great opportunity to further my understanding of regression modeling which I had encountered at a basic level many times in astronomy classrooms and to apply my skills in another domain that interests me—politics.</p>
<h3 id="sanders-vote-share-decreased">Sanders vote share decreased</h3>
<p>Bernie Sanders was a surprising success in the 2016 Democratic primaries, finishing second with 13,210,550 votes and winning 23 states’ primary elections. <input type="checkbox" id="wiki" /><label for="wiki"><sup></sup></label><span><br /><br />Stats from the Wikipedia article: <a href="https://en.wikipedia.org/wiki/Results_of_the_2016_Democratic_Party_presidential_primaries">Results of the 2016 Democratic Party presidential primaries</a>. <br /><br /></span></p>
<p>In the 2020 primary, he had a marked drop in vote share in national polling due, in part, to the crowded field of candidates. In Texas, Sanders went from earning 33.2% of all votes in 2016 to 29.9% in 2020. <input type="checkbox" id="raw" /><label for="raw"><sup></sup></label><span><br /><br />His raw votes went up from 476,547 to 626,339.<br /><br /></span></p>
<p>Exit polling by CNN showed Sanders’s vote share of individual demographics dropped in all categories except vote share of Hispanics. Most notably, Sanders’s vote share of Whites (presumably non-Hispanic) dropped from 41% to 29%, his vote share of Hispanics increased from 29% to 40%, and his vote share of Blacks stayed at 15%. <input type="checkbox" id="cnn" /><label for="cnn"><sup></sup></label><span><br /><br />CNN exit polls of Texas in <a href="https://www.cnn.com/election/2016/primaries/polls/tx/Dem">2016</a> and <a href="https://www.cnn.com/election/2020/primaries-caucuses/entrance-and-exit-polls/texas/democratic">2020</a>.<br /><br /></span></p>
<h3 id="using-regression-with-demographics">Using regression with demographics</h3>
<p>Regression analysis is a powerful tool for understanding the relation between properties of a population. Regressing on demographics has been used to understand their relationship to election outcomes in a variety of ways including <a href="https://fivethirtyeight.com/features/how-fivethirtyeight-2020-primary-model-works/">election forecasting</a>, <a href="https://projecteuclid.org/download/pdf_1/euclid.ss/1049993203">identifying irregularities in voting</a> in the pivotal state of Florida in the 2000 presidential election, and <a href="https://fivethirtyeight.com/features/historic-turnout-in-2020-not-so-far/">understanding what type of voter drove increases</a> in voter turnout in the 2020 Democratic primaries.</p>
<p>I debated with myself for a long time on the correct type of regression model to use for this analysis. While a negative binomial or Poisson regression analysis is appropriate because election data and most demographic data are fundamentally counting data, I have very little experience with these regression types. On the other hand, I’m very familiar with ordinary least squares (OLS) linear regression, which is the most popular method (so there are plenty of resources online) and the easiest to interpret. Plus, OLS can be used with counting data if they’re transformed to fractions or percentages of total, at the cost of introducing some correlation between related variables. Ultimately, I settled on using OLS because I knew that my analysis met at least some of the <a href="https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/">seven classical OLS assumptions</a> and that the assumptions can be assessed after a model is built using diagnostic plots like residual plots. Put simply, as long as the observations of the error term in the linear model have constant variance and they are uncorrelated with each other and uncorrelated with the independent variables, then OLS can be used.</p>
<p>One of the perks of OLS is that it is easy to interpret. It allows you to understand quantitatively how the independent variables (in our case demographic variables) each relate to the dependent variable (e.g., the decrease in Sanders’s total vote share). The relationship comes straight from the model coefficients; a change of one unit in independent variable “A” leads to a change in the dependent variable equal to the coefficient of “A” with all other variables being held constant.</p>
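<p>This interpretation is easy to check on synthetic data where the true coefficients are known. The numbers below are invented for illustration:</p>

```python
import numpy as np

# Synthetic data with known coefficients, invented for illustration:
# y = 3.0 + 2.0*a - 1.5*b + noise.
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
y = 3.0 + 2.0 * a - 1.5 * b + rng.normal(scale=0.1, size=n)

# Fit OLS; the design matrix includes a constant column for the intercept.
X = np.column_stack([np.ones(n), a, b])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef recovers approximately [3.0, 2.0, -1.5]: a one-unit increase in "a"
# moves the prediction by about 2.0, holding "b" constant.
```

<p>With real demographic data the coefficients are of course unknown, but they are read off the fitted model in exactly the same way.</p>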
<h2 id="the-data">The data</h2>
<p>To build a precise and interpretable model, independent variables must be chosen based on a <a href="https://statisticsbyjim.com/regression/model-specification-variable-selection/">strong theoretical understanding</a> of the system under investigation. In place of a full literature review on how demographics relate to election outcomes, I adopted the demographic variables used by the election experts at FiveThirtyEight for their <a href="https://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/">2016 general</a> and <a href="https://fivethirtyeight.com/features/how-fivethirtyeight-2020-primary-model-works/">2020 primary</a> election forecast models. The demographic variables they use are related to race, education, income, an urban/rural measure (population density), religion, and liberal-conservative lean. Here is the list of variables I was able to collect that most closely match those used by FiveThirtyEight:</p>
<ol>
<li>White non-Hispanic</li>
<li>White Hispanic</li>
<li>Black</li>
<li>Other race</li>
<li>Fraction with a 4-yr degree</li>
<li>Mean per capita income</li>
<li>Population density</li>
<li>Black Protestant</li>
<li>Catholic</li>
<li>Evangelical</li>
<li>Jewish</li>
<li>Mainline</li>
<li>Mormon</li>
<li>Other religion</li>
<li>Unclaimed religion</li>
<li>Margin of victory of Democratic candidates in 2008-2016 presidential elections (a proxy for county partisan lean)</li>
</ol>
<p>The race, education, income, and population density variables were pulled from the U.S. Census Bureau. I used US Census data from the American Community Survey (ACS) 5-year estimates accessed via the easy-to-use and well-documented python package <a href="https://github.com/jtleider/censusdata">censusdata</a>. The past Presidential election results were pulled from <a href="https://github.com/Data4Democracy">Data for Democracy’s</a> <a href="https://data.world/data4democracy/election-transparency">data.world election transparency page</a>. The 2016 and 2020 primary election results were gathered from the <a href="https://www.sos.state.tx.us/elections/historical/elections-results-archive.shtml">Texas Secretary of State website</a>. The religious adherence data were pulled from the Association of Religious Data Archives, <a href="http://www.thearda.com/Archive/Files/Descriptions/RCMSCY10.asp">U.S. Religion Census, 2010</a>. The Census and elections data are of high quality and have no missing data. The religious data are from 2010 (the latest available) and were collected via self-reporting by congregations. Some congregations draw adherents from neighboring counties, leading to some counties having more religious adherents than total population. Adherents outside of the 236 groups included in the study are combined with non-religious people in the “Unclaimed religion” category. I used data at the county level because counties are the smallest unit of comprehensive coverage shared by all of our data sources and the level at which the state of Texas collects and reports election results.</p>
<p>A comment on the training/test split:
Typically when building a model, a test sample is left out to test the final model for overfitting. We did not leave out a test set because we wanted to train on as much of our small sample (254 counties) as possible, and we will not be using our model to make predictions. We assessed the overfitting of our model with the predicted R^2 statistic, which is similar to leave-one-out cross-validation.</p>
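<p>The predicted R^2 can be computed from the PRESS statistic, which for OLS is algebraically equivalent to leave-one-out cross-validation. Here is a NumPy sketch of that calculation (my own implementation, assuming the design matrix already includes an intercept column):</p>

```python
import numpy as np

def predicted_r2(X, y):
    """Predicted R^2 from the PRESS statistic; for OLS this is equivalent
    to leave-one-out cross-validation. X must include the intercept column."""
    H = X @ np.linalg.pinv(X.T @ X) @ X.T      # hat (projection) matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))     # leave-one-out residuals
    press = np.sum(loo_resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot
```

<p>A predicted R^2 noticeably below the regular R^2 signals that the model predicts held-out points worse than it fits the training points, i.e., overfitting.</p>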
<h2 id="building-a-model">Building a model</h2>
<h3 id="exploratory-plots-and-observations">Exploratory plots and observations</h3>
<p>Figure 1 plots the difference in Bernie Sanders’s vote share percentage between 2016 and 2020, our dependent variable, against a number of covariates.</p>
<figure class="align-center">
<img src="https://www.stevans.org/assets/images/posts/sanders-tx-2020/Figure-scatter.png" alt="Scatter Plots" />
<figcaption>Figure 1: The change in the Sanders vote share in Texas from 2016 to 2020 versus demographic covariates.</figcaption>
</figure>
<p>The plots show that the change in Sanders’s vote share is moderately anti-correlated with the percentage of White people; moderately correlated with the percentage of Hispanic people; weakly correlated with the percentage of Black people; not obviously dependent on the percentage of people of other races; decreasing with the percentage of the population with a 4-year degree; decreasing with mean personal income; decreasing with the percentage of Evangelical adherents; increasing with the percentage of Catholic adherents; not obviously dependent on the percentage of Mormon adherents, Black Protestant adherents, unclaimed adherents, and adherents of other religions; moderately correlated with the average margin of victory for the Democratic nominee in the past three presidential elections; and not obviously dependent on population density.</p>
<p>Out of curiosity, in the last two panels we plot the change in Sanders vote share versus Sanders’s vote share in 2016 and a list of (Gaussian-distributed) random numbers and see a strong anti-correlation in the former and (as expected) no apparent correlation in the latter.</p>
<p>The percentage of Mormon, Black Protestant and other religion adherents have values mostly below 5% with a large number of zeroes, which indicates these variables are unlikely to explain the changes in Sanders’s vote share, which are typically larger in magnitude (the mean magnitude of the change is 8.7%). In fact, the R^2 of a model built including these variables is only 0.009 greater than when they are excluded. Variables whose plots show heteroscedasticity (e.g., Black, other race, Catholic, and population density) are transformed using the log function for the following analysis. A small percentage (<3.5%) of Black and Catholic values are zero and are assigned the value 0.1 to permit computing the log.</p>
<h3 id="feature-selection">Feature selection</h3>
<p>With OLS regression, feature selection is an iterative process. The aim is to select the smallest subset of variables that explains nearly as much of the variance in the dependent variable as including all of the independent variables does. Goodness-of-fit metrics like R^2 and Adjusted R^2 are often used as estimates of the variance explained by the model, each with its own advantages and disadvantages.</p>
<p>Another aim of feature selection when building a model for interpretation (instead of prediction) is to include only features that have statistically significant model coefficients, which requires P-values that are smaller than a prescribed threshold (often 0.05).</p>
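<p>One common way to automate this iterative process is backward elimination: repeatedly drop the least significant variable until every remaining coefficient clears the threshold. The sketch below is an illustration of the idea, not the notebook’s exact procedure, and it uses |t| >= 2 as a rough stand-in for p < 0.05:</p>

```python
import numpy as np

def backward_eliminate(X, y, names, t_min=2.0):
    """Drop the variable with the smallest |t|-statistic until every
    remaining coefficient has |t| >= t_min. X excludes the intercept;
    one is added internally."""
    keep = list(range(X.shape[1]))
    while keep:
        A = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        sigma2 = resid @ resid / (len(y) - A.shape[1])  # residual variance
        cov = sigma2 * np.linalg.pinv(A.T @ A)          # coefficient covariance
        t = np.abs(beta) / np.sqrt(np.diag(cov))
        t_vars = t[1:]                                  # skip the intercept
        worst = int(np.argmin(t_vars))
        if t_vars[worst] >= t_min:
            break
        keep.pop(worst)
    return [names[j] for j in keep]
```

<p>Stepwise procedures like this have well-known pitfalls (they inflate significance by testing repeatedly), which is one reason goodness-of-fit and diagnostic checks still matter after selection.</p>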
<p>To understand the maximum variance explained by our dataset, I started by building a model with all remaining independent variables. (I excluded percent_hispanic because it is strongly correlated with percent_white, and strongly correlated independent variables break a fundamental assumption of OLS.) By doing so I can get a rough idea of the most significant features based on coefficient size (and can compare them to the features with the strongest correlations in the correlation plots above). I used the <a href="https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html#statsmodels.regression.linear_model.OLS">OLS command</a> from the python package statsmodels for the fitting, which gave the following results:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu"> OLS Regression Results
==================================================================================
</span>Dep. Variable: delta_percent_sanders R-squared: 0.465
Model: OLS Adj. R-squared: 0.442
Method: Least Squares F-statistic: 21.08
Date: Tue, 25 Aug 2020 Prob (F-statistic): 5.25e-28
Time: 10:31:07 Log-Likelihood: -870.23
No. Observations: 254 AIC: 1762.
Df Residuals: 243 BIC: 1801.
Df Model: 10
<span class="gu">Covariance Type: nonrobust
===============================================================================================
</span><span class="gh"> coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------------
</span>const 14.1786 5.792 2.448 0.015 2.769 25.588
percent_white_nh -0.2670 0.049 -5.480 0.000 -0.363 -0.171
percent_w4yrdeg -0.4486 0.164 -2.738 0.007 -0.771 -0.126
per_capita_income 6.833e-05 0.000 0.436 0.663 -0.000 0.000
percent_evangelical -0.0754 0.065 -1.165 0.245 -0.203 0.052
percent_unclaimed_religion 0.0066 0.058 0.115 0.909 -0.107 0.120
margin_of_victory_D_avg 0.0066 0.031 0.215 0.830 -0.054 0.067
log_percent_black 0.7573 0.442 1.715 0.088 -0.112 1.627
log_population_density 0.9993 0.441 2.267 0.024 0.131 1.868
log_percent_catholic -0.2365 0.642 -0.368 0.713 -1.501 1.028
<span class="gu">log_percent_other_race -2.2300 0.800 -2.786 0.006 -3.807 -0.653
==============================================================================
</span>Omnibus: 35.883 Durbin-Watson: 2.218
Prob(Omnibus): 0.000 Jarque-Bera (JB): 109.422
Skew: 0.571 Prob(JB): 1.73e-24
<span class="gu">Kurtosis: 6.006 Cond. No. 3.17e+05
==============================================================================
</span>
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.17e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
</code></pre></div></div>
<p>The goodness-of-fit statistic is decent (r^2=0.465) and can be interpreted as the model explaining about 47% of the variance in the data. Many variable coefficients are insignificant, so not every variable is relevant to the change in Sanders’s vote share. Next I looked at a statistic that diagnoses over-fitting (the predicted r^2) and the VIF statistic for multicollinearity (i.e., correlation between independent variables):</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Predicted R^2: 0.38
VIF Factor features
0 36.121012 percent_white_nh
1 21.870679 percent_w4yrdeg
2 64.362556 per_capita_income
3 11.349248 percent_evangelical
4 10.458186 percent_unclaimed_religion
5 9.849860 margin_of_victory_D_avg
6 2.892990 log_percent_black
7 10.647302 log_population_density
8 6.394612 log_percent_catholic
9 12.464885 log_percent_other_race
</code></pre></div></div>
<p>The predicted r^2 value of 0.38 indicates a slight over-fitting problem. <input type="checkbox" id="predicted_r^2" /><label for="predicted_r^2"><sup></sup></label><span><br /><br />The smaller the predicted r^2 as compared to the regular r^2, the greater the problem of over-fitting.<br /><br /></span> All but one of the variables have VIF values over 5, which indicates significant multicollinearity. This is a problem I explore next.</p>
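<p>For reference, a VIF is computed by regressing each independent variable on all of the others: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from that auxiliary regression. Here is a NumPy sketch of the idea (statsmodels’ <code class="language-plaintext highlighter-rouge">variance_inflation_factor</code> does the same job):</p>

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (intercept excluded).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (with an intercept)."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1.0 - resid.var() / X[:, j].var()
        factors.append(1.0 / (1.0 - r2))
    return factors
```

<p>A variable that the others can almost perfectly reconstruct has R_j^2 near 1 and therefore an enormous VIF, which is exactly what the table above flags.</p>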
<h3 id="the-problem-of-multicollinearity">The problem of multicollinearity</h3>
<p>Multicollinearity is a big problem given our goal of interpreting the coefficients of our model. Let’s visualize it another way to understand it better. Figure 2 is a correlation heat map of the variables in this model (plus Hispanic percentage) made with the <a href="https://seaborn.pydata.org/generated/seaborn.heatmap.html">seaborn.heatmap command</a>:</p>
<figure class="align-center">
<img src="https://www.stevans.org/assets/images/posts/sanders-tx-2020/Figure-heatmap.png" alt="Heatmap" />
<figcaption>Figure 2: Heatmap showing the correlation between all relevant demographic features and the dependent variable.</figcaption>
</figure>
<p>Our multicollinearity problem is almost completely illustrated in the top two rows of the heat map. The top row shows that the percentage of non-Hispanic Whites is at least moderately correlated with six other demographic features and the dependent variable (the change in Sanders vote share). Because the percentage of non-Hispanic Whites strongly anti-correlates with the percentage of Hispanics, the percentage of Hispanics also correlates with the same six demographic features. This means that the independent variables will change in unison and are not independent, which makes it difficult for the model to measure the effect on the dependent variable from each of the independent variables, holding the others constant. This results in less precise coefficient estimates and weaker statistical power of the model.</p>
<p>It’s worth noting that only the correlation between variables of the same type (e.g., race, religion) could be attributed to the use of percentages instead of raw counts. The correlations between variables of different types would still be present if we regressed on counts instead.</p>
<p>So what is happening here? Basically, most Texas counties are bifurcated along most demographics. For example, along race and ethnicity we see that most counties are composed mostly of White and Hispanic people, such that in counties with a large percentage of Whites, there is a low percentage of Hispanics, and in counties with a large percentage of Hispanics, there is a low percentage of Whites. This can be seen in Figure 3, which also includes the percentage of Black people and people who identify as another race:</p>
<figure class="align-center">
<img src="https://www.stevans.org/assets/images/posts/sanders-tx-2020/Figure-race-percent.png" alt="Race Percent" />
<figcaption>Figure 3: The racial makeup of Texas counties. Almost all counties are majority non-Hispanic White or Hispanic White.</figcaption>
</figure>
<p>With religion, not only are evangelicals and Catholics usually the largest or second-largest religious groups in a typical Texas county, but in Texas evangelicals are predominantly White and Catholics are predominantly Hispanic. Regarding partisan lean, Hispanics in Texas counties overwhelmingly support Democratic candidates in presidential elections while Whites overwhelmingly vote Republican. Weaker—but still significant—bifurcation exists in Texas counties along economic and educational measures because of inequality, which also correlates with race.</p>
<p>This bifurcation of Texas creates a dearth of counties in parts of the demographic parameter space, preventing the regression model from controlling for individual variables with precision. Instead, the model essentially works with a sparse and redundant data set, weakening its statistical power and the significance of its coefficients. With weakened coefficient significance, we lose our ability to determine which variables are meaningful and to measure their effect on the change in Sanders’s vote share.</p>
<p>There are a few ways to deal with multicollinearity. We can remove highly correlated variables (as we have already done with percentage_hispanic), we can combine correlated independent variables (although I don’t think this makes sense given our variables), or we can perform an analysis designed for highly correlated variables, like Lasso regression or MANOVA (which would deserve its own blog post). <input type="checkbox" id="raw" /><label for="raw"><sup></sup></label><span><br /><br />Ideas from the Statistics by Jim article <a href="https://statisticsbyjim.com/regression/multicollinearity-in-regression-analysis/">Multicollinearity in Regression Analysis: Problems, Detection, and Solutions</a>.<br /><br /></span> For the sake of presenting a final model and interpretation, I explore the first option.</p>
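<p>The variance inflation factor (VIF) used below to screen variables is defined as VIF<sub>j</sub> = 1 / (1 − R²<sub>j</sub>), where R²<sub>j</sub> comes from regressing predictor j on all of the other predictors. Here is a minimal numpy sketch on synthetic data (statsmodels also ships a ready-made <code>variance_inflation_factor</code> in <code>statsmodels.stats.outliers_influence</code>):</p>

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j is from regressing
    column j on the remaining columns plus an intercept."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic design matrix: columns 1 and 2 are nearly collinear, column 3 is independent
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
print(vif(X))  # first two VIFs are large; the third is near 1
```

A common rule of thumb flags VIFs above 5 (or 10) as problematic, which is the threshold applied in the next section.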
<h3 id="a-simple-interpretable-model">A simple interpretable model</h3>
<p>Requiring the final model to include only statistically significant variables with VIFs below five produced only a handful of simple models with at most three variables. Of these models, the one with the largest R^2 value included only race-related variables: the percentages of the population that are non-Hispanic White, Black, and “other race.” The regression results for this model were as follows:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu"> OLS Regression Results
==================================================================================
</span>Dep. Variable: delta_percent_sanders R-squared: 0.419
Model: OLS Adj. R-squared: 0.412
Method: Least Squares F-statistic: 60.04
Date: Tue, 25 Aug 2020 Prob (F-statistic): 2.88e-29
Time: 11:06:30 Log-Likelihood: -880.64
No. Observations: 254 AIC: 1769.
Df Residuals: 250 BIC: 1783.
Df Model: 3
<span class="gu">Covariance Type: nonrobust
===========================================================================================
</span><span class="gh"> coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------------
</span>const 14.1797 2.402 5.904 0.000 9.449 18.910
percent_white_nh -0.3323 0.025 -13.254 0.000 -0.382 -0.283
log_percent_black 1.5020 0.339 4.426 0.000 0.834 2.170
<span class="gu">log_percent_other_race -2.8785 0.756 -3.808 0.000 -4.367 -1.390
==============================================================================
</span>Omnibus: 14.125 Durbin-Watson: 2.295
Prob(Omnibus): 0.001 Jarque-Bera (JB): 36.206
Skew: 0.066 Prob(JB): 1.37e-08
<span class="gu">Kurtosis: 4.845 Cond. No. 304.
==============================================================================
</span>
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Predicted R^2: 0.39
VIF Factor features
0 3.637555 percent_white_nh
1 1.613324 log_percent_black
2 3.742407 log_percent_other_race
</code></pre></div></div>
<p>As expected, using fewer variables decreases the R^2. This model accounts for about 42% of the variance in the change in Sanders’s vote share. Over-fitting is minimal, as the predicted R^2 (0.39) is not much less than the R^2 (0.42). Because this model only includes race-related demographics, the interpretation is very simple.</p>
<p>The constant coefficient is ~14, which means an increase of 14 percentage points in Sanders’s vote share between 2016 and 2020 is expected when a Texas county has zeros in the other race variables, or conversely, a Hispanic percentage of 100%. The percent_white_nh coefficient of -0.33 means that for every percentage point the White percentage increases, the change in Sanders’s vote share decreases by 0.33 percentage points. The log of the Black percentage and the log of the Other race percentage have coefficients of 1.5 and -2.9, respectively. For these log-transformed variables, the change in Sanders’s vote share will increase (decrease) by 1.5 (2.9) percentage points every time the Black percentage (Other race percentage) increases by a factor of ~2.72 <input type="checkbox" id="the value of _e_" /><label for="the value of _e_"><sup></sup></label><span><br /><br />the value of <em>e</em><br /><br /></span>, which is a rather insensitive relationship. The confidence intervals on all of the coefficients are fairly large, but they do not affect the general interpretation.</p>
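<p>The interpretation of a log-transformed coefficient can be checked numerically: multiplying the predictor by <em>e</em> shifts the prediction by exactly the coefficient. A quick sketch using the fitted log_percent_black coefficient from the table above:</p>

```python
import numpy as np

coef_log_black = 1.502  # fitted coefficient on log_percent_black from the table

def contribution(percent_black):
    """This variable's additive contribution to the predicted change in vote share."""
    return coef_log_black * np.log(percent_black)

# An e-fold (~2.72x) increase in the Black percentage shifts the prediction
# by exactly the coefficient, regardless of the starting value:
delta = contribution(np.e * 5.0) - contribution(5.0)
print(round(delta, 3))  # 1.502
```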
<p>Simply put, the more non-Hispanic White a county is, the greater the loss of vote share for Sanders in 2020. Because the percentages of non-Hispanic Whites and Hispanics are so strongly anti-correlated in Texas counties, extending this trend to predominantly Hispanic counties implies that Sanders’s vote share increased there. This can be seen directly in the first two panels of the correlation plots in Figure 1.</p>
<h3 id="limitations">Limitations</h3>
<p>One limitation of this study is that while the demographic data is descriptive of the entire population in each county, the dependent variable is derived from Democratic primary voters, a very non-representative sample of the overall population. Another limitation is that the large range in the number of votes cast in each county suggests that not every county should be given the same weight, and a weighted regression may be more appropriate. Finally, it appears that demographics that are useful for reproducing national election results at the state level (i.e., the demographics we adopted from FiveThirtyEight’s election forecasts) are not useful for building interpretable models at the county level in Texas.</p>
<h2 id="conclusions">Conclusions</h2>
<p>In this post I used multiple regression analysis on county-level demographic data to understand what types of voters drove the change in Bernie Sanders’s primary vote share between 2016 and 2020 in Texas.</p>
<p>I found that the demographics of Texas at the county level are too correlated to permit building a model with more than three statistically significant demographic variables and thus prohibits a detailed understanding of how individual demographics relate to the change in Sanders’s vote share.</p>
<p>More specifically, Texas counties are predominantly non-Hispanic White or predominantly Hispanic, and both of these groups appear to be homogeneous across the state in terms of the other demographic and socioeconomic variables considered.</p>
<p>One of the few statistically significant models permitted by the dataset shows that Sanders’s Texas coalition shrank by 0.33 percentage points for every percentage point increase in the White population. In other words, the more non-Hispanic White a county is, the greater the loss of vote share for Sanders in 2020, which is consistent with exit polling.</p>Matt Stevansmatt@stevans.orgIn this post I present a multiple regression analysis on county-level demographics from the US Census with the aim of understanding what types of voters drove the change in Bernie Sanders’s primary vote share between 2016 and 2020 in Texas.Adding inline footnotes with automatic numbering in HTML and CSS2020-09-01T00:00:00+00:002020-09-01T00:00:00+00:00https://www.stevans.org/inline-footnotes<p class="notice--success"><strong>Updated!</strong> Now includes automatic numbering.</p>
<p class="notice">To jump to the section with the code, <strong>click <a href="#the-code">here</a></strong>.</p>
<!--- Minimal-mistakes has traditional footnotes. See here for example: https://mmistakes.github.io/minimal-mistakes/docs/layouts/#fnref:sidebar-menu
The inline code looks like this:
To create a sidebar menu[^sidebar-menu] similar to the one found in the theme's documentation pages you'll need to modify a `_data` file and some YAML Front Matter.
[^sidebar-menu]: Sidebar menu supports 1 level of nested links.--->
<p>I really enjoy reading the blog <a href="https://fivethirtyeight.com/">FiveThirtyEight</a> for the data-driven reporting on politics, sports, and life. It is also the first place I recall encountering inline footnotes some time back in 2017. I remember being way too impressed when I clicked on a footnote: the remaining article text lowered, creating a blank space, and the footnote appeared out of thin air. Now that I have my own blog, I want to give my readers the same pleasurable experience. Since I don’t know JavaScript, which FiveThirtyEight uses for their inline footnotes, my goal is to use HTML and CSS. In this post, I share what I’ve found for those who want to do something similar.</p>
<h2 id="fivethirtyeights-inline-footnotes">FiveThirtyEight’s Inline Footnotes</h2>
<p>The FiveThirtyEight inline footnote feature allows a reader to view a footnote immediately after the footnote number by clicking on the footnote number. The reader can then hide the inline footnote by clicking on the “x” which appears in place of the clicked footnote number or continue scrolling through the text without hiding the footnote. Here is a GIF demonstrating how a reader can toggle a footnote in a FiveThirtyEight <a href="https://fivethirtyeight.com/features/what-does-an-0-7-start-tell-you-about-an-nfl-coach/">article</a>:</p>
<p><img src="https://www.stevans.org/assets/images/posts/inline-footnotes/538-inline-footnote.gif" alt="image-center" class="align-center" /></p>
<p>To me, this is an elegant way to include ancillary information or commentary for the interested reader without slowing down the efficient reader and distracting from the main idea of the article.</p>
<p>To replicate this on my own blog, I did what most people do: I Googled it. I found the basic HTML and CSS code I needed from the user Unrelated’s <a href="https://stackoverflow.com/questions/40336366/in-line-footnotes-with-only-html-css-in-notes/40391190#40391190">answer</a> to the question “In-line footnotes with only HTML/CSS (in-notes?)” on Stack Overflow. To mimic the style of the feature on FiveThirtyEight, I added some lines in the CSS to adjust the aesthetics. Here is a GIF of how it looks on my blog:</p>
<p><img src="https://www.stevans.org/assets/images/posts/inline-footnotes/my-blog-inline-footnote.gif" alt="image-center" class="align-center" /></p>
<p>Click on the following footnote to try it for yourself. <input type="checkbox" id="cb1" /><label for="cb1"><sup></sup></label><span><br /><br />This is the footnote text.<br /><br /></span> This sentence (and the rest of the article) will move a few lines down. You can click the footnote number again to hide the footnote.</p>
<h2 id="the-code">The Code</h2>
<p>Here is a template of the HTML code to insert within the text where you want the footnote number to appear:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><input</span> <span class="na">type=</span><span class="s">"checkbox"</span> <span class="na">id=</span><span class="s">"cb1"</span> <span class="nt">/><label</span> <span class="na">for=</span><span class="s">"cb1"</span><span class="nt">><sup></sup></label><span><br><br></span>This is the footnote text.<span class="nt"><br><br></span></span>
</code></pre></div></div>
<p class="notice--danger"><strong>Warning:</strong> Make sure the string after <code class="language-plaintext highlighter-rouge">id=</code> is unique AND matches the string after <code class="language-plaintext highlighter-rouge">for=</code>. Otherwise, clicking on any footnote link will do nothing, or it will open (or close) the first occurring footnote at the position of the first occurring footnote.</p>
<p class="notice"><strong>Note:</strong> To get the padding or space in front of the footnote number, simply add a space before the input code.</p>
<p>And here is the CSS code with comments:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/<span class="ge">* This creates the counter *</span>/
body {
counter-reset: cb_counter_var;
}
/<span class="err">*</span> This increments the counter value and defines
the displayed content <span class="err">*</span>/
sup::after {
counter-increment: cb_counter_var;
content: counter(cb_counter_var);
}
/<span class="ge">* This initially hides the footnote (i.e. span) *</span>/
input[type=checkbox] ~ span {
display:none;
}
/<span class="err">*</span> This styles the footnote text which appears
when the label is clicked <span class="err">*</span>/
input[type=checkbox]:checked ~ span {
display:inline;
font-size: 85%;
font-family:$monospace;
color: mix(#000, $text-color, 30%);
cursor:default;
}
/<span class="ge">* This permanently hides the checkbox *</span>/
input[type=checkbox]{
display:none;
}
/<span class="err">*</span> This styles the footnote label to appear
like a hyperlink <span class="err">*</span>/
input[type=checkbox] ~ label {
display:inline;
cursor:pointer;
color:$link-color;
text-decoration:underline;
}
/<span class="err">*</span> This styles the footnote label when the mouse
hovers over it <span class="err">*</span>/
input[type=checkbox] ~ label:hover {
text-decoration:underline;
cursor:pointer;
color:red;
}
/<span class="ge">* This styles the footnote label after it is clicked *</span>/
input[type=checkbox]:checked ~ label {
color:red;
text-decoration:none;
}
</code></pre></div></div>
<p class="notice"><strong>Note:</strong> If you’re using the Jekyll theme called Minimal Mistakes like I am, you can learn how to update the style sheet <a href="https://mmistakes.github.io/minimal-mistakes/docs/stylesheets/">here</a> in the official Docs.</p>
<p>My next footnote-related adventure is to figure out how to get the footnote numbering sequence to update programmatically… <strong>Update: The CSS code above includes automatic numbering after I learned about <a href="https://www.w3schools.com/css/css_counters.asp">CSS counters</a>.</strong> Here is a second footnote number as a demonstration of the automatic numbering. <input type="checkbox" id="cb2" /><label for="cb2"><sup></sup></label><span><br /><br />Footnotes are cool.<br /><br /></span></p>Matt Stevansmatt@stevans.orgUpdated! Now includes automatic numbering.Using US Census data to investigate the drop in youth vote share in 20202020-04-01T00:00:00+00:002020-04-01T00:00:00+00:00https://www.stevans.org/youth-vote<h2 id="summary">Summary:</h2>
<p>In this blog post we investigate how the youth vote share fell between the Democratic primaries in 2016 and 2020, while the number of youth voters actually grew. We use data from the US Census Bureau to show that the aging population distribution of the US is not enough to completely explain the above phenomenon, assuming voting rates were the same in 2020 as in 2016. Finally, we show the fraction of the population aged 18-29 will continue to decline in years to come.</p>
<h2 id="introduction">Introduction:</h2>
<p>After Super Tuesday, young voters were frequently scapegoated in the media for Bernie Sanders’s losses in Super Tuesday contests, like in the USA Today <a href="https://www.usatoday.com/story/news/politics/elections/2020/03/04/super-tuesday-bernie-sanders-youth-votes-fell-short-compared-2016/4947795002/">article</a> entitled “Many young voters sat out Super Tuesday, contributing to Bernie Sanders’ losses.” They reasoned that because a lot of <a href="https://nymag.com/intelligencer/2020/02/this-one-chart-explains-why-young-voters-back-bernie-sanders.html">polling</a> conducted before Super Tuesday showed Sanders being strongly favored by younger voters and strongly disfavored by older voters, for Bernie to have lost, young people must not have made it to the polls. In this blog post, we investigate the accuracy of this claim.</p>
<p>At first glance, exit (and entrance) polling from CNN seems to support this notion. In the following table, I show the percentage of votes cast in Democratic primaries in states that voted on or before March 10th in both <a href="https://www.cnn.com/election/2016/primaries/polls/">2016</a> and <a href="https://www.cnn.com/election/2020/entrance-and-exit-polls/">2020</a> by voters aged 18 to 29 years old. We see youth voters made up a smaller percentage of all voters in 2020 than in 2016 in 13 of the 15 states.</p>
<iframe title="Youth voter share in Democratic Primaries has dropped in terms of percentages" aria-label="Table" id="datawrapper-chart-xoMeU" src="//datawrapper.dwcdn.net/xoMeU/2/" scrolling="no" frameborder="0" style="background: #FFFFFF; width: 0; min-width: 100% !important; border: none;" height="774"></iframe>
<script type="text/javascript">!function(){"use strict";window.addEventListener("message",function(a){if(void 0!==a.data["datawrapper-height"])for(var e in a.data["datawrapper-height"]){var t=document.getElementById("datawrapper-chart-"+e)||document.querySelector("iframe[src*='"+e+"']");t&&(t.style.height=a.data["datawrapper-height"][e]+"px")}})}();
</script>
<!--- Added "background: #FFFFFF;" after style=" to secure a solid white background --->
<!--- While the margin of error of exit polls is typically 3-4%, averages are more robust. The average difference in youth voter turnout in these 15 states is -2.4%. Weighted by the number of 2020 voters in each state, the weighted average is even larger, -2.8%. --->
<p>At the same time, when you consider the total number of voters aged 18 to 29 who cast votes in these 15 contests, you see that more of them voted this year than four years ago. For example, the youth vote share in Texas went down 5%, but the total number of youth voters increased by about 25,000. How can both changes be true? Of course, it means the number of older voters increased more between 2016 and 2020 than did the number of youth voters over the same time period. So while more young voters voted in many states in 2020 than in 2016, the share of all votes cast by youth voters declined because the increase in older voters was even larger.</p>
<p>Could this be explained simply by the fact that the US population is <a href="https://www.census.gov/newsroom/blogs/random-samplings/2016/06/americas-age-profile-told-through-population-pyramids.html">aging</a>, with the bulk of Baby Boomers entering retirement age? Or did older voters’ propensity to vote increase, too? I also wondered how projected demographic changes would affect electorate compositions in future elections.</p>
<h2 id="the-data">The Data:</h2>
<p>To satisfy these curiosities, I turned to the Census Bureau. They estimate the population size of the US (and have done so since the Bureau was founded in 1902) and publish their estimates <a href="https://www.census.gov/">online</a>. I was able to scrape together the US population broken down by single year of age since the year 1900. Given that the data was collected at various times over more than a century, the data isn’t perfectly homogeneous in structure. For example, the only available data from 1900-1929 exclude members of the Armed Forces stationed overseas and the population residing in Alaska and Hawaii, the 1930-1959 numbers include Armed Forces overseas but exclude the population residing in Alaska and Hawaii, and the 2010s data has the most granular demographic breakdowns of the population of all decades (down to metropolitan division scales). I decided to use the data that include members of the Armed Forces stationed overseas in this analysis when available because, in principle, the Armed Forces could vote (although, in practice, I suspect it was difficult to impossible at the height of large wars). Methods for accessing the data vary, too: from Excel spreadsheets on a webpage, like for <a href="https://www.census.gov/data/tables/time-series/demo/popest/pre-1980-national.html">1900-1979</a>, to an array of Census Population Estimates <a href="https://www.census.gov/data/developers/data-sets/popest-popproj/popest.html">APIs</a> for the years 1990 to 2019.</p>
<p class="notice--primary"><strong>Check it out!</strong> The python code I used to corral and clean the data can be found in a Python Jupyter Notebook on my GitHub page <a href="https://github.com/stevans/youth-voters-census-data/blob/master/notebooks/Exploring_census_data.ipynb">here</a>.</p>
<p>With the US population per single year of age for every year from 1900 to 2020 (and projected out to 2060), I was able to calculate and plot the fraction of the population that falls in the category of youth voter (aged 18-29) since the beginning of the 20th Century.</p>
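<p>Once the population is tabulated by single year of age, the youth fraction reduces to two sums. Here is a minimal pandas sketch with invented counts (the real counts come from the Census sources described above):</p>

```python
import pandas as pd

# Hypothetical single-year-of-age population counts, indexed by age.
# A crude linearly declining age pyramid stands in for real Census data.
pop = pd.Series({age: 4_000_000 - 20_000 * age for age in range(0, 101)})

youth = pop.loc[18:29].sum()   # label-based slice is inclusive: ages 18-29
youth_fraction = youth / pop.sum()  # fraction of the total population
print(round(youth_fraction, 3))  # 0.14
```

Repeating this for every year from 1900 to 2060 yields the time series plotted below.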
<figure class="">
<img src="/assets/images/posts/youth-vote/youth_fraction.png" alt="Plot of the fraction of the US population aged 18-29 from 1900-2060." /><figcaption>
Fraction of the US population aged 18-29 years old.
</figcaption></figure>
<p>To first order, the most obvious feature of this plot is the general trend downwards from 37% in 1900 to 21% in 2020. The next most significant feature is the large bump from about 1965 to 1995, when the Baby Boomer generation aged into and then out of the 18-29 age range. Also noticeable in the figure is a relatively small dip around 1918, when more than 4 million (mostly young) service people were overseas fighting in WWI. This is the only wartime period associated with such a dip in the figure because the census population worksheets I found include the population of Armed Forces overseas after 1940.</p>
<p>Since 1972, when the major parties officially tied their convention delegates to the outcomes of state primaries, the fraction of the population comprised of 18-29 year-olds has decreased from 29% to 21%. This means that young people were outnumbered by the rest of the population ~2:1 in 1972 and are outnumbered by ~4:1 today.</p>
<h2 id="can-the-aging-us-population-be-the-sole-explanation-for-the-drop-in-youth-vote-share-between-2016-and-2020">Can the aging US population be the sole explanation for the drop in youth vote share between 2016 and 2020?</h2>
<p>The short answer is, “No.” I found the population aged 18-29 grew by ~266,000 from March 1, 2016 to March 3, 2020 (by interpolating the yearly Census data) and the 30-and-over population grew by ~8,774,000 in the same timespan. In terms of the youth fraction of the population, this results in a decrease of ~0.7% from 21.6% on Super Tuesday in 2016 (March 1) to 20.9% on Super Tuesday in 2020 (March 3). Furthermore, if all voter demographic groups voted at the same rates as in 2016 (historical voting rate estimates from the Census Bureau were found <a href="https://www.census.gov/data/tables/time-series/demo/voting-and-registration/voting-historical-time-series.html">here</a>), the youth voter fraction would appear to decrease by only ~0.5% between 2016 and 2020 simply due to changes in the population age distribution. Since this is less than the 2.8% drop in the median youth vote share seen in 2020 exit polling compared to 2016, something in addition to age distribution change is needed to fully explain how the youth vote share could decrease in terms of percentage while increasing in total number. As mentioned earlier, the simple explanation is that voters over 30 voted at a higher rate than they had four years ago.</p>
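<p>The interpolation of yearly Census estimates to the two Super Tuesday dates can be done with <code>np.interp</code> on fractional years. The population values below are invented, and I assume the July 1 reference date the Census Bureau uses for its annual estimates:</p>

```python
import numpy as np

# Hypothetical July 1 population estimates for ages 18-29 (made-up values)
years = np.array([2015.5, 2016.5, 2017.5, 2018.5, 2019.5, 2020.5])
youth_pop = np.array([53.0, 53.05, 53.1, 53.15, 53.2, 53.27]) * 1e6

# Super Tuesday fell on March 1, 2016 and March 3, 2020 (both leap years)
t_2016 = 2016 + 60 / 366
t_2020 = 2020 + 62 / 366

# Linear interpolation between the bracketing July 1 estimates
growth = np.interp(t_2020, years, youth_pop) - np.interp(t_2016, years, youth_pop)
print(f"{growth:,.0f}")
```

Note that `np.interp` clamps to the endpoint values outside the range of `years`, so the estimates must bracket both target dates.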
<h2 id="how-will-projected-demographic-changes-affect-electorate-compositions-in-future-elections">How will projected demographic changes affect electorate compositions in future elections?</h2>
<p>Looking at the youth population fraction plot at years after 2020, we see a steady decline to about 17% in 2060. That’s 4.6 percentage points less than it is today. This suggests that any future presidential campaign relying on voters aged 18-29 will have an even taller task than Bernie had in 2020.</p>Matt Stevansmatt@stevans.orgSummary: In this blog post we investigate how the youth vote share fell between the Democratic primaries in 2016 and 2020, while the number of youth voters actually grew. We use data from the US Census Bureau to show that the aging population distribution of the US is not enough to completely explain the above phenomenon, assuming voting rates were the same in 2020 as in 2016. Finally, we show the fraction of the population aged 18-29 will continue to decline in years to come.Does the wavelength of a photon change during its travel through intergalactic space?2020-03-11T00:00:00+00:002020-03-11T00:00:00+00:00https://www.stevans.org/energy-of-photons<p class="notice">This is a post I wrote back in 2015 for <strong><a href="http://www.askanastronomer.org">AskAnAstronomer.org</a></strong>, a website where astronomers answer user-submitted space-related questions.</p>
<p>Here is George’s question:</p>
<blockquote>
<p>“When I took physics in college, I seem to remember a formula that related the wavelength of light to its energy. Could the wavelength of a photon of light be altered by a loss of energy, however small, over its path through intergalactic space?</p>
</blockquote>
<blockquote>
<p>“If that were true, looking at a trip of millions / billions of light years… wouldn’t that have some effect on our measurements of redshift, distances, and the expansion rate of the universe?”</p>
</blockquote>
<p>The answer to your first question is yes, for all intents and purposes, photons traveling through intergalactic space could, and in fact do, lose energy due to the <a href="https://en.wikipedia.org/wiki/Metric_expansion_of_space">expansion of spacetime</a> in the Universe. As the photons lose energy their wavelengths become longer. As you recall, there is a formula that relates a photon’s energy to its wavelength: E = h*c/lambda, or in words, the energy of photon equals Planck’s constant times the speed of light divided by the photon’s wavelength. <a href="https://en.wikipedia.org/wiki/Photon_energy">Click here</a> for a more detailed discussion of this equation.</p>
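<p>The relation E = h*c/lambda is easy to evaluate directly. Here is a quick numerical check (using the CODATA values of the constants) that doubling a photon’s wavelength halves its energy:</p>

```python
h = 6.62607015e-34  # Planck's constant, J*s
c = 2.99792458e8    # speed of light in vacuum, m/s

def photon_energy(wavelength_m):
    """E = h*c / lambda: longer wavelength means lower energy."""
    return h * c / wavelength_m

E_500nm = photon_energy(500e-9)    # a green photon, ~4.0e-19 J
E_1000nm = photon_energy(1000e-9)  # doubled wavelength -> half the energy
```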
<p>As for your second question (Does this effect impact our measurements of redshifts, distances, and the expansion of the universe?), in principle, yes it has an impact, but in practice, even if it’s not taken into account, the impact is very small.</p>
<p>In astronomy, <a href="https://en.wikipedia.org/wiki/Redshift">redshift</a> measurements are actually the measurement of the energy lost by photons emitted from a distant source due to the expansion of the universe. Astronomers measure this loss of energy by, for example, measuring the shift in the wavelength of spectral lines in galaxy spectra. They are called “redshifts” because the spectral lines of distant galaxies are always shifted to longer wavelengths or to the “red”.</p>
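<p>As a concrete example of measuring redshift from a shifted spectral line: the rest wavelength below is that of hydrogen’s Lyman-alpha line, while the observed wavelength is invented for illustration.</p>

```python
def redshift(lambda_obs, lambda_emit):
    """z = (lambda_obs - lambda_emit) / lambda_emit"""
    return (lambda_obs - lambda_emit) / lambda_emit

# A Lyman-alpha line (rest wavelength 121.567 nm) observed at 486.268 nm
# has been stretched by a factor of 4, i.e. a redshift of z = 3
z = redshift(486.268, 121.567)
print(round(z, 2))  # 3.0
```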
<p>As for measurements of distances, astronomers use many <a href="https://en.wikipedia.org/wiki/Cosmic_distance_ladder">techniques</a> to measure distances, which are affected by photon “energy loss” to varying degrees. Some methods, like the <a href="https://en.wikipedia.org/wiki/Parallax">parallax technique</a>, do not depend on the energy of photons at all, while other methods depend on it to a large degree, like using <a href="https://en.wikipedia.org/wiki/Cepheid_variable">Cepheid variable stars</a> and their period-luminosity relation or using the <a href="https://en.wikipedia.org/wiki/Cosmic_distance_ladder#Type_Ia_light_curves">light curves of Type Ia supernovae</a>. But even for the latter methods, the effect of photon “energy loss” on the distance measurement is about one percent or less, while other factors affect the distance measurement to a larger extent. These other factors include light contamination (or blending) from other stars, the varying extinction (or dimming) effect of <a href="https://en.wikipedia.org/wiki/Cosmic_distance_ladder#Classical_Cepheids">dust near the Cepheid</a>, and/or the uncertainty in our models of how Cepheid variable stars and supernovae work. These effects result in distance measurements that are uncertain by 5-17%!</p>
<p>So, in conclusion, photons can lose energy while traveling through space, astronomers measure this loss of energy by measuring redshifts, and our measurements of distance (and therefore of <a href="https://en.wikipedia.org/wiki/Hubble%27s_law">the expansion of the universe</a>) are impacted by this effect, but only slightly, even when it’s not explicitly taken into account by the astronomers doing the measuring.</p>
<p>Regards,</p>
<p>Matt Stevans<br />
<em>UT Austin</em></p>Matt StevansThis is a post I wrote back in 2015 for AskAnAstronomer.org, a website where astronomers answer user-submitted space-related questions.