Lump Sum vs. Dollar Cost Averaging

Posted on Mon 28 November 2016 in Finance

Question¶

If you receive a $100,000 windfall, what are the tradoffs between investing those funds immeadiately versus dollar-cost averaging the investment over time?

Background¶

Dollar-cost averaging (DCA) and lump-sum investing (LSI) have long been debated. If you believe the market generally goes up, LSI allows you to gain exposure to the market as soon as possible. DCA on the otherhand, minimizes potential feelings of regret (i.e. you invest your lump sum right before a market crash). A Vanguard study concluded that "an LSI approach has outperformed a DCA approach approximately two-thirds of the time".

Experiment¶

Let's imagine we come into a $100,000 windfall on any day between the funds existence (Nov 13, 2000 to Nov 28, 2016). We have two strategies:

Lump-sum invest the $100,000 all at once on the date you receive the windfall.
Dollar-cost average the \$100,000 in 10 equal portions of \$10,000 every 30 days starting from the data of the windfall, for a total investment period of 300 days. I arbitrally chose this, because it felt good.

If I had to invest a windfall in one fund it would be the Vanguard Total Stock Market Index Fund (VTSAX), which seeks to mimic the U.S. Stock Market.

Assumptions¶

Ignoring value of dollar changing over time
Final investment value is based on the closing price of the last day (Nov 28, 2016)

Getting financial data¶

I'm importing data from Quandl, a great resource for financial and economic data that feeds right into Python and Pandas.

In [1]:

import quandl
vtsax = quandl.get("YAHOO/FUND_VTSAX")
print(vtsax.head())
print("")
print(vtsax.tail())

                 Open       High        Low      Close  Volume  Adjusted Close
Date                                                                          
2000-11-13  30.219999  30.219999  30.219999  30.219999     0.0       22.587535
2000-11-14  31.010000  31.010000  31.010000  31.010000     0.0       23.178010
2000-11-15  31.200001  31.200001  31.200001  31.200001     0.0       23.320024
2000-11-16  30.700001  30.700001  30.700001  30.700001     0.0       22.946305
2000-11-17  30.559999  30.559999  30.559999  30.559999     0.0       22.841663

                 Open       High        Low      Close  Volume  Adjusted Close
Date                                                                          
2016-11-28  55.419998  55.419998  55.419998  55.419998     0.0       55.419998
2016-11-29  55.490002  55.490002  55.490002  55.490002     0.0       55.490002
2016-11-30  55.360001  55.360001  55.360001  55.360001     0.0       55.360001
2016-12-01  55.130001  55.130001  55.130001  55.130001     0.0       55.130001
2016-12-02  55.160000  55.160000  55.160000  55.160000     0.0       55.160000

We'll be using Adjusted Close because it accounts for stocks splits, dividends and distribution. Let's plot the data using matplotlib.

In [2]:

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
from matplotlib import style
style.use('fivethirtyeight')

vtsax['Adjusted Close'].plot(figsize=(20,10))
ax = plt.subplot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x))) # y-axis in dollars
plt.title('VTSAX Historical Stock Price')
plt.ylabel('Stock Price ($)')

Out[2]:

As expected. The market generally goes up. We can see the bubble in October 2007 and the crash into 2009, followed by the raging bull market (with a couple of hiccups) since then.

Calculate Lump Sum¶

Let's say we invest our $100,000 windfall on anyone of those dates. How much would the investment be worth today (Nov 28, 2016)?

In [3]:

today_price = vtsax['Adjusted Close'][-1] # The final value of our stock (Nov 28, 2016)
initial_investment = 100000 # Our $100,000 windfall

num_stocks_bought = initial_investment / vtsax['Adjusted Close']
lumpsum = num_stocks_bought * today_price

lumpsum.plot(figsize=(20,10), color='r')
ax = plt.subplot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))
plt.title("Lump Sum - Today's value of $100,000 invested on date")
plt.ylabel('Stock Price ($)')

Out[3]:

In the graph above, the x-axis represents the date made our $100,000 lump-sum investment and the y-axis is the value of that investment today.

In [4]:

print("If we timed our lump-sum perfectetly, $100,000 invested on {} would be worth ${:,.2f} today.".format(lumpsum.idxmax().strftime('%b %d, %Y'), lumpsum.max()))
print("")
print("Alternatively, if we invested our lump-sum at the peak of the 2007-2009 crash, $100,000 invested on {} would still be worth ${:,.2f} today.".format(lumpsum[:-(9*252)].idxmin().strftime('%b %d, %Y'), lumpsum[:-(9*252)].min()))

If we timed our lump-sum perfectetly, $100,000 invested on Oct 09, 2002 would be worth $406,177.34 today.

Alternatively, if we invested our lump-sum at the peak of the 2007-2009 crash, $100,000 invested on Oct 09, 2007 would still be worth $174,065.69 today.

Let's run the same experiment with dollar-cost averaging now. We will invest 10 equal poritons of \$10,000 (totalling our $100,000 windfall) every 30 days, for a total investment period of 30 days.

In [5]:

import pandas as pd

def doDCA(investment, start_date):
    # gets dates of 10 investment dates (spaced 30 days apart) beginning on the day we receive our windfall
    investment_dates_all = pd.date_range(start_date,periods=10,freq='30D')
    
    # removes dates beyond the most recent data available
    investment_dates = investment_dates_all[investment_dates_all < vtsax.index[-1]]
    
    # since not all dates will be a business day, this returns the closest buseiness day
    closest_investment_dates = vtsax.index.searchsorted(investment_dates)
    
    # $10,000 investement every 30 days
    portion = investment/10.0
    
    # gets the number of stocks purchased over the 300 day investment period
    stocks_invested = sum(portion / vtsax['Adjusted Close'][closest_investment_dates])
    
    # adds cash value in for cash that has not yet been invested (start dates <300 days from today)
    uninvested_dollars = portion * sum(investment_dates_all >= vtsax.index[-1])
    
    # value of stocks today
    total_value = today_price*stocks_invested + uninvested_dollars
    return total_value

# Generates DCA series for every initial investment date
dca = pd.Series(vtsax.index.map(lambda x: doDCA(initial_investment, x)), index=vtsax.index)

Below we plot the DCA data, where the x-axis is the date we started dollar-cost averaging (and continued for 300 days in 30 day incrments from the start date, and the y-axis represents the value of the investment today.

In [6]:

dca.plot(figsize=(20,10), color='b')
ax = plt.subplot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))
plt.title('Dollar-Cost Average - Value of $10,000 invested on date')
plt.ylabel('Investment Value ($)')

Out[6]:

In [7]:

dca.plot(figsize=(20,10), color='b', label="DCA")
plt.plot(lumpsum, color='r', label="Lump-Sum")
plt.legend()
ax = plt.subplot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x)))
plt.title('Lump-Sum vs Dollar-Cost Averaging')
plt.ylabel('Investment Value ($)')

Out[7]:

As you can see our DCA graph takes the same general shape as our lump-sum graph but is smoothed out. Any x-value in which the LSI (red) line is above the DCA (blue) value, LSI would have payed off.

Comparison of lump-sum vs. dollar-cost averaging¶

Both the LSI and DCA strategies have been tested for every day between Nov 13, 2000 and Nov 28, 2016.

Let's plot three graphs with a shared x-axis.

VTSAX historical closing price
A LSI and DCA comparison
Difference between LSI and DCA as diff = (lumpsum - dca)

In [8]:

# Difference between lump sum and DCA
diff = (lumpsum - dca)

fig, (ax1, ax2, ax3) = plt.subplots(3,1, sharex=True, figsize=(20,17))

# VTSAX Actual
vtsax['Adjusted Close'].plot(ax=ax1)
ax1.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}'.format(x))) # Y axis in dollars
ax1.set_xlabel('')
ax1.set_title('VTSAX Historical Stock Price')
ax1.set_ylabel('Stock Value ($)')

# Comparison
dca.plot(ax=ax2, color='b')
lumpsum.plot(ax=ax2, color='r')
ax2.axhline(initial_investment, linestyle="--", color="black")
ax2.text(vtsax.index[50],initial_investment*1.1, "Initial Investment")
ax2.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}K'.format(x*1e-3))) # Y axis $1,000s
ax2.legend(["DCA", "Lump-Sum"])
ax2.set_title('Comparison Lump Sum vs. Dollar Cost Averaging - Value of $100K invested on date')
ax2.set_ylabel('Investment Value ($)')

# Difference
ax3.fill_between(diff.index, y1=diff, y2=0, color='red', where=diff>0)
ax3.fill_between(diff.index, y1=diff, y2=0, color='blue', where=diff<0)

ax3.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: '${:,.0f}K'.format(x*1e-3))) # Y axis $1,000s
ax3.set_ylabel('Difference ($)')
ax3.set_title('Difference (Lump Sum - Dollar Cost Average)')
ax3.legend(["Lump-Sum > DCA", "DCA > Lump-Sum"]);

Let's first begin by checking on what percent of days lump-sum investing investing returns more the DCA.

In [9]:

print("Lump-sum returns more than DCA %.1f%% of all the days" % (100*sum(diff>0)/len(diff)))
print("DCA returns more than lump-sum %.1f%% of all the days" % (100*sum(diff<0)/len(diff)))

Lump-sum returns more than DCA 70.0% of all the days
DCA returns more than lump-sum 30.0% of all the days

So 70% of the time, a one-time lump-sum investment results in a higer final investment value! Nearly spot on with Vanguard's claim that lump-sum investing wins out two-thirds of the time.

But maybe that's not the whole story...maybe when DCA wins out it really wins out. Let's look at the average performance of lump-sum and DCA

In [10]:

print("Mean difference: Average dollar improvement lump-sum returns vs. DCA: ${:,.0f}".format(sum(diff) / len(diff)))
print("Mean difference when lump-sum > DCA: ${:,.0f}".format(sum(diff[diff>0]) / sum(diff>0)))
print("Mean difference when DCA > lump-sum: ${:,.0f}".format(sum(-diff[diff<0]) / sum(diff<0)))

Mean difference: Average dollar improvement lump-sum returns vs. DCA: $3,631
Mean difference when lump-sum > DCA: $13,818
Mean difference when DCA > lump-sum: $20,116

This is interesting. Remember we made \$100,000 investment. So for every possible day in the last 16 years, a lump-sum investment of $100K would have returned on average \$3,631 more than dollar cost averaging, or 3.63%. That's substantial!

However, when DCA was the better strategy than lump-sum (~30% of the time), it returned $20,116 more on average, or 20.1%, compared to 13.8% when lump-sum outperformed DCA.

Phrased antoher way, would you rather:

Have a 30% chance to earn an additional 20.1% on your initial investment, while having a 70% chance of foregoing an additional profit of 13.8% on your $100K investment.
Have a 70% chance to earn an additional 13.8% on your initial investment, while having a 30% chance of foregoing an additional profit of 20.1% on your \$100K investment.

Option #1 being our DCA strategy (10 investment periods spaced 30 days apart) with Option #2 being our lump-sum strategy. You can use this framework to simulate other investment strategies.

Other ways of looking at the results...to be continued¶

I hope to dive more into this in the future. The averages I presented above probably don't tell the whole story. Standard deviation is a consideration in evaluating risk.

In retrospect, when market crashes are imminent, DCA seems to pull ahead. We obiviously can't predict a market crash, but their are indicators when a crash is more likely (i.e. trailing P/E ratios, Purchasing Managers' Index (PMI)). What if we were to bring these variables into our analysis.

In [11]:

rel_area_lump_sum_wins = sum(diff[diff>0]) / sum(abs(diff))
print("A measure of risk to reward of lump sum would be to take the area above where the difference is greater than zero and divide it by the total area of the deviation from zero")
rel_area_lump_sum_wins

A measure of risk to reward of lump sum would be to take the area above where the difference is greater than zero and divide it by the total area of the deviation from zero

Out[11]:

0.61561750593338926