# An Introduction to Stock Market Data Analysis with Python

This post is the first in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. The final post will include practice problems. This first post covers topics up to introducing moving averages.

NOTE: The information in this post is of a general nature, containing information and opinions from the author's perspective. None of the content of this post should be considered financial advice. Furthermore, any code written here is provided without any form of guarantee. Individuals who choose to use it do so at their own risk.

## Introduction

Advanced mathematics and statistics have been present in finance for some time. Prior to the 1980s, banking and finance were well known for being "boring"; investment banking was distinct from commercial banking, and the primary role of the industry was handling "simple" (at least in comparison to today) financial instruments, such as loans. Deregulation under the Reagan administration, coupled with an influx of mathematical talent, transformed the industry from the "boring" business of banking into what it is today, and since then finance has joined the other sciences as a motivation for mathematical research and advancement. For example, one of the biggest recent achievements of mathematics was the derivation of the Black-Scholes formula, which facilitated the pricing of stock options (a contract giving the holder the right to purchase or sell a stock at a particular price to the issuer of the option). That said, bad statistical models, including the Black-Scholes formula, hold part of the blame for the 2008 financial crisis.

In recent years, computer science has joined advanced mathematics in revolutionizing finance and trading, the practice of buying and selling financial assets for the purpose of making a profit. Trading has become dominated by computers; algorithms are responsible for making split-second trading decisions faster than humans could (so quickly that the speed at which light travels is a limitation when designing systems). Additionally, machine learning and data mining techniques are growing in popularity in the financial sector, and likely will continue to do so. In fact, a large part of algorithmic trading is high-frequency trading (HFT). While algorithms may outperform humans, the technology is still new and is operating in a famously turbulent, high-stakes arena. HFT was responsible for phenomena such as the 2010 flash crash and a 2013 flash crash prompted by a hacked Associated Press tweet about an attack on the White House.

This lecture, however, will not be about how to crash the stock market with bad mathematical models or trading algorithms. Instead, I intend to provide you with basic tools for handling and analyzing stock market data with Python. I will also discuss moving averages, how to construct trading strategies using moving averages, how to formulate exit strategies upon entering a position, and how to evaluate a strategy with backtesting.

DISCLAIMER: THIS IS NOT FINANCIAL ADVICE!!! Furthermore, I have ZERO experience as a trader (a lot of this knowledge comes from a one-semester course on stock trading I took at Salt Lake Community College)! This is purely introductory knowledge, not enough to make a living trading stocks. People can and do lose money trading stocks, and you do so at your own risk!

## Getting and Visualizing Stock Data

### Getting Data from Yahoo! Finance with pandas

Before we play with stock data, we need to get it in some workable format. Stock data can be obtained from Yahoo! Finance, Google Finance, or a number of other sources, and the pandas package provides easy access to Yahoo! Finance and Google Finance data, along with other sources. In this lecture, we will get our data from Yahoo! Finance.

The following code demonstrates how to directly create a DataFrame object containing stock information. (You can read more about remote data access in the pandas documentation.)

```python
import pandas as pd
import pandas.io.data as web   # Package and modules for importing data; this code may change depending on pandas version
import datetime

# We will look at stock prices over the past year, starting at January 1, 2016
start = datetime.datetime(2016,1,1)
end = datetime.date.today()

# Let's get Apple stock data; Apple's ticker symbol is AAPL
# First argument is the series we want, second is the source ("yahoo" for Yahoo! Finance), third is the start date, fourth is the end date
apple = web.DataReader("AAPL", "yahoo", start, end)

type(apple)
```

```
C:\Anaconda3\lib\site-packages\pandas\io\data.py:35: FutureWarning:
The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
  FutureWarning)

pandas.core.frame.DataFrame
```
```python
apple.head()
```

| Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2016-01-04 | 102.610001 | 105.370003 | 102.000000 | 105.349998 | 67649400 | 103.586180 |
| 2016-01-05 | 105.750000 | 105.849998 | 102.410004 | 102.709999 | 55791000 | 100.990380 |
| 2016-01-06 | 100.559998 | 102.370003 | 99.870003 | 100.699997 | 68457400 | 99.014030 |
| 2016-01-07 | 98.680000 | 100.129997 | 96.430000 | 96.449997 | 81094400 | 94.835186 |
| 2016-01-08 | 98.550003 | 99.110001 | 96.760002 | 96.959999 | 70798000 | 95.336649 |

Let's briefly discuss this. Open is the price of the stock at the beginning of the trading day (it need not be the closing price of the previous trading day), high is the highest price of the stock on that trading day, low is the lowest price of the stock on that trading day, and close is the price of the stock at closing time. Volume indicates how many shares were traded. Adjusted close is the closing price of the stock adjusted for corporate actions. While stock prices are considered to be set mostly by traders, stock splits (when the company makes each existing share worth two and halves the price) and dividends (payouts of company profits per share) also affect the price of a stock and should be accounted for.
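To make the idea of an adjusted price concrete, here is a minimal sketch of adjusting a closing price series for a stock split. The prices and the 2-for-1 split are made up for illustration, and the helper `adjust_for_split` is hypothetical; data providers also account for dividends and may use a different method.

```python
import pandas as pd

def adjust_for_split(close, split_date, ratio):
    """Divide all closing prices before split_date by ratio (hypothetical helper).

    close: pandas Series of closing prices with a DatetimeIndex
    split_date: first trading day on which the stock trades at the post-split price
    ratio: number of new shares per old share (2 for a 2-for-1 split)
    """
    adjusted = close.astype(float).copy()
    before_split = adjusted.index < pd.Timestamp(split_date)
    adjusted[before_split] = adjusted[before_split] / ratio
    return adjusted

# Made-up closing prices around a made-up 2-for-1 split on 2016-01-06
close = pd.Series([100.0, 102.0, 51.5, 52.0],
                  index=pd.to_datetime(["2016-01-04", "2016-01-05",
                                        "2016-01-06", "2016-01-07"]))
adj_close = adjust_for_split(close, "2016-01-06", 2)
print(pd.DataFrame({"Close": close, "Adj Close": adj_close}))
```

Without the adjustment, the raw series appears to lose about half its value on the split date even though a shareholder lost nothing; the adjusted series removes that artificial jump.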

### Visualizing Stock Data

Now that we have stock data, we would like to visualize it. I first demonstrate how to do so using the matplotlib package. Notice that the `apple` DataFrame object has a convenience method, `plot()`, which makes creating plots easier.

```python
import matplotlib.pyplot as plt   # Import matplotlib
# This line is necessary for the plot to appear in a Jupyter notebook
%matplotlib inline
# Control the default size of figures in this Jupyter notebook
%pylab inline
pylab.rcParams['figure.figsize'] = (15, 9)   # Change the size of plots

apple["Adj Close"].plot(grid = True) # Plot the adjusted closing price of AAPL
```

```
Populating the interactive namespace from numpy and matplotlib
```

A line chart is fine, but there are at least four variables involved for each date (open, high, low, and close), and we would like to have some visual way to see all four variables that does not require plotting four separate lines. Financial data is often plotted with a Japanese candlestick plot, so named because it was first created by 18th century Japanese rice traders. Such a chart can be created with matplotlib, though it requires considerable effort.

I have written a function, which you are welcome to use, to more easily create candlestick charts from pandas data frames, and I use it to plot our stock data. (The code is based on an existing matplotlib example; see the matplotlib documentation for the functions involved.)

```python
# Written against 2016-era pandas/matplotlib; newer releases moved matplotlib.finance to the
# separate mpl_finance package and removed DataFrame.append (pd.concat is the replacement)
import numpy as np
from matplotlib.dates import (DateFormatter, WeekdayLocator,
                              DayLocator, MONDAY, date2num)
from matplotlib.finance import candlestick_ohlc

def pandas_candlestick_ohlc(dat, stick = "day", otherseries = None):
    """
    :param dat: pandas DataFrame object with datetime64 index, and float columns "Open", "High", "Low", and "Close", likely created via DataReader from "yahoo"
    :param stick: A string or number indicating the period of time covered by a single candlestick. Valid string inputs include "day", "week", "month", and "year", ("day" default), and any numeric input indicates the number of trading days included in a period
    :param otherseries: An iterable that will be coerced into a list, containing the columns of dat that hold other series to be plotted as lines

    This will show a Japanese candlestick plot for stock data stored in dat, also plotting other series if passed.
    """
    mondays = WeekdayLocator(MONDAY)        # major ticks on the mondays
    alldays = DayLocator()                  # minor ticks on the days
    dayFormatter = DateFormatter('%d')      # e.g., 12

    # Create a new DataFrame which includes OHLC data for each period specified by stick input
    transdat = dat.loc[:,["Open", "High", "Low", "Close"]]
    if (type(stick) == str):
        if stick == "day":
            plotdat = transdat
            stick = 1 # Used for plotting
        elif stick in ["week", "month", "year"]:
            if stick == "week":
                transdat["week"] = pd.to_datetime(transdat.index).map(lambda x: x.isocalendar()[1]) # Identify weeks
            elif stick == "month":
                transdat["month"] = pd.to_datetime(transdat.index).map(lambda x: x.month) # Identify months
            transdat["year"] = pd.to_datetime(transdat.index).map(lambda x: x.isocalendar()[0]) # Identify years
            grouped = transdat.groupby(list(set(["year",stick]))) # Group by year and other appropriate variable
            plotdat = pd.DataFrame({"Open": [], "High": [], "Low": [], "Close": []}) # Create empty data frame containing what will be plotted
            for name, group in grouped:
                plotdat = plotdat.append(pd.DataFrame({"Open": group.iloc[0,0],
                                            "High": max(group.High),
                                            "Low": min(group.Low),
                                            "Close": group.iloc[-1,3]},
                                           index = [group.index[0]]))
            if stick == "week": stick = 5
            elif stick == "month": stick = 30
            elif stick == "year": stick = 365

    elif (type(stick) == int and stick >= 1):
        transdat["stick"] = [np.floor(i / stick) for i in range(len(transdat.index))]
        grouped = transdat.groupby("stick")
        plotdat = pd.DataFrame({"Open": [], "High": [], "Low": [], "Close": []}) # Create empty data frame containing what will be plotted
        for name, group in grouped:
            plotdat = plotdat.append(pd.DataFrame({"Open": group.iloc[0,0],
                                        "High": max(group.High),
                                        "Low": min(group.Low),
                                        "Close": group.iloc[-1,3]},
                                       index = [group.index[0]]))

    else:
        raise ValueError('Valid inputs to argument "stick" include the strings "day", "week", "month", "year", or a positive integer')


    # Set plot parameters, including the axis object ax used for plotting
    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=0.2)
    if plotdat.index[-1] - plotdat.index[0] < pd.Timedelta('730 days'):
        weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
        ax.xaxis.set_major_locator(mondays)
        ax.xaxis.set_minor_locator(alldays)
    else:
        weekFormatter = DateFormatter('%b %d, %Y')
    ax.xaxis.set_major_formatter(weekFormatter)

    ax.grid(True)

    # Create the candlestick chart
    candlestick_ohlc(ax, list(zip(list(date2num(plotdat.index.tolist())), plotdat["Open"].tolist(), plotdat["High"].tolist(),
                      plotdat["Low"].tolist(), plotdat["Close"].tolist())),
                      colorup = "black", colordown = "red", width = stick * .4)

    # Plot other series (such as moving averages) as lines
    if otherseries is not None:
        if type(otherseries) != list:
            otherseries = [otherseries]
        dat.loc[:,otherseries].plot(ax = ax, lw = 1.3, grid = True)

    ax.xaxis_date()
    ax.autoscale_view()
    plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')

    plt.show()

pandas_candlestick_ohlc(apple)
```

With a candlestick chart, a black candlestick indicates a day where the closing price was higher than the open (a gain), while a red candlestick indicates a day where the open was higher than the close (a loss). The wicks indicate the high and the low, and the body the open and close (color is used to determine which end of the body is the open and which the close). Candlestick charts are popular in finance, and some strategies in technical analysis use them to make trading decisions, depending on the shape, color, and position of the candles. I will not cover such strategies today.
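As a small illustration of how a candlestick encodes a day's data, the sketch below works out the color, body, and wicks for the first five AAPL rows shown earlier. The `candles` frame and its column names are my own, introduced only for this example; the plotting function does not produce them.

```python
import pandas as pd

# First five rows of the AAPL data shown earlier in the post
ohlc = pd.DataFrame(
    {"Open":  [102.610001, 105.750000, 100.559998, 98.680000, 98.550003],
     "High":  [105.370003, 105.849998, 102.370003, 100.129997, 99.110001],
     "Low":   [102.000000, 102.410004, 99.870003, 96.430000, 96.760002],
     "Close": [105.349998, 102.709999, 100.699997, 96.449997, 96.959999]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06",
                          "2016-01-07", "2016-01-08"]))

candles = pd.DataFrame(index=ohlc.index)
# Black for a gain (close above open), red for a loss; no ties occur in this sample
candles["color"] = ["black" if c > o else "red"
                    for o, c in zip(ohlc["Open"], ohlc["Close"])]
# The body spans the open and the close
candles["body"] = (ohlc["Close"] - ohlc["Open"]).abs()
# The wicks run from the ends of the body to the high and to the low
candles["upper_wick"] = ohlc["High"] - ohlc[["Open", "Close"]].max(axis=1)
candles["lower_wick"] = ohlc[["Open", "Close"]].min(axis=1) - ohlc["Low"]

print(candles.round(2))
```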

We may wish to plot multiple financial instruments together; we may want to compare stocks, compare them to the market, or look at other securities such as exchange-traded funds (ETFs). Later, we will also want to see how to plot a financial instrument against some indicator, like a moving average. For this you would rather use a line chart than a candlestick chart. (How would you plot multiple candlestick charts on top of one another without cluttering the chart?)

Below, I get stock data for some other tech companies and plot their adjusted closes together.

```python
microsoft = web.DataReader("MSFT", "yahoo", start, end)
google = web.DataReader("GOOG", "yahoo", start, end)

# Below I create a DataFrame consisting of the adjusted closing price of these stocks,
# built from a dict that maps each ticker symbol to its "Adj Close" series
stocks = pd.DataFrame({"AAPL": apple["Adj Close"],
                      "MSFT": microsoft["Adj Close"],
                      "GOOG": google["Adj Close"]})

stocks.head()
```

| Date | AAPL | GOOG | MSFT |
|---|---|---|---|
| 2016-01-04 | 103.586180 | 741.840027 | 53.696756 |
| 2016-01-05 | 100.990380 | 742.580017 | 53.941723 |
| 2016-01-06 | 99.014030 | 743.619995 | 52.961855 |
| 2016-01-07 | 94.835186 | 726.390015 | 51.119702 |
| 2016-01-08 | 95.336649 | 714.469971 | 51.276485 |

```python
stocks.plot(grid = True)
```

What's wrong with this chart? While absolute price is important (pricey stocks are difficult to purchase, which affects not only their volatility but your ability to trade that stock), when trading, we are more concerned about the relative change of an asset than its absolute price. Google's stock is much more expensive than Apple's or Microsoft's, and this difference makes Apple's and Microsoft's stocks appear much less volatile than they truly are.
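The effect can be checked numerically. The following sketch uses only the five rows of the `stocks` table shown above (a sample far too small to say anything about real volatility) and compares each stock's price range in dollars with the same range relative to its starting price. The names `stocks_sample`, `absolute_range`, and `relative_range` are mine.

```python
import pandas as pd

# Adjusted closes from the first five rows of the stocks table above
stocks_sample = pd.DataFrame(
    {"AAPL": [103.586180, 100.990380, 99.014030, 94.835186, 95.336649],
     "GOOG": [741.840027, 742.580017, 743.619995, 726.390015, 714.469971],
     "MSFT": [53.696756, 53.941723, 52.961855, 51.119702, 51.276485]},
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06",
                          "2016-01-07", "2016-01-08"]))

# Spread between the highest and lowest price, in dollars
absolute_range = stocks_sample.max() - stocks_sample.min()
# The same spread as a fraction of the first day's price
relative_range = absolute_range / stocks_sample.iloc[0]

print(pd.DataFrame({"absolute": absolute_range,
                    "relative": relative_range}).round(3))
```

In this sample Google has by far the widest range in dollars but the narrowest range relative to its price, which is why a shared price axis is misleading.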

One solution would be to use two different scales when plotting the data; one scale will be used by Apple and Microsoft stocks, and the other by Google.

```python
stocks.plot(secondary_y = ["AAPL", "MSFT"], grid = True)
```

A "better" solution, though, would be to plot the information we actually want: the stock's returns. This involves transforming the data into something more useful for our purposes. There are multiple transformations we could apply.

One transformation would be to consider the stock's return since the beginning of the period of interest. In other words, we plot:

$\text{return}_{t,0} = \frac{\text{price}_t}{\text{price}_0}$

This will require transforming the data in the `stocks` object, which I do next.

```python
# df.apply(arg) will apply the function arg to each column in df, and return a DataFrame with the result
# Recall that lambda x is an anonymous function accepting parameter x; in this case, x will be a pandas Series object
# x.iloc[0] is the first price in the column (positional access, independent of the date index)
stock_return = stocks.apply(lambda x: x / x.iloc[0])
stock_return.head()
```

| Date | AAPL | GOOG | MSFT |
|---|---|---|---|
| 2016-01-04 | 1.000000 | 1.000000 | 1.000000 |
| 2016-01-05 | 0.974941 | 1.000998 | 1.004562 |
| 2016-01-06 | 0.955861 | 1.002399 | 0.986314 |
| 2016-01-07 | 0.915520 | 0.979173 | 0.952007 |
| 2016-01-08 | 0.920361 | 0.963105 | 0.954927 |

```python
stock_return.plot(grid = True).axhline(y = 1, color = "black", lw = 2)
```

This is a much more useful plot. We can now see how profitable each stock was since the beginning of the period. Furthermore, we see that these stocks are highly correlated; they generally move in the same direction, a fact that was difficult to see in the other charts.

Alternatively, we could plot the change of each stock per day. One way to do so would be to plot the percentage increase of a stock when comparing day $t$ to day $t + 1$, with the formula:

$\text{growth}_t = \frac{\text{price}_{t + 1} - \text{price}_t}{\text{price}_t}$

But change could be thought of differently as:

$\text{increase}_t = \frac{\text{price}_{t} - \text{price}_{t-1}}{\text{price}_t}$

These formulas are not the same and can lead to differing conclusions, but there is another way to model the growth of a stock: with log differences.

$\text{change}_t = \log(\text{price}_{t}) - \log(\text{price}_{t - 1})$

(Here, $\log$ is the natural log, and our definition does not depend as strongly on whether we use $\log(\text{price}_{t}) - \log(\text{price}_{t - 1})$ or $\log(\text{price}_{t+1}) - \log(\text{price}_{t})$.)

The advantage of using log differences is that this difference can be interpreted as the percentage change in a stock but does not depend on the denominator of a fraction.
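To see how the three measures compare in practice, here is a short sketch that applies each of them to one day's move, using the first two AAPL adjusted closes from the table above. The variable names are mine.

```python
import math

price_prev = 103.586180   # AAPL adjusted close on 2016-01-04
price_next = 100.990380   # AAPL adjusted close on 2016-01-05

# Change measured relative to the earlier price (the "growth" form)
growth = (price_next - price_prev) / price_prev
# Change measured relative to the later price (the "increase" form)
increase = (price_next - price_prev) / price_next
# Log difference: no choice of denominator is needed
log_change = math.log(price_next) - math.log(price_prev)

print("growth:    ", round(growth, 6))
print("increase:  ", round(increase, 6))
print("log change:", round(log_change, 6))
```

All three describe a drop of roughly 2.5%, and the log difference falls between the other two. That ordering is not specific to this example: it follows from the inequality $1 - 1/x \le \log(x) \le x - 1$ for $x > 0$.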

We can obtain and plot the log differences of the data in `stocks` as follows:

```python
# Let's use NumPy's log function, which works elementwise on a pandas Series
# (math.log only accepts a single number at a time)
import numpy as np

# x.shift(1) lines up each date with the previous trading day's price
stock_change = stocks.apply(lambda x: np.log(x) - np.log(x.shift(1)))
stock_change.head()
```

| Date | AAPL | GOOG | MSFT |
|---|---|---|---|
| 2016-01-04 | NaN | NaN | NaN |
| 2016-01-05 | -0.025379 | 0.000997 | 0.004552 |
| 2016-01-06 | -0.019764 | 0.001400 | -0.018332 |
| 2016-01-07 | -0.043121 | -0.023443 | -0.035402 |
| 2016-01-08 | 0.005274 | -0.016546 | 0.003062 |

```python
stock_change.plot(grid = True).axhline(y = 0, color = "black", lw = 2)
```

Which transformation do you prefer? Looking at returns since the beginning of the period makes the overall trend of the securities in question much more apparent. Changes between days, though, are what more advanced methods actually consider when modelling the behavior of a stock, so they should not be ignored.

## Moving Averages

Charts are very useful. In fact, some traders base their strategies almost entirely on charts (these are the "technicians", since trading strategies based on finding patterns in charts are part of the trading doctrine known as technical analysis). Let's now consider how we can find trends in stocks.

A $q$-day moving average is, for a series $x_t$ and a point in time $t$, the average of the past $q$ days: that is, if $MA^q_t$ denotes a moving average process, then:

$MA^q_t = \frac{1}{q} \sum_{i = 0}^{q-1} x_{t - i}$

Moving averages smooth a series and help identify trends. The larger $q$ is, the less responsive a moving average process is to short-term fluctuations in the series $x_t$. The idea is that moving average processes help separate trends from "noise". Fast moving averages have smaller $q$ and more closely follow the stock, while slow moving averages have larger $q$, resulting in them responding less to the fluctuations of the stock and being more stable.
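The definition can be translated almost literally into code. The sketch below computes a moving average straight from the formula and checks it against pandas' rolling mean; the prices are made up, and `moving_average` is an illustrative helper rather than anything used elsewhere in this post.

```python
import numpy as np
import pandas as pd

def moving_average(x, q):
    """q-day moving average computed directly from the definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    ma = np.full(len(x), np.nan)          # undefined until q observations exist
    for t in range(q - 1, len(x)):
        # (1/q) * sum of x[t], x[t-1], ..., x[t-q+1]
        ma[t] = x[t - q + 1 : t + 1].sum() / q
    return ma

# Made-up closing prices, for illustration only
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 12.0])

by_hand = moving_average(prices, q=3)
with_pandas = prices.rolling(window=3).mean()

print(pd.DataFrame({"price": prices, "by_hand": by_hand, "pandas": with_pandas}))
```

The first $q - 1$ entries are `NaN` in both versions, because there are not yet $q$ days to average.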

pandas provides functionality for easily computing moving averages. I demonstrate its use by creating a 20-day (one month) moving average for the Apple data, and plotting it alongside the stock.

```python
apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2)
pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = "20d")
```

Notice how late the rolling average begins. It cannot be computed until 20 days have passed. This limitation becomes more severe for longer moving averages. Because I would like to be able to compute 200-day moving averages, I'm going to extend how much AAPL data we have. That said, we will still largely focus on 2016.

```python
start = datetime.datetime(2010,1,1)
apple = web.DataReader("AAPL", "yahoo", start, end)
apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2)

pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = "20d")
```

You will notice that a moving average is much smoother than the actual stock data. Additionally, it's a stubborn indicator; a stock needs to be above or below the moving average line in order for the line to change direction. Thus, crossing a moving average signals a possible change in trend, and should draw attention.

Traders are usually interested in multiple moving averages, such as the 20-day, 50-day, and 200-day moving averages. It's easy to examine multiple moving averages at once.

```python
apple["50d"] = np.round(apple["Close"].rolling(window = 50, center = False).mean(), 2)
apple["200d"] = np.round(apple["Close"].rolling(window = 200, center = False).mean(), 2)

pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = ["20d", "50d", "200d"])
```

The 20-day moving average is the most sensitive to local changes, and the 200-day moving average the least. Here, the 200-day moving average indicates an overall bearish trend: the stock is trending downward over time. The 20-day moving average is at times bearish and at other times bullish, where a positive swing is expected. You can also see that the crossings of moving average lines indicate changes in trend. These crossings are what we can use as trading signals, or indications that a financial security is changing direction and a profitable trade might be made.
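As a rough sketch of how such crossings could be located in code, the example below marks the days on which a fast moving average moves from below a slow one to above it, or the reverse. The prices are made up, the windows (2 and 4 days) are chosen only to keep the example short, and `crossing_points` is a hypothetical helper. It only finds crossings; it is not a trading strategy.

```python
import pandas as pd

def crossing_points(fast, slow):
    """Mark crossings of two series: +1 when fast moves above slow, -1 when it moves below.

    Hypothetical helper for illustration; days where either average is undefined give 0.
    """
    valid = fast.notna() & slow.notna()
    above = fast > slow
    # Yesterday's state, with the first row treated as "not above" and "not valid"
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)
    both_valid = valid & prev_valid

    signal = pd.Series(0, index=fast.index)
    signal[both_valid & above & ~prev_above] = 1     # fast crossed above slow
    signal[both_valid & ~above & prev_above] = -1    # fast crossed below slow
    return signal

# Made-up closing prices, for illustration only
prices = pd.Series([10, 11, 12, 13, 12, 11, 10, 9, 10, 12, 14, 15], dtype=float)
fast = prices.rolling(window=2).mean()
slow = prices.rolling(window=4).mean()

signal = crossing_points(fast, slow)
print(pd.DataFrame({"price": prices, "fast": fast, "slow": slow, "signal": signal}))
```

Requiring both today's and yesterday's averages to be defined avoids reporting a spurious crossing on the first day the slow average becomes available.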