Photo by Yiorgos Ntrahas on Unsplash
Tutorial - Correlation between Indian Bank Stocks (feat. Nifty Bank)
Hello everyone, this is a tutorial on how to do the correlation analysis of Indian bank stocks from an earlier article, where we compared major private and public banking stocks listed on the stock exchange. This is my first tutorial article, so please share it if you like it. Let's get right into it.
I will be using Python for the data analysis.
Install Dependencies
First, we need to install the required dependencies
pip install pandas
pip install numpy
pip install yfinance
pip install seaborn
Import Dependencies
Now, we will import these required dependencies into our code.
import pandas as pd
import numpy as np
import yfinance as yf
import seaborn as sns
Fetching Stock Price Data
I have picked the stocks below for the correlation analysis and stored them in a list. Their ticker symbols end with ".NS" because that is how Yahoo Finance lists them, and we will be fetching the data from Yahoo Finance.
tickers = ['HDFCBANK.NS', 'ICICIBANK.NS', 'AXISBANK.NS', 'SBIN.NS', 'KOTAKBANK.NS', 'INDUSINDBK.NS', 'BANKBARODA.NS']
Now we will fetch data for these stocks from Yahoo Finance using yfinance. I am fetching 10 years of data with auto_adjust=True to get close prices adjusted for corporate actions, and storing each result in another list called data.
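data = []
for ticker in tickers:
    ytick = yf.Ticker(ticker)
    # threads is an option of yf.download rather than Ticker.history, so it is left out here
    df = ytick.history(period="10y", auto_adjust=True)
    # keep only rows with a valid, positive close price
    df = df[df['Close'] > 0]
    data.append(df)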
Manipulating Data
Now we have a list of pandas data frames, one per stock. To create a correlation matrix, we need to merge them into a single data frame that holds only the closing prices of all the stocks.
To achieve this, we first use the zip function to pair each ticker with its data frame, then turn those pairs into a dictionary with the dict function. Finally, we use the concat function of pandas with axis=1 to place the data frames side by side, with the dictionary keys (the tickers) as the top-level column labels.
mergedDf = pd.concat(dict(zip(tickers, data)), axis=1)
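To make the result of this step easier to picture, here is a minimal sketch on made-up data. The tickers AAA.NS and BBB.NS, the prices, and the demo_ variable names are all hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical tickers and made-up prices, used only to show the shape of the result.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
demo_tickers = ["AAA.NS", "BBB.NS"]
demo_data = [
    pd.DataFrame({"Open": [10.0, 11.0, 12.0], "Close": [10.5, 11.5, 12.5]}, index=dates),
    pd.DataFrame({"Open": [20.0, 21.0, 22.0], "Close": [20.5, 21.5, 22.5]}, index=dates),
]

# dict(zip(...)) pairs each ticker with its frame; concat then uses the
# dictionary keys as the outer (level 0) column labels.
demo_merged = pd.concat(dict(zip(demo_tickers, demo_data)), axis=1)

# Each column is now addressed by a (ticker, field) pair.
print(demo_merged.columns.tolist())
print(demo_merged[("BBB.NS", "Close")])
```

The columns form a two-level header: the ticker on level 0 and the price field (Open, Close and so on) on level 1.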
The new data frame should look something like this
The merged data frame has several columns per stock (Open, High, Low and so on), but we only need the Close column of each one.
We will read the Level 1 values of the column header and keep only the columns where that value is Close.
closeDf = mergedDf.loc[:,mergedDf.columns.get_level_values(1).isin(['Close'])]
It will now look like this
But we also don't need these two levels of headers now. So we can drop Level 1 and only keep symbol tickers as the column header.
closeDf.columns = closeDf.columns.droplevel(1)
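If you want to check these two steps on something small, the sketch below repeats them on a made-up two-level data frame. It also shows xs, a pandas method that I believe gives the same result in one call. The tickers, prices and variable names are hypothetical, and xs is an alternative I am adding here, not something the steps above require.

```python
import pandas as pd

# Made-up data frame with the same two-level header as the merged one.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["AAA.NS", "BBB.NS"], ["Open", "Close"]])
demo_merged = pd.DataFrame(
    [[10.0, 10.5, 20.0, 20.5],
     [11.0, 11.5, 21.0, 21.5],
     [12.0, 12.5, 22.0, 22.5]],
    index=dates,
    columns=columns,
)

# Same approach as above: boolean mask on level 1, then drop that level.
close_a = demo_merged.loc[:, demo_merged.columns.get_level_values(1).isin(["Close"])]
close_a.columns = close_a.columns.droplevel(1)

# Alternative: take a cross-section of the columns; xs drops the selected level by default.
close_b = demo_merged.xs("Close", axis=1, level=1)

print(close_a)
print(close_a.equals(close_b))
```

Both versions leave one column per ticker, so use whichever reads better to you.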
It will now look like this
Correlation Analysis
Now we have close price data for all stocks in a single data frame. Correlating the raw prices would be misleading, because every stock trades at a different base and prices trend over time, so we need daily returns instead. Simple pct_change returns would already remove the base, but I will use daily log returns, which have the extra benefit of adding up over time. For daily data the two are very close. We can calculate the log returns like this:
logretDf = np.log(closeDf.pct_change() + 1)
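As a quick sanity check, here is a small sketch on made-up prices. It shows that this formula is the same as taking the log of today's price over yesterday's, and that log returns add up to the log return of the whole period. The tickers and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up close prices for two hypothetical tickers.
demo_close = pd.DataFrame(
    {"AAA.NS": [100.0, 110.0, 99.0], "BBB.NS": [50.0, 50.0, 55.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Formula used in this tutorial.
log_ret = np.log(demo_close.pct_change() + 1)

# Equivalent form: log of today's price divided by yesterday's price.
log_ret_alt = np.log(demo_close / demo_close.shift(1))

# Daily log returns sum to the log return over the whole period
# (sum() skips the NaN in the first row).
total = log_ret.sum()
expected = np.log(demo_close.iloc[-1] / demo_close.iloc[0])

print(log_ret)
print(np.allclose(log_ret, log_ret_alt, equal_nan=True))
print(np.allclose(total, expected))
```

The first row is NaN because the first day has no previous price to compare against. The corr function ignores those missing values when it compares two columns.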
It will now look like this
Now we have a data frame from which we can get various insights such as correlation, standard deviation etc. We can show the correlation matrix with
logretDf.corr()
You will see something like this
This shows us the correlation coefficient of every pair of stocks, a value between -1 and 1, calculated over the past 10 years of data.
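With more than a handful of stocks the matrix gets hard to read, so it can help to list the pairs from most to least correlated. The sketch below is one way to do that. It runs on synthetic returns from a random number generator, and the tickers and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic returns: AAA.NS and BBB.NS share a common component, CCC.NS is independent.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
demo_returns = pd.DataFrame({
    "AAA.NS": common + 0.1 * rng.normal(size=500),
    "BBB.NS": common + 0.1 * rng.normal(size=500),
    "CCC.NS": rng.normal(size=500),
})

corr = demo_returns.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Stack into a Series indexed by (ticker, ticker), drop the masked cells, and sort.
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

print(pairs)
```

On the real data you would replace demo_returns with logretDf.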
To visualize this better, we can use seaborn to plot it as a heatmap like below
sns.heatmap(logretDf.corr())
You will see something like this
This gives us a much easier-to-read view of the correlation between the various banking stocks.
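If you want the numbers printed on the cells and a color scale fixed to the full -1 to 1 range, heatmap takes a few optional arguments for that. Here is a minimal sketch on synthetic returns. The tickers, the title and the coolwarm color map are my own choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic returns for three hypothetical tickers.
rng = np.random.default_rng(1)
demo_returns = pd.DataFrame(
    rng.normal(size=(250, 3)),
    columns=["AAA.NS", "BBB.NS", "CCC.NS"],
)

fig, ax = plt.subplots(figsize=(5, 4))

# annot=True writes each coefficient in its cell, fmt controls the number format,
# and vmin/vmax pin the color scale to the full correlation range.
sns.heatmap(
    demo_returns.corr(),
    annot=True,
    fmt=".2f",
    vmin=-1,
    vmax=1,
    cmap="coolwarm",
    ax=ax,
)
ax.set_title("Correlation of daily log returns (synthetic)")
fig.tight_layout()
# In a plain script, call plt.show() here to open the plot window.
```

Fixing vmin and vmax keeps the colors comparable when you redraw the heatmap for a different set of stocks.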
If you got this far, you can now do a correlation analysis of any number of stocks. If you like what I write, please do share it on social media. Till next time, peace!