Pair Trading - Exploring The Low Risk Statistical Arbitrage Trading Concepts

ncube

Well-Known Member
#61
Almost every pair is coming out as significant (even with significance = 0.01): WIPRO:TATASTEEL, WIPRO:DRREDDY, etc.
Is some normalisation, detrending, etc. to be done on the data? I am asking because I do not understand cointegration well enough.
I use the daily returns of the stocks for the calculation, which are usually stationary. In the load_data function I use the last 200 days of closing prices for each stock. Over a 200-day period you will find many stock pairs cointegrating; with this value I get 2000+ pairs daily with a p-value below 0.01 among the Nifty 500 stocks. I select a few of those with the strongest significance (lowest p-value) and use a 20-day lookback for the zScore calculation. These numbers are a matter of preference; I look for pairs that mean-revert over the medium term. If you reduce the number of days from 200 to 100 you may find fewer pairs. You can change it if you want, but in my experience it will not make much difference for day trading the pairs.
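To make the 20-day zScore idea concrete, here is a minimal sketch in plain Python. It is not the code from the shared notebook: the helper names pct_change and latest_zscore are made up for illustration, and it assumes the "returns spread" is simply the difference between the two stocks' daily returns, with no hedge ratio.

```python
from statistics import mean, stdev

def pct_change(closes):
    """Daily simple returns, like pandas' pct_change without the leading NaN."""
    return [curr / prev - 1.0 for prev, curr in zip(closes, closes[1:])]

def latest_zscore(closes_y, closes_x, lookback=20):
    """z-score of the most recent returns spread over the last `lookback` days."""
    ret_y = pct_change(closes_y)
    ret_x = pct_change(closes_x)
    # Assumption: the spread is the plain difference of daily returns.
    spread = [y - x for y, x in zip(ret_y, ret_x)]
    window = spread[-lookback:]
    if len(window) < lookback:
        raise ValueError("need at least lookback + 1 closing prices")
    sd = stdev(window)  # sample standard deviation
    if sd == 0:
        raise ValueError("spread is constant over the window")
    return (window[-1] - mean(window)) / sd
```

A large positive or negative value means today's spread is far from its recent average, which is the kind of stretch a mean-reversion trade looks for.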

By the way, did you check the significance on the stockdata file I shared, or have you made any changes to it? I am getting the following significance values for the pairs you quoted:
WIPRO:TATASTEEL --> 0.039
WIPRO:DRREDDY --> 0.041

Also, is there any built-in function in statsmodels that will create a matrix of significance values?
Btw, nice thread.
One can write a loop in Python to run through all the pair combinations; however, I do not recommend it, as it will not add much value. I would suggest you identify a few pairs and monitor them regularly. It is more important that you understand the behavior of the pairs in order to trade them efficiently.
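For anyone who still wants to try the loop, a rough sketch could look like the one below. The names coint_pvalue and scan_pairs are hypothetical and not part of the shared notebook. The sketch assumes you pass in whatever series you test on (daily returns in my case) and it wraps statsmodels.tsa.stattools.coint, which returns the test statistic, the p-value and the critical values. It goes over ordered pairs, since (S1, S2) and (S2, S1) can give different p-values.

```python
from itertools import permutations

def coint_pvalue(y, x):
    """Engle-Granger p-value; statsmodels is imported lazily, only when called."""
    from statsmodels.tsa.stattools import coint
    _, pvalue, _ = coint(y, x)
    return pvalue

def scan_pairs(series_by_symbol, pvalue_fn=coint_pvalue, alpha=0.01):
    """Return (s1, s2, p) for every ordered pair with p < alpha, smallest p first."""
    hits = []
    # permutations, not combinations: (A, B) and (B, A) are tested separately
    for s1, s2 in permutations(sorted(series_by_symbol), 2):
        p = pvalue_fn(series_by_symbol[s1], series_by_symbol[s2])
        if p < alpha:
            hits.append((s1, s2, p))
    return sorted(hits, key=lambda hit: hit[2])
```

Keep in mind that 500 symbols already give 124,750 unordered pairs (twice that when order matters), so some pairs will pass a 0.01 threshold by chance alone. That is one more reason to shortlist a few pairs and study them rather than rely on the scan.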

OK, so pct_change has been used, which will normalise the data. Am I correct?
Yes, I use the pct_change function to get the daily returns of the stocks. There are different series on which the cointegration significance can be calculated, such as price ratios, price spreads, etc. However, based on my analysis, I found that the returns spread gives me better results.
 

VJAY

Well-Known Member
#62
Dear ncube bro,
Why am I getting different figures for the above WIPRO:TATASTEEL and WIPRO:DRREDDY? They are nowhere near your figures. Am I missing something?

Untitled.png
 

ncube

Well-Known Member
#63
I wanted to check the behavior of a few popular banking pairs that are tracked by bigger players, so I traded the following 2 pairs today. Only the CANBK trade was executed. Most of the time with these types of pairs, I observe that the stock which was supposed to move less tends to move more than the other stock on a daily basis, but over a few days it normalizes. If interested, you guys can analyze the 30 min charts for these stocks.
CANBK-SYNDIBANK
ORIENTBANK-SBIN

Screenshot_2018-07-30-17-13-57-811_com.zerodha.kite.png Screenshot_2018-07-30-17-18-23-542_com.zerodha.kite.png
 

ncube

Well-Known Member
#64
Please note that S1 = TATASTEEL, S2 = WIPRO is not the same as S1 = WIPRO, S2 = TATASTEEL. This makes a lot of difference for positional trading; for day trading, however, it can be ignored if the cointegration is significant.

That 0.70 value is too high for this ordering of the pair; I am getting 0.02. Have you made any changes to the stockdata_20180726.csv file? Does it have at least 200 days of data? Did you change the value in the load_data function? If you reduce it from 200, the pValue will be higher.
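A small sketch of why the order of S1 and S2 matters. The Engle-Granger test in statsmodels (coint) regresses the first series on the second and then tests the residuals, and a least-squares fit of S1 on S2 is not the mirror image of the fit of S2 on S1. The numbers below are made up for illustration and are not real quotes; ols_slope is a hypothetical helper.

```python
from statistics import mean

def ols_slope(y, x):
    """Slope of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Made-up numbers, not real quotes
s1 = [10.0, 11.0, 12.5, 12.0, 13.5, 15.0]
s2 = [20.0, 21.5, 21.0, 23.0, 24.5, 26.0]

b_12 = ols_slope(s1, s2)  # S1 regressed on S2
b_21 = ols_slope(s2, s1)  # S2 regressed on S1

# The two fits would be mirror images only if b_12 == 1 / b_21,
# which holds only when the two series are perfectly correlated.
print(b_12, 1 / b_21)
```

Since the two regressions leave different residuals, the test run on those residuals can return a different p-value for each ordering.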
 

VJAY

Well-Known Member
#65
No, I changed nothing in the original files. I am just playing with swapping S1 and S2 :( I only added Friday's close price to the Excel sheet.
 

ncube

Well-Known Member
#66
Can you post the content of the 1st cell of your PairTrading.ipynb file (the import-library cell) and also share your stockdata_20180726.csv file with me?
 

ncube

Well-Known Member
#68
For the above mentioned pair I am getting the following values. I have not made any changes to the values of the csv file.

View attachment 26979
Strange, I am not sure why everyone is getting different pValues. At least in your case the plot_pairs values are the same as mine:
Y = TATASTEEL Price : 534.85
X = WIPRO Price : 271.4
zScore: 0.445

But it looks like Vijay has made some changes to the stockdata_20180726.csv file, as the WIPRO price in his output is 274.5.

Can you also post the content of your 1st cell? I just want to check whether there are any changes in the functions.

Also try running these 2 commands in a new cell in the Jupyter notebook, just to verify that the statsmodels version is correct:
import statsmodels.api as sm
sm.version.version

Output expected is '0.8.0'
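If that attribute is not available in your install, here is a small alternative sketch for reading installed versions. It assumes Python 3.8 or newer (for importlib.metadata), which is not necessarily what the notebook was written for, and installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print("statsmodels:", installed_version("statsmodels"))
print("pandas:", installed_version("pandas"))
```

Comparing these values between two machines is a quick way to rule out a library mismatch before looking at the data file.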
 

VJAY

Well-Known Member
#69

VJAY

Well-Known Member
#70
Yes, I did something wrong in Excel. I replaced it with your new file; it seems OK now, but the p-value is still different.

Untitled.png