Musing #58: Mutual Fund (SIP) Portfolio Overlap Analyser



Being from a finance background, I made it a point to invest in SIPs early on. Over the years, while the investment amount has increased steadily, the number of funds being invested in has remained more or less constant. Hence, I need not emphasis how important it is to know where exactly the money is going.

Too often, the choice of a fund is made simply on returns and diversification is achieved by selecting a different fund class. However, it provides no indication of the extent of value creation. I prefer to keep an eye out on what's happening with my portfolio and it is not only when selecting a new fund but also for keeping tabs on what's going on with the existing investments.


My search for websites/files providing this information yielded a few options that were quite limited in nature, dispensing basic overlap information between two or three funds. Unable to find the requisite information, I decided to go on my own and create an Excel workbook that provides overlap analysis for up to six funds. The other target I had set for myself was to do so without the use of VBA, so the only permission required is to access the external data source - moneycontrol.com.

The workbook is structured in to distinct sheets for input and detailed analysis. The 'Input' sheet is pretty straightforward and is essentially a two-step process requiring the funds and investment amount to be entered along with the selection of the fund that would form the basis of checking the overlap. It would be a good idea to read through the notes prior to using the workbook. The sheet has some safeguards built in to alert the user about inconsistent inputs, like missing investment values/funds and failure to refresh the 'base fund' selection. At the same time, it is robust enough to still function immaculately when any of the selected funds are deleted.


Note that although the sheet includes funds with equity holdings from various classes, some of them do not have their holdings listed on moneycontrol.com which may cause an error illustrated above. As such, there is nothing that can be done about it. Also, to state the obvious, the default funds selected in the sheet are for illustration and are not suggestive.


The 'Analysis' sheet provides the primary analysis of the portfolio. Besides listing the fund class and the equity holdings of each fund, it provides the percentage overlap of the base fund with all the other funds in the portfolio, both, in terms of the number of stocks and the value invested. The charts in turn provide 'Top 10' visualisations for individual stocks as well as the different sectors.


The 'Detail' sheet provides the tabular information that form the basis of the analysis and lists all the values as against only the Top 10 in the charts.


The 'MFx' sheets list the holdings of each fund, as retrieved from moneycontrol.com and is subsequently used for the overlap calculations.


Finally, the 'List' sheet is a list of the funds retrieved from moneycontrol.com and covers the various equity fund classes. It is easy to add any new funds to the list in the specified format and the information can be scraped en masse from the MoneyControl site.

As is often the case, I have created something to primarily fulfil my needs but with the intention of sharing it with other netizens. Consequently, I am open to any suggestions for improvement which you may leave in the comments section.

Link: Download from Google Drive

Musing #51: The philosophical difference between Formula 1 and Formula E


Compared to last year and the year before, I have decided to change tack and throw Formula E in to the mix this year. While the renders were shared in January, Formula E physically unveiled its Gen2 car at the Geneva Motor Show earlier today. I find it to be an attractive design, specifically as it is something that Formula 1 is unlikely to mimic anytime soon; unless its American owner, Liberty Media, manages to miraculously convince the teams that the show is more important than the performance.

This brings me to the point of why Formula E finds it favourable to adopt such a radically different design compared to Formula 1, marketing reasons aside. I cannot profess to be an aerodynamicist but over two decades of following motorsports has led me to be more appreciative of its technical aspects. Also, it feels satisfying to be able to tap in to my years of studying physics and engineering, and leverage it to satisfy the curiosity of a random commenter on the Internet.

Formula 1 is considered to be the epitome of motorsports and rightly so. It is all about harnessing the ultimate performance from the machine and achieving the ultimate lap time, much of which is accomplished by being fast through the turns. Hence, F1 cars are set up to have the highest possible downforce so that the turns can be taken as fast as possible while ensuring that the high drag that comes with it doesn't impact the straightline speed as much.

Unfortunately, most of the downforce in modern F1 cars is generated using aerodynamic structures and appendages which leaves a significant disturbed air flow for the car following behind. A consequence of this approach is the poor racing that we see in F1 these days. The much-maligned Drag Reduction System (DRS) overcomes this specific obstacle for the following car, though it seems the wider cars and even more intricate aerodynamic structures have rendered it less powerful (and thankfully so).

On the contrary, Formula E being a spec chassis series, isn't focused on ultimate performance. The philosophy here is to accelerate quickly out of the corners using the instant torque from the electric motors, reach the top speed as quickly as possible down the straights and then coast for the remainder of the straights, before breaking hard while already cornering to aid the charging of batteries using the Kinetic Energy Recovery System (KERS). The power absorbed by drag increases with the cube of speed increase, so less drag results in less energy expense over a lap, while ensuring higher top speeds along the straights. The less disturbed air of a low drag/downforce setup certainly helps the following car but a side benefit of this, coupled with the low-grip, all-weather Michelin tyres, and instant torque is that the cars are incredibly difficult to handle around the corners due to which we see a lot more driver errors in Formula E compared to F1's cornering on rails.

I hope Season 5 of Formula E brings in better uninterrupted racing, made possible by having a single car complete the race. However, I hope that some strategic element of a pit-stop is retained, e.g., allowing for quick, short recharges for additional power at the expense of lost time. Formula 1 and E aside, 2018 is looking to be another cracking year for motorsports with a competitive MotoGP field littered with manufacturers and the the new low-downforce IndyCar.

Musing #49: IMDb Movie Selector


The process of selecting a movie is far less enjoyable that the act of watching it. Hence, I created an app in Qlik Sense to select a movie on the basis of any of the following criteria:

  • Rating
  • Year
  • Runtime
  • Votes
  • Genre
  • Principal Cast
  • Release Year
  • Title Type
  • Adult Content
It uses the dataset files provided by IMDb on their website, which is not to say that it is without issues. However, I have done my best to remove illegible values from each of the columns. There are some interesting insights immediately visible on the loading of the app like:
  • As of Jan 27, IMDb has 4.78 million titles listed with a total of 794.5 million votes
  • The rating curve is a right-tail distribution indicating titles tend to be rated higher than lower
  • The average score of all titles is 6.94, so anything rated 7+ should be considered better than average
  • Drama and Comedy constitute the bulk of titles by far
  • The number of titles has been increasing exponentially over the decades, but it has exploded in this century
There are of course a lot more insights to glean out of this data, so if you are intending to use the app, you need to download the following 4 files from the aforementioned IMDb website. These are updated on a daily basis, so make sure to grab the latest ones whenever you are using the app.
  • title.basics.tsv.gz
  • title.principals.tsv.gz
  • title.ratings.tsv.gz 
  • name.basics.tsv.gz
Each of these files, on extraction, will contain a data.tsv file which is in a tab-separated format. For this app, I renamed each of these files as per the source (eg. title.basics.tsv) and loaded them from 'D:\Data\IMDB' folder. So, if you are doing things differently, be sure to adjust the data library path and file names in the data load editor.

I would have loved to host this app on the web but my server certainly isn't up to the task of handling the work load. Hence, feel free to download the qvf, dataset files and set about on your own journey of discovery. I will leave you with a couple more screenshots indicating the look and usage of this app.


Musing #45: Creating a visual dashboard using Qlik Sense


Lately, I have been in to Qlik Sense and was looking to create a single page dashboard for publicly available information. The most logical place to head to was Kaggle and keeping in line with my previous post concerning US-India, I decided to go with the H1-B Visa Petition data set.