Berkshire Hathaway Quantitative Text Analysis

In my continuing effort to #showYourWork, here is my final project for my Data Management and Visualizations course. Hopefully this doesn't trigger a plagiarism flag, since Turnitin crawls the web as part of its process.

TO: Senior Director Business Analysis
FROM: Carlton Matthews
DATE: 24 April 2016
SUBJECT: Using Python for Textual Analysis


Introduction

At our last information exchange meeting, you asked the team to find ways to expand our analytical toolkit. As a financial services business, much of what we do involves numbers and other hard data points. During our discussion, however, we wondered whether there were ways to identify sound investments using more than just hard financial data.

I recommended that we look at other sources of information about the companies we are investigating. Specifically, I wondered whether we could examine the CEO's letter in each annual report to identify interesting patterns. Over the last week I have developed a Python program to analyze a series of CEO letters from Berkshire Hathaway Inc., a publicly traded company headed by Warren Buffett. Every year Mr. Buffett writes a letter to shareholders as part of the company's annual report.

Processing the Raw Text

I pulled CEO letters at 10-year intervals from 1977 to 2007, plus the most recent (2015) letter, published in 2016. This gave me a total of 53,219 words. Using Python to extract the text was simple; turning that text into something meaningful was the challenge. First, the most commonly used words in the English language should be excluded from the analysis. These are known as stop words, and several collections of them are available online for exactly this purpose. Here is part of the list that I used:

'beside', 'besides', 'between', 'beyond', 'bill', 'both', 'bottom', 'but', 'by', 'call', 'can', 'cannot', 'cant', 'during', 'each'
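The extraction and stop-word steps could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the original program: it assumes each letter has already been saved as a plain-text file named by year (the load_letters helper and the letters/1977.txt layout are hypothetical), it uses a simple regular expression that keeps words with apostrophes and drops numbers and punctuation, and it uses only the sample stop words above plus a few common words in place of a full list. Because tokenization choices differ, totals from this sketch would not necessarily match the 53,219 words reported here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: the sample stop words shown above plus a few very common words.
# A real run would load one of the complete stop-word lists published online.
STOP_WORDS = {
    "beside", "besides", "between", "beyond", "bill", "both", "bottom", "but", "by",
    "call", "can", "cannot", "cant", "during", "each",
    "the", "a", "an", "and", "of", "in", "to", "is", "was", "its", "our", "we",
}

# Letters only, optionally followed by an apostrophe suffix ("company's").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text, normalize curly apostrophes, and return word tokens."""
    return WORD_RE.findall(text.lower().replace("\u2019", "'"))


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def load_letters(folder):
    """Hypothetical layout: map each year (from the file name) to that letter's tokens."""
    return {
        int(path.stem): tokenize(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    }


def total_words(letters):
    """Total number of tokens across all letters, before stop words are removed."""
    return sum(len(tokens) for tokens in letters.values())


if __name__ == "__main__":
    tokens = tokenize("The business grew during 1987, but each business kept its earnings.")
    print(len(tokens))  # 10
    print(Counter(remove_stop_words(tokens)).most_common(2))  # [('business', 2), ('grew', 1)]
```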

After processing the text files for each period, I was left with a list of over 200 words, each used at least 10 times within that year's letter. To reduce the list further, I began combining different forms of the same root word; for example, I combined business and businesses. While this still left a list of 227 words, some interesting patterns were beginning to emerge.
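The frequency cutoff and the merging of word forms might look like the sketch below, applied to one letter's tokens at a time. The ROOT_FORMS mapping is hypothetical (only business/businesses comes from the memo), and the function follows the order described above: apply the 10-occurrence cutoff first, then fold each surviving form into its root and add the counts together.

```python
from collections import Counter

# Hypothetical mapping of inflected forms to a root word. The memo mentions
# combining "business" and "businesses"; the "companies" entry is illustrative.
ROOT_FORMS = {
    "businesses": "business",
    "companies": "company",
}

MIN_COUNT = 10  # a word must appear at least this often in a single year's letter


def frequent_words(tokens, min_count=MIN_COUNT, roots=ROOT_FORMS):
    """Apply the per-letter frequency cutoff, then merge word forms into their roots."""
    counts = Counter(tokens)
    # Step 1: keep only words used at least min_count times in this letter.
    frequent = {word: n for word, n in counts.items() if n >= min_count}
    # Step 2: fold each surviving form into its root, summing the counts.
    merged = Counter()
    for word, n in frequent.items():
        merged[roots.get(word, word)] += n
    return dict(merged)
```

Running it once per letter, for example {year: frequent_words(tokens) for year, tokens in letters.items()}, gives one count dictionary per year. Merging before applying the cutoff would instead let two forms that each fall short of 10 combine to pass it; which order is better is a judgment call.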

Quantitative Textual Analysis

Berkshire Hathaway Word Cloud

The figure above shows the top 50 words used in the CEO letters. I could have removed Berkshire, since you would expect it to appear many times, but it serves as a visual descriptor of the data. The most frequently used words are business, company and earnings. This makes sense, as Berkshire Hathaway grows by buying portions of companies. One of the goals stated in the company's “Owners Manual” is “… directly owning a diversified group of businesses that generate cash and consistently earn above-average returns on capital.” (Buffett, Berkshire Hathaway Inc. - An Owners Manual, 1999)
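The top 50 words for a word cloud can be found by summing each word's counts across the letters and keeping the most frequent. A minimal sketch, assuming per_year_counts (a hypothetical name) maps each letter's year to a dictionary of word counts, such as the filtered counts described above:

```python
from collections import Counter


def top_words(per_year_counts, n=50):
    """Sum each word's counts across all letters and return the n most frequent (word, count) pairs."""
    total = Counter()
    for counts in per_year_counts.values():
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return total.most_common(n)
```

The memo does not say which tool drew the cloud. One option is the third-party wordcloud package, whose WordCloud class has a generate_from_frequencies method that accepts a dictionary such as dict(top_words(per_year_counts)).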

Another way to represent this data is shown below: a chart of the top 20 words, with each bar stacked by year.

Top 20 Words Stacked Bar Chart

As with the first visualization, the top words are business, company and earnings. I examine these three terms later, but first I want to note that from 1977 to 1987 both the company and the CEO letter grew significantly. The 1977 letter is a little more than 2,300 words, while the 1987 letter is just over 12,000. Over the same period, Berkshire Hathaway's stock price rose from $138 in 1977 to $2,950 in 1987, primarily as a result of expanding its business holdings. (Pritchard, 2008)
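A stacked bar chart like this one needs, for each year, the count of every plotted word. The sketch below builds that table and draws it with matplotlib, stacking each year's segment on the earlier ones through the bottom argument of Axes.bar. The function names and the per_year_counts structure (year mapped to a word-count dictionary) are assumptions; the memo does not say which plotting library produced its charts.

```python
from collections.abc import Mapping, Sequence


def stacked_table(per_year_counts: Mapping[int, Mapping[str, int]],
                  words: Sequence[str]) -> dict[int, list[int]]:
    """Return {year: [count of each word in `words`]}, with years in chronological order."""
    return {
        year: [per_year_counts[year].get(word, 0) for word in words]
        for year in sorted(per_year_counts)
    }


def plot_stacked(per_year_counts, words):
    """Draw one bar per word, stacking each year's count on top of the earlier years."""
    # Imported here so stacked_table can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    table = stacked_table(per_year_counts, words)
    fig, ax = plt.subplots(figsize=(12, 5))
    bottom = [0] * len(words)
    for year, heights in table.items():
        ax.bar(words, heights, bottom=bottom, label=str(year))
        # Raise the baseline so the next year's segment sits on top of this one.
        bottom = [b + h for b, h in zip(bottom, heights)]
    ax.set_ylabel("Occurrences")
    ax.legend(title="Letter year")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    plt.show()
```

For this chart, words would be the 20 most frequent words across all letters, passed as a list in descending order so the tallest bars come first.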

Most Used Words

This leads us to the three most frequently used words in our dataset, which have already been discussed: business, company and earnings. Each appears more often than any other word in the CEO letters. The chart below shows one representation of how their usage has changed over time. In each bar, dark blue represents 1977, orange 1987, green 1997, red 2007 and purple 2015. The count for business includes both the singular and plural forms. For a company whose business is holding stock in other businesses, you would expect many instances. As mentioned previously, many businesses were added to the Berkshire Hathaway portfolio between 1977 and 1987, and that shows in the spike in usage from 1977 to 1987.
Top 3 Words Stacked Bar Chart

In 1977 there was a lot of discussion of the company and of the companies that make up its portfolio. As more companies were added and Berkshire itself grew, usage of the word company increased, peaking in the most recent letter. As expected, earnings grew along with the company, which is reflected in the usage of the word earnings from 1977 to 2015. The line graph below represents the same data, this time plotting each word's count over time. It paints the same picture as described above, but this style of graph best illustrates the spike in the usage of the word business.
Top 3 Words Line Graph
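The line graph plots each word's count letter by letter, and the spike in business is the largest increase between consecutive letters. A small sketch of pulling out one word's series and locating its biggest jump, again assuming a hypothetical per_year_counts mapping of year to word counts:

```python
def word_series(per_year_counts, word):
    """Return (years, counts) for one word, with years in chronological order."""
    years = sorted(per_year_counts)
    return years, [per_year_counts[year].get(word, 0) for year in years]


def largest_jump(per_year_counts, word):
    """Return (start_year, end_year, increase) for the biggest rise between consecutive letters.

    Needs at least two letters; max() raises ValueError on an empty list of steps.
    """
    years, counts = word_series(per_year_counts, word)
    steps = [
        (counts[i + 1] - counts[i], years[i], years[i + 1])
        for i in range(len(years) - 1)
    ]
    increase, start, end = max(steps)
    return start, end, increase
```

To draw the graph itself, each of business, company and earnings could be passed through word_series and plotted as one line, for example with matplotlib's Axes.plot(years, counts, label=word).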

Conclusion

After walking through this process, I think we can use this type of analysis to supplement our financial models and better understand our investments. Berkshire Hathaway was a good choice for this analysis because it provides a long history and publicly accessible data. If you think this is a worthwhile endeavor, I can begin enhancing the program to ingest and process even larger volumes of data. Berkshire Hathaway could serve as a test case again, since all of its CEO letters are publicly available.

References

Buffett, W. E. (1978, March 14). 1977 Shareholder Letter From the CEO. Retrieved April 18, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/letters/1977.html

Buffett, W. E. (1988, February 29). 1987 Shareholder Letter From the CEO. Retrieved April 20, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/letters/1987.html

Buffett, W. E. (1998, February 27). 1997 Shareholder Letter From the CEO. Retrieved April 20, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/letters/1997.html

Buffett, W. E. (2008, February 27). 2007 Shareholder Letter From the CEO. Retrieved April 20, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/letters/2007ltr.pdf

Buffett, W. E. (2016, February 27). 2015 Shareholder Letter From the CEO. Retrieved April 20, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/letters/2015ltr.pdf

Buffett, W. E. (1999, January 30). Berkshire Hathaway Inc. - An Owners Manual. Retrieved April 20, 2016, from Berkshire Hathaway Inc.: http://www.berkshirehathaway.com/owners.html

Pritchard, J. (2008, April 02). A Look at Berkshire Hathaway’s Annual Market Returns From 1968 – 2007. Retrieved April 20, 2016, from All Financial Matters: http://allfinancialmatters.com/2008/04/02/a-look-at-berkshire-hathaways-annual-market-returns-from-1968-2007/

Appendix: Python Code

See the project on GitHub.