How would you best divvy up $1000 among 4 stocks using Scalding ?

Scalding solution in github

You think Scalding, and you think Scala code for word-count, page-rank, classic big-data MR jobs that benefit from distributed parallel computing on a Hadoop cluster. But this here ain't your grandfather's map-reduce :)

Problems of this nature were first formalized and solved by Dr. Harry Markowitz at my alma-mater the University of Chicago, way back in 1952! When Harry defended his PhD. thesis, his dissertation advisor MILTON FRIEDMAN is said to have remarked "This is not economics!". However, cooler heads prevailed & Harry became Dr. Markowitz!


Since then, Portfolio Selection has been studied to death - you could divvy up the $1000 under a second using the frontcon(..) function in Matlab, or using CPLEX, or AMPL. So then, why Scalding ? Well, the other day I was buttering a toast while my wife unwrapped a toy for our 5 month old. The toy needed 2AA batteries, and I needed a screwdriver to pry open the battery compartment. Well, I don't have a screwdriver lying around, so what did I do ?


Trick question! You see, I was buttering the toast, so I used the butter-knife to unscrew the battery compartment and thus disaster was averted and the kid got his toy. Moral of the story: Sometimes, you can use Scalding to do all sorts of weird math-y stuff its probably not intended to do. Its a lot of fun! Besides, trying age-old problems using shiny new tools like Scalding brings all sorts of insights to the table.


Lets define the problem formally: 
Divide $1000 among 4 stocks - Kroger, Abbott Labs, Dollar Tree and Monster Beverage, in the best possible way. 
Why these 4 stocks ? Well, you'll find out towards the very end of this article!


val cash = 1000.0 // money at hand
val error = 1 // its ok if we cannot invest the last dollar
val (kr,abt,dltr,mnst) = (27.0,64.0,41.0,52.0) // share prices
val stocks = IndexedSeq( kr,abt,dltr,mnst)



So the above Scala snippet simply says we have $1000 & 4 stocks priced at $27,$64, $41 and $52, sitting in a sequence.


Lets now invoke the weightedSum function in Scalding. This would give us a pipe containing all tuples that satisfy the constraint ie. all posible combinations that add up to $1000, within an error bound of $1.


weightedSum( stocks, cash,error).write( Tsv("invest.txt"))

Lets examine the content of the sink:
$ cat invest.txt 
0 0 13 9
0 1 0 18
0 1 19 3
0 2 1 16
0 2 20 1
0 3 7 10
0 4 8 8
0 5 9 6
0 6 15 0
0 8 3 7
0 9 4 5
0 10 5 3
0 14 0 2
1 0 6 14
1 1 12 8
1 2 13 6
1 3 0 15
1 3 14 4
1 4 1 13
1 5 2 11
1 6 8 5
1 7 9 3
1 8 10 1
1 11 4 2
1 12 5 0
2 0 18 4
2 1 5 13
2 1 19 2
2 2 6 11
2 3 7 9
2 4 13 3
2 5 14 1
2 6 1 10
2 7 2 8
2 8 3 6
2 9 9 0
3 0 11 9
3 1 12 7
3 2 18 1
3 3 0 14
3 4 6 8
3 5 7 6
3 6 8 4
3 9 2 5
3 10 3 3
3 11 4 1
4 0 4 14
4 1 5 12
4 2 11 6
4 3 12 4
4 4 13 2
4 5 0 11
4 6 1 9
4 7 7 3
4 8 8 1
4 12 3 0
5 0 16 4
5 1 17 2
5 2 4 11
5 3 5 9
5 4 6 7
5 5 12 1
5 7 0 8
5 8 1 6
5 9 2 4
6 0 9 9
6 1 10 7
6 2 11 5
6 5 5 6
6 6 6 4
6 7 7 2
6 10 1 3
6 11 2 1
7 0 2 14
7 0 16 3
7 1 3 12
7 2 4 10
7 3 10 4
7 4 11 2
7 7 0 7
7 8 6 1
8 0 9 8
8 1 15 2
8 2 16 0
8 3 3 9
8 4 4 7
8 5 5 5
8 9 0 4
8 10 1 2
9 0 2 13
9 1 8 7
9 2 9 5
9 3 10 3
9 6 4 4
9 7 5 2
9 11 0 1
10 0 14 3
10 1 1 12
10 1 15 1
10 2 2 10
10 3 3 8
10 4 9 2
10 5 10 0
11 0 7 8
11 1 8 6
11 2 14 0
11 4 2 7
11 5 3 5
11 6 4 3
12 0 0 13
12 1 1 11
12 2 7 5
12 3 8 3
12 4 9 1
12 7 3 2
12 8 4 0
13 0 12 3
13 1 13 1
13 2 0 10
13 3 1 8
13 4 2 6
13 5 8 0
14 0 5 8
14 1 6 6
14 2 7 4
14 5 1 5
14 6 2 3
14 7 3 1
15 0 12 2
15 2 0 9
15 3 6 3
15 4 7 1
15 8 2 0
16 0 5 7
16 1 11 1
16 4 0 6
16 5 1 4
17 1 4 6
17 2 5 4
17 3 6 2
17 6 0 3
17 7 1 1
18 0 10 2
18 4 5 1
19 0 3 7
19 1 4 5
19 6 0 2
20 2 3 4
20 3 4 2
21 0 8 2
21 1 9 0
22 0 1 7
22 1 2 5
22 2 3 3
23 0 8 1
23 3 2 2
23 4 3 0
24 0 1 6
24 1 7 0
25 1 0 5
25 2 1 3
25 3 2 1
26 0 6 1
26 4 1 0
27 1 0 4
28 3 0 1
29 0 4 1
34 0 2 0
37 0 0 0


So it looks like we have 169 unique ways to divvy up the cash. Lets take one of the 169 tuples. How about (5,4,6,7) ?
So 5 shares of Kroger @ $27 = $135
4 shares of Abbott Labs @ $64 = $256
6 shares of Dollar Tree @ $41 = $246
7 shares of Monster @ $52 = $468


135+256+246+468 = $1001


But is (5,4,6,7) the best-posible tuple among the 169 choices ?


Lets define the best: We want the tuple that has the smallest risk. The 60-day 4x4 correlation matrix for these stocks is given below: 

KR       ABT    DLTR   MNST
0.448  0.177  0         0.017
0.177  0.393  0.177  0.237
0         0.177  0.237  0.06 
0.017  0.237  0.06    0.19 

What this says is rather simple: We have the variances along the diagonal and the covariances elsewhere. So Kroger has a variance of 0.448, and the 60-day correlation between Kroger and Abbott is 0.177, and so forth. ( Compute 60-day correlations here. )


Given a vector of weights W, the net risk imposed by this matrix = W'CW
To define these matrices & perform matrix transposes & multiplications, lets use COLT.

import cern.colt.matrix.DoubleFactory2D
val data = Array(Array(0.448, 0.177, 0.0, 0.017), Array(0.177, 0.393, 0.177, 0.237), Array(0.0, 0.177, 0.237, 0.06), Array(0.017, 0.237, 0.06, 0.19))
val corr = DoubleFactory2D.dense.make(data)


corr: 
4 x 4 matrix
0.448 0.177 0     0.017
0.177 0.393 0.177 0.237
0     0.177 0.237 0.06 
0.017 0.237 0.06  0.19 


We now take each of the 169 tuples & compute their net risks. The tuple with the least risk is what we are after.

import cern.colt.matrix.linalg.Algebra
import cern.colt.matrix.DoubleFactory1D
import java.util.StringTokenizer

val alg = Algebra.DEFAULT
val file = scala.io.Source.fromFile("invest.txt").getLines.toList


// convert the tab-delimited weights to a row vector
def getWeights(s:String) = {  
     val stok = new StringTokenizer(s, "\t")
     val weights = (1 to 4).map( x=> stok.nextToken.toDouble).toArray
     DoubleFactory1D.dense.make(weights)
}

// compute risks per tuple and sort by risk
file.map(line=> { 
     val w = getWeights(line)
     ( line, alg.mult( alg.mult(corr,w), w))
}).sortBy(x=>x._2)

res11: List[(String, Double)] = List(
(1 0 6 14,56.776), 
(4 0 4 14,56.824), 
(2 1 5 13,57.544), 
'(4 1 5 12,58.552), 
(2 2 6 11,59.646),
.....
(37 0 0 0,613.312)



So the best possible way to invest your $1000 is the (1,0,6,14) tuple.
But if you want to buy atleast 1 share of each stock, (2,1,5,13) is the way to go.
Ofcourse, the absolute worst thing you could do is (37,0,0,0) ie. put all your money in 37 shares of Kroger. With this worst tuple, you get no diversification and take on 10 times the risk of the best tuple.


So ( putting on my Jim Cramer hat here) I am going to recommend the following portfolio mix:
2 shares of Kroger : $54
1 share of Abbott  : $64
5 shares of Dollar Tree : $205
13 shares of Monster: $676
Total : $1001

Go forth and invest!


Okay, enough of finance., back to CS. 
How does all this work inside of Scalding ?

The secret sauce is obviously the algorithm behind weightedSum(...). This is a homegrown sophisticated-brute-force algorithm that I conjured up out of thin air :)
Seriously, this was the very first Scalding algorithm I coded up, and I have not found any reference to this algorithm elsewhere in the literature, so I am going to claim credit for it :))

Lemme walk you thru my algorithm: Say you want all possible ways to purchase 3 stocks, whose prices are  ( $1, $2, $5), to sum to $10.

Make 3 pipes, one per stock -
Pipe 1 has $0, $1, $2, $3, $4, $5,$6,$7,$8,$9, $10
Pipe 2 has $0, $2, $4, $6, $8 , $10
Pipe 3 has $0, $5, $10

So I've just made pipes that hold multiples of the share price.
Now cross the pipes two at a time.
So crossing pipe1 and pipe2 gives us a pipe with 2 columns.
(0,0)
(0,2)
(0,4),
.....
(1,0)
(1,2)
(1,4)
....
.....
(10,10)


a total of 66 tuples. 
Now create a column that holds the sum of the two columns, and throw away tuples that exceed 10.
I call this "progressive filtering" 
This gets rid of bad tuples like (1,10), (2,9),(2,10),(3,8),....


Now take this pipe with good-tuples, and cross with the third pipe.
Once again, apply progressive filtering.
This gets rid of bad tuples like (1,9,5), (1,9,10),(2,8,5),(2,8,10),....


This combination of "cross-multiply with progressive-filter" is what makes weightedSum sophisticated.


But why is weightedSum(...) brute-force ?
Well, with 4 stocks , the smallest stock being the $27 Kroger, and $1000 at stake, we had 169 possible candidates.
Where did they come from ?
Well, 1000/27 ~  40.
40^4 = 2,560,000
So in the worst case, we had a total of 2.5 MILLION fourples....but we managed to narrow them down to 169 valid candidates. ( Note that due to Progressive Filtering, we probably generate less than 2.5 million fourples....under a million ballpark )

In general, the problem scales like O( (cash/minWeight)^numStocks )

So it is certainly exponential, and hence the brute-force aspect of it.
However, given Scalding's awesome distributed capabilities, and the parallelized nature of our algorithm ( remember - 1 pipe per stock, so n stocks lead to n pipes crossed & progressively filtered in parallel ), we perform quite well....in fact, this particular problem benchmarks under 2 seconds! Now, as the number of stocks grow, or if we have more cash to play with, this leads to exponentially more bad tuples...but unlike in Scala where we'd simply run out of memory unless we resort to Dantzig, Scalding will happily chug along & compute the best possible tuple !

Mystery: You might be wondering why we picked the 4 stocks - Kroger, Abbott Labs, Dollar Tree and Monster Beverages. One of several criticisms against Portfolio Selection using mean-variance optimization techniques is that it isn't stable. The variance of stocks aren't fixed, and if you run this same algorithm a month from now, the correlation matrix would have surely changed, and hence the tuples. That does not mean it's a useless algorithm. In fact, while in grad school I interviewed with BlackRock, who were using what was essentially a souped-up Markowitz optimization for a $100B fund with 100s of stocks in them. "So won't the prices and matrices change everyday ? " I asked. Well, yes! So Blackrock runs the algorithm throughout the day and rebalances the portfolio twice a day - at 11am and 2pm !!!

For this exercise, I used Google Stock Screener to pick 4 stocks with a low beta ( risk) and high average volume that had correlation data available in AIStockCharts.
The lowest possible betas with corr data were KR,ABT,DLTR and MNST. 
Mystery solved!

Scalding solution in github
 


Comments

03/05/2013 3:40am

Can't we come up with some algorithm which would help us to do this in an easy way?

Reply
10/15/2013 4:39am

Thanks for the nice blog. It was very useful for me. Keep sharing such ideas in the future as well

Reply
04/12/2013 3:02am

I was totally unaware of the concept of Scalding solution in github. Thanks to the above post.

Reply
06/28/2013 4:58am

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Reply
07/03/2013 5:00am

You idea for using Scalding has opened up immense possibilities for me. I love to play with numericals and is very much excited to use Scalding on my project. Please tell me where to download the software from.

Reply
07/12/2013 5:17am

This is a very significant blog. Thanks for sharing your thoughts. Keep up the good job in posting very good topics.

Reply

This is the best way to share the information and we can get easily.

Reply
07/17/2013 11:59pm

I agree with @up comment. I found lot of useful content here, thanks :)

Reply
07/18/2013 5:27am

These information helps me consider some useful things, keep up the good work.

Reply

Dissertation writing service that is rendered by our dissertation writers is aimed at offering affordable dissertation writing help of any kind to those who strive to receive only the highest grades and wish to get qualified assistance prior to submitting the paper.

Reply
07/24/2013 6:15am

Thanks a lot for enjoying this beauty article with me. I am appreciating it very much! Looking forward to another great article. Good luck to the author! all the best!

Reply
07/25/2013 4:56am

I found this blog post searching for something related to social media. Your story is sad, although is very well written. I am sorry for the loss of your friend

Reply
07/29/2013 5:50am

I recently came across your blog and have been reading along. I thought I would leave my first comment.

Reply

I really enjoy simply reading all of your weblogs. Simply wanted to inform you that you have people like me who appreciate your work. Definitely a great post. Hats off to you! The information that you have provided is very helpful.

Reply
08/05/2013 6:13am

Excellent is the only word I can give u for this wonderful blog, keep it up. I will come back again to read some more interesting things on this topic.

Reply
08/09/2013 11:20pm

I am appreciating it very much! Looking forward to another great article. Good luck to the author! all the best!

Reply
08/14/2013 8:37am

You imagine Scalding, therefore you consider Scala signal intended for word-count, page-rank, basic big-data MR work in which take advantage of distributed parallel processing with a Hadoop bunch. Nevertheless this specific the following ain't the grandfather's map-reduce.

Reply
08/15/2013 4:50am

I am writing a research paper and collecting information on this topic.

Reply
08/15/2013 6:51am

Pretty good post. I just stumbled upon your blog and wanted to say that I have really enjoyed reading your blog posts. Any way I will be subscribing to your feed and I hope you post again soon.!

Reply
08/19/2013 6:45am

I have been previously exploring for quite a while for one pleasant articles or blog posts on the subject of this subject matter . Checking out in Yahoo and google I eventually uncovered this blog post.

Reply
08/23/2013 4:54am

I agree with your suggestions. Thanks for sharing all, this is helpful for many people.

Reply
08/30/2013 7:18am

I'm no longer positive where you are getting your information, however good topic

Reply
08/31/2013 7:17am

You always write fabulous. This is third time I am reading any of your blog and again finding it inspirational one.

Reply
08/31/2013 10:58am

That proves once again that I don't know much about stocks. However I am determined to learn all about them and this seems to be a good place to start with. I am glad I found your post.

Reply
09/06/2013 5:54am

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.

Reply
09/08/2013 6:31am

The toy essential 2AA electric batteries, along with My partner and i essential a screwdriver in order to pry open the actual power supply pocket.

Reply
09/09/2013 2:38am

This is my first visit but i am really very impressed from this blog therefore, thanks for sharing the information

Reply
09/16/2013 3:44am

Hi, Nice work, I've bookmarked this page and have a feeling I'll be returning to it regularly.

Reply

We can calculate division by multiplication in such a case. This approach is useful in computers that do not have a fast division instruction. Thank you.

Reply

In Roman times, children were regarded as not culpable for crimes, a position later adopted by the Church.

Reply
09/22/2013 1:42am

Hi, Neat post. There is a problem with your site in internet explorer, would test this?IE still is the market leader and a large portion of people will miss your great writing due to this problem.

Reply
09/27/2013 10:44am

Excellent solution to a fairly common problem.

Reply
10/02/2013 11:26pm

Seriously, this was the very first Scalding algorithm I coded up, and I have not found any reference to this algorithm elsewhere in the literature, so I am going to claim credit for it :))

Reply
10/04/2013 2:59am

We Awesome, You have written an excellent blog that has convinced me to read this!

Reply
10/04/2013 5:18am

Nicely presented information in this post, I prefer to read this kind of stuff. The quality of content is fine and the conclusion is fine.

Reply
10/08/2013 1:37pm

This is a truly fantastic read for me, Must admit that you are one of the greatest bloggers I ever saw.Cheers for posting this educational post. My kind regards.

Reply
10/09/2013 5:21am

Es gibt in der Tat sind wirklich Variationen nach Marken stylesvon Produkte sind wirklich vorhanden einfach mal Entwickler. Jeder styles neigen zudes Überstands und auch bestimmtein Sie vor allem eine ArtEindruck. Derdie meisten grundlegende Ideen diein suchen Was passt Was auch immer sie Dieerforderlich zur Ralph Lauren billig Strumpfwaren ista breite term dieumfasst eine Vielzahl von Bekleidung. Dieserist herausragende Optionfür Personen wer wünschein Schock die mate.

Reply
10/09/2013 5:24am

Frankreich Mai nur die selbstvon bekannt Fashion, Designer in dem Gebiet. Der Design neigen zudes Überstands und ziemlich sicherin Sie Auswirkungen. IHoffnung sie entdecken sie die lohnt sich Entwicklung, Rat über eine Frau praktische Jacke Ralph Lauren verschiedene für sehrBold Aktivitäten füraFlirt am Nachmittag oder vielleicht Nacht Suche. Der fest ist ein weiterer sehrwichtigMarkierung für Artvon Bekleidung.

Reply

I have read your article, I am very much impressed because you way of explanation quite good and very informative. And one more thing I have got to know that everyone has a different style to write the article, but I must say your article sounds very good.

Reply

Enigma: You potency be wondering cause we picked the 4 inventorys - Kroger, Abbott Labs, Dollar Shrub moreover Beast Liquors. Unit of diverse disapprovals into Portfolio Option using import-dispute optimization approachs is that it isn't lasting. The disagreement of equips aren't constant, also if you flow this very algorithm a month from forthwith, the relationship cast would acquire surely changed, plus since the tuples.

Reply
10/15/2013 6:21am

This is a truly amazing analysis for me, Must recognize that you are one of the biggest blog page writers I ever saw.Cheers for publishing this academic publish. My type regards.

Reply
10/21/2013 5:52am

The variance of stocks aren't fixed, and if you run this same algorithm a month from now, the correlation matrix would have surely changed, and hence the tuples.

Reply
10/21/2013 2:49pm

Nice blog about. I really appreciate the blog writing. Thanks for the wonderful post.

Reply
10/23/2013 9:20pm

I want to confirm that, your post is so interesting. It contains a lot of important and useful information. I got a lot of great things. Thank you so much!

Reply
10/30/2013 10:58pm

Finally, an issue that I am passionate about. I have looked for information of this caliber for the last several hours. Your site is greatly appreciated.

Reply

Simply, admirable what you have done here. It is pleasing to look you express from the heart and your clarity on this significant content can be easily looked. Remarkable post and will look forward to your future update.

Reply

This is a smart blog. I mean it. You have so much knowledge about this issue, and so much passion. You also know how to make people rally behind it, obviously from the responses. Youve got a design here thats not too flashy, but makes a statement as big as what youre saying. Great job, indeed.

Reply
10/31/2013 3:16am


This is very attention-grabbing, You're a very skilled blogger. I have joined your feed and sit up for in quest of extra of your wonderful post. Additionally, I have shared your website in my social networks

Reply
11/01/2013 5:47pm

Thank you for another essential article. Where else could anyone get that kind of information in such a complete way of writing? I have a presentation incoming week, and I am on the lookout for such information.

Reply

Couldn?t be written any better. Reading this post reminds me of my old room mate! He always kept talking about this. I will forward this article to him. Pretty sure he will have a good read. Thanks for sharing!

Reply
11/01/2013 5:48pm

I was very encouraged to find this site. I wanted to thank you for this special read. I definitely savored every little bit of it and I have you bookmarked to check out new stuff you post.

Reply
11/01/2013 5:48pm

Simply, admirable what you have done here. It is pleasing to look you express from the heart and your clarity on this significant content can be easily looked. Remarkable post and will look forward to your future update.

Reply
11/20/2013 11:17pm

Grateful to have come across this blog. The information you posted is useful in many ways. I really need to start jotting down everything.

Reply
01/14/2014 6:35pm

Billige Nike Air Max Sko Danmark,Tilbud Dame|Herre Air Max Sko Til Salg Online

Reply
01/14/2014 6:35pm

Billiga Nike Free Run Skor Sverige,Köpa Dam|Herr Nike Free Run Löparskor Rea

Reply
01/23/2014 11:42pm


This is very interesting blog ilike it very much.

Reply

It is possible to avail of a loan even before your lock-in period is over; however, it is quite hard to achieve this. One thing that could help you here is your post-bankruptcy credit report. If it is flawless, then you might have a chance to be considered for the loan. Besides, you would need to make a large deposit of 3-5% of the total loan amount.

Reply
03/19/2014 9:16am

Interesting website, if I'm being honest with myself.

Reply
04/01/2014 12:50am

Been reading this site for awhile now, always has really good posts and topics please keep it up! loads of blogs are going under lately from lack of new posts etc

Reply
04/21/2014 11:18pm

I was searching that topic from few days its increase my knowledge

Reply
05/02/2014 12:29am

Excellent and decent post. I found this much informative, as to what I was exactly searching for. Thanks for such post and please keep it up.

Reply
06/03/2014 11:32pm

I really like the way you start and conclude your thoughts. Thank you so much for this information

Reply

You can only perceive real beauty in a person as they get older

Reply



Leave a Reply