One day the bean counter said to the boss - Unless you fire one employee, we are not going to be profitable this year.

The boss loudly said: Find the oldest guy who has been at this company for the longest time and is still not making six figures. We'll fire him!

Impeccable management logic here: The guy has been with the company for the longest time. The guy is the oldest. He is still not making six figures. Obviously a sure loser. Fire this loser and save money!

Three programmers Tom, Dick & Harry overheard their boss and got to work immediately.

The employee database obviously looks like this:


So we see there are 10,000 employees at this company.
They are between 25 & 35 years old.
They have been with the company anywhere from 1 to 6 years.
They make between $85,000 to $150,000.

Tom was a PHP/JS programmer. He worked on rapidly prototyped web-apps and didn't give a rat's ass about efficiency.
Tom reasoned: This is just a nested sort. Let me first sort on the tenure. That will tell me who has been with the company for the longest time. Then I'll sort on age. That tells me who the oldest bloke is. Then I'll sort on salary. That tells me who is making the least money. That should be it!

In Scala, Tom's code would look like:


Dick was a back-end systems programmer. He thought deep and hard about algorithms & Big O performance and such.
 So Dick said - there's no need to sort anything. We just want the oldest guy with the longest tenure who makes the least salary. That just an O(n) problem, NOT an O(nlog(n)) problem. We just need the min tuple.

In Scala, Dick's code looks like this -


Harry was a math major who went about looking for mathematical abstractions in the simplest of code.
Harry took the day off and thought long & hard about this problem while lounging in the bathtub.
Suddenly, it occured to Harry - Hey, this is a monoid!
Naked Harry happily danced the mathematician's Eureka Dance, and set about coding.

In Scala, Harry's code looks like:


In scalaz, it would look like this:



The next day, Tom, Dick & Harry met with the boss.
They happily announced - "We've found the unlucky bastard! He is employee number 7435. He has been with the company for 6 years. He is 35 years old. He makes $ 85,079.  Fire him and we'll be profitable again!"

The boss was very impressed. But he wanted to know how they came up with this solution.

Tom said - This is just a nested sorting problem.

Dick said - There is no need to sort at all! Its just a min-tuple problem. Takes O(n) to find the right answer.

Harry said - You are both trivially right. The most important insight is that this is a monoid!

"But what is a monoid ?" the boss asked.

"Well, you have a set with an associative plus and an identity. That's a monoid!", announced Harry.

Nobody understood anything, so Harry said - "Think of the employee database as a set of employees. I claim two employees can be added!"

"What do you mean ? How do you add an employee to another employee ?" they asked.

"Well", Harry said, "If you are asked to add two employees, return the guy who is older. But if both of them are the same age, then return the guy who has been with the company  the longest. But if both guys have been working for us for the same number of years, then return the guy with the lowest salary. But if they both earn the same as well, then just randomly pick one over the other"

"And how does all that work ?", they asked.

"Well, lets define an identity employee. If you add something to this identity, you get back the thing you added. So we create a dummy employee who has been with us for 0 years, so that everybody else has been here longer than the identity. Then the addition works out".

"OK. I still don't see how all this works", said the boss.

"Well, if we seed a catamorphism with the identity employee and aggregate over the set, we collapse to the unlucky employee 7435, who has been here the longest, is the oldest & makes the least money", Harry triumphantly announced.

Everybody stared at Harry, absolutely speechless.


Long story short:  Harry was immediately fired. Dick was made the CTO and Tom was appointed Dick's secretary. 

Bonus Problem: What happened to employee number 7435 ?
Solution at the very end of this post.
-
-
-
-


-
-

-
-

-
-

-
-

-
-

-
-
-

-
-
-
-
-
-
-

-
-
-

















Solution: Employee number 7435 was the boss! You can't fire the boss!! He took the least amount of salary home for tax avoidance purposes. The bulk of his earnings came from equity.
 
 
How would you best divvy up $1000 among 4 stocks using Scalding ?

Scalding solution in github

You think Scalding, and you think Scala code for word-count, page-rank, classic big-data MR jobs that benefit from distributed parallel computing on a Hadoop cluster. But this here ain't your grandfather's map-reduce :)

Problems of this nature were first formalized and solved by Dr. Harry Markowitz at my alma-mater the University of Chicago, way back in 1952! When Harry defended his PhD. thesis, his dissertation advisor MILTON FRIEDMAN is said to have remarked "This is not economics!". However, cooler heads prevailed & Harry became Dr. Markowitz!


Since then, Portfolio Selection has been studied to death - you could divvy up the $1000 under a second using the frontcon(..) function in Matlab, or using CPLEX, or AMPL. So then, why Scalding ? Well, the other day I was buttering a toast while my wife unwrapped a toy for our 5 month old. The toy needed 2AA batteries, and I needed a screwdriver to pry open the battery compartment. Well, I don't have a screwdriver lying around, so what did I do ?


Trick question! You see, I was buttering the toast, so I used the butter-knife to unscrew the battery compartment and thus disaster was averted and the kid got his toy. Moral of the story: Sometimes, you can use Scalding to do all sorts of weird math-y stuff its probably not intended to do. Its a lot of fun! Besides, trying age-old problems using shiny new tools like Scalding brings all sorts of insights to the table.


Lets define the problem formally: 
Divide $1000 among 4 stocks - Kroger, Abbott Labs, Dollar Tree and Monster Beverage, in the best possible way. 
Why these 4 stocks ? Well, you'll find out towards the very end of this article!


val cash = 1000.0 // money at hand
val error = 1 // its ok if we cannot invest the last dollar
val (kr,abt,dltr,mnst) = (27.0,64.0,41.0,52.0) // share prices
val stocks = IndexedSeq( kr,abt,dltr,mnst)



So the above Scala snippet simply says we have $1000 & 4 stocks priced at $27,$64, $41 and $52, sitting in a sequence.


Lets now invoke the weightedSum function in Scalding. This would give us a pipe containing all tuples that satisfy the constraint ie. all posible combinations that add up to $1000, within an error bound of $1.


weightedSum( stocks, cash,error).write( Tsv("invest.txt"))

Lets examine the content of the sink:
$ cat invest.txt 
0 0 13 9
0 1 0 18
0 1 19 3
0 2 1 16
0 2 20 1
0 3 7 10
0 4 8 8
0 5 9 6
0 6 15 0
0 8 3 7
0 9 4 5
0 10 5 3
0 14 0 2
1 0 6 14
1 1 12 8
1 2 13 6
1 3 0 15
1 3 14 4
1 4 1 13
1 5 2 11
1 6 8 5
1 7 9 3
1 8 10 1
1 11 4 2
1 12 5 0
2 0 18 4
2 1 5 13
2 1 19 2
2 2 6 11
2 3 7 9
2 4 13 3
2 5 14 1
2 6 1 10
2 7 2 8
2 8 3 6
2 9 9 0
3 0 11 9
3 1 12 7
3 2 18 1
3 3 0 14
3 4 6 8
3 5 7 6
3 6 8 4
3 9 2 5
3 10 3 3
3 11 4 1
4 0 4 14
4 1 5 12
4 2 11 6
4 3 12 4
4 4 13 2
4 5 0 11
4 6 1 9
4 7 7 3
4 8 8 1
4 12 3 0
5 0 16 4
5 1 17 2
5 2 4 11
5 3 5 9
5 4 6 7
5 5 12 1
5 7 0 8
5 8 1 6
5 9 2 4
6 0 9 9
6 1 10 7
6 2 11 5
6 5 5 6
6 6 6 4
6 7 7 2
6 10 1 3
6 11 2 1
7 0 2 14
7 0 16 3
7 1 3 12
7 2 4 10
7 3 10 4
7 4 11 2
7 7 0 7
7 8 6 1
8 0 9 8
8 1 15 2
8 2 16 0
8 3 3 9
8 4 4 7
8 5 5 5
8 9 0 4
8 10 1 2
9 0 2 13
9 1 8 7
9 2 9 5
9 3 10 3
9 6 4 4
9 7 5 2
9 11 0 1
10 0 14 3
10 1 1 12
10 1 15 1
10 2 2 10
10 3 3 8
10 4 9 2
10 5 10 0
11 0 7 8
11 1 8 6
11 2 14 0
11 4 2 7
11 5 3 5
11 6 4 3
12 0 0 13
12 1 1 11
12 2 7 5
12 3 8 3
12 4 9 1
12 7 3 2
12 8 4 0
13 0 12 3
13 1 13 1
13 2 0 10
13 3 1 8
13 4 2 6
13 5 8 0
14 0 5 8
14 1 6 6
14 2 7 4
14 5 1 5
14 6 2 3
14 7 3 1
15 0 12 2
15 2 0 9
15 3 6 3
15 4 7 1
15 8 2 0
16 0 5 7
16 1 11 1
16 4 0 6
16 5 1 4
17 1 4 6
17 2 5 4
17 3 6 2
17 6 0 3
17 7 1 1
18 0 10 2
18 4 5 1
19 0 3 7
19 1 4 5
19 6 0 2
20 2 3 4
20 3 4 2
21 0 8 2
21 1 9 0
22 0 1 7
22 1 2 5
22 2 3 3
23 0 8 1
23 3 2 2
23 4 3 0
24 0 1 6
24 1 7 0
25 1 0 5
25 2 1 3
25 3 2 1
26 0 6 1
26 4 1 0
27 1 0 4
28 3 0 1
29 0 4 1
34 0 2 0
37 0 0 0


So it looks like we have 169 unique ways to divvy up the cash. Lets take one of the 169 tuples. How about (5,4,6,7) ?
So 5 shares of Kroger @ $27 = $135
4 shares of Abbott Labs @ $64 = $256
6 shares of Dollar Tree @ $41 = $246
7 shares of Monster @ $52 = $468


135+256+246+468 = $1001


But is (5,4,6,7) the best-posible tuple among the 169 choices ?


Lets define the best: We want the tuple that has the smallest risk. The 60-day 4x4 correlation matrix for these stocks is given below: 

KR       ABT    DLTR   MNST
0.448  0.177  0         0.017
0.177  0.393  0.177  0.237
0         0.177  0.237  0.06 
0.017  0.237  0.06    0.19 

What this says is rather simple: We have the variances along the diagonal and the covariances elsewhere. So Kroger has a variance of 0.448, and the 60-day correlation between Kroger and Abbott is 0.177, and so forth. ( Compute 60-day correlations here. )


Given a vector of weights W, the net risk imposed by this matrix = W'CW
To define these matrices & perform matrix transposes & multiplications, lets use COLT.

import cern.colt.matrix.DoubleFactory2D
val data = Array(Array(0.448, 0.177, 0.0, 0.017), Array(0.177, 0.393, 0.177, 0.237), Array(0.0, 0.177, 0.237, 0.06), Array(0.017, 0.237, 0.06, 0.19))
val corr = DoubleFactory2D.dense.make(data)


corr: 
4 x 4 matrix
0.448 0.177 0     0.017
0.177 0.393 0.177 0.237
0     0.177 0.237 0.06 
0.017 0.237 0.06  0.19 


We now take each of the 169 tuples & compute their net risks. The tuple with the least risk is what we are after.

import cern.colt.matrix.linalg.Algebra
import cern.colt.matrix.DoubleFactory1D
import java.util.StringTokenizer

val alg = Algebra.DEFAULT
val file = scala.io.Source.fromFile("invest.txt").getLines.toList


// convert the tab-delimited weights to a row vector
def getWeights(s:String) = {  
     val stok = new StringTokenizer(s, "\t")
     val weights = (1 to 4).map( x=> stok.nextToken.toDouble).toArray
     DoubleFactory1D.dense.make(weights)
}

// compute risks per tuple and sort by risk
file.map(line=> { 
     val w = getWeights(line)
     ( line, alg.mult( alg.mult(corr,w), w))
}).sortBy(x=>x._2)

res11: List[(String, Double)] = List(
(1 0 6 14,56.776), 
(4 0 4 14,56.824), 
(2 1 5 13,57.544), 
'(4 1 5 12,58.552), 
(2 2 6 11,59.646),
.....
(37 0 0 0,613.312)



So the best possible way to invest your $1000 is the (1,0,6,14) tuple.
But if you want to buy atleast 1 share of each stock, (2,1,5,13) is the way to go.
Ofcourse, the absolute worst thing you could do is (37,0,0,0) ie. put all your money in 37 shares of Kroger. With this worst tuple, you get no diversification and take on 10 times the risk of the best tuple.


So ( putting on my Jim Cramer hat here) I am going to recommend the following portfolio mix:
2 shares of Kroger : $54
1 share of Abbott  : $64
5 shares of Dollar Tree : $205
13 shares of Monster: $676
Total : $1001

Go forth and invest!


Okay, enough of finance., back to CS. 
How does all this work inside of Scalding ?

The secret sauce is obviously the algorithm behind weightedSum(...). This is a homegrown sophisticated-brute-force algorithm that I conjured up out of thin air :)
Seriously, this was the very first Scalding algorithm I coded up, and I have not found any reference to this algorithm elsewhere in the literature, so I am going to claim credit for it :))

Lemme walk you thru my algorithm: Say you want all possible ways to purchase 3 stocks, whose prices are  ( $1, $2, $5), to sum to $10.

Make 3 pipes, one per stock -
Pipe 1 has $0, $1, $2, $3, $4, $5,$6,$7,$8,$9, $10
Pipe 2 has $0, $2, $4, $6, $8 , $10
Pipe 3 has $0, $5, $10

So I've just made pipes that hold multiples of the share price.
Now cross the pipes two at a time.
So crossing pipe1 and pipe2 gives us a pipe with 2 columns.
(0,0)
(0,2)
(0,4),
.....
(1,0)
(1,2)
(1,4)
....
.....
(10,10)


a total of 66 tuples. 
Now create a column that holds the sum of the two columns, and throw away tuples that exceed 10.
I call this "progressive filtering" 
This gets rid of bad tuples like (1,10), (2,9),(2,10),(3,8),....


Now take this pipe with good-tuples, and cross with the third pipe.
Once again, apply progressive filtering.
This gets rid of bad tuples like (1,9,5), (1,9,10),(2,8,5),(2,8,10),....


This combination of "cross-multiply with progressive-filter" is what makes weightedSum sophisticated.


But why is weightedSum(...) brute-force ?
Well, with 4 stocks , the smallest stock being the $27 Kroger, and $1000 at stake, we had 169 possible candidates.
Where did they come from ?
Well, 1000/27 ~  40.
40^4 = 2,560,000
So in the worst case, we had a total of 2.5 MILLION fourples....but we managed to narrow them down to 169 valid candidates. ( Note that due to Progressive Filtering, we probably generate less than 2.5 million fourples....under a million ballpark )

In general, the problem scales like O( (cash/minWeight)^numStocks )

So it is certainly exponential, and hence the brute-force aspect of it.
However, given Scalding's awesome distributed capabilities, and the parallelized nature of our algorithm ( remember - 1 pipe per stock, so n stocks lead to n pipes crossed & progressively filtered in parallel ), we perform quite well....in fact, this particular problem benchmarks under 2 seconds! Now, as the number of stocks grow, or if we have more cash to play with, this leads to exponentially more bad tuples...but unlike in Scala where we'd simply run out of memory unless we resort to Dantzig, Scalding will happily chug along & compute the best possible tuple !

Mystery: You might be wondering why we picked the 4 stocks - Kroger, Abbott Labs, Dollar Tree and Monster Beverages. One of several criticisms against Portfolio Selection using mean-variance optimization techniques is that it isn't stable. The variance of stocks aren't fixed, and if you run this same algorithm a month from now, the correlation matrix would have surely changed, and hence the tuples. That does not mean it's a useless algorithm. In fact, while in grad school I interviewed with BlackRock, who were using what was essentially a souped-up Markowitz optimization for a $100B fund with 100s of stocks in them. "So won't the prices and matrices change everyday ? " I asked. Well, yes! So Blackrock runs the algorithm throughout the day and rebalances the portfolio twice a day - at 11am and 2pm !!!

For this exercise, I used Google Stock Screener to pick 4 stocks with a low beta ( risk) and high average volume that had correlation data available in AIStockCharts.
The lowest possible betas with corr data were KR,ABT,DLTR and MNST. 
Mystery solved!

Scalding solution in github
 
 
Heavily inspired by Manuel Rotter's HackerNews post: Conways Game of Life in Clojure. You should definitely check that out.In fact, I intersperse his Clojure code with my Scala code, so you can compare & contrast the two.

Scala's green italics, Clojure's plain blue.


Fire up the Scala REPL.


Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_04).
Type in expressions to have them evaluated.
Type :help for more information.
scala>


1. Define a function to create a rectangular world with live cells "X", dead cells "  ".
 

In Clojure:

(defn create-world
   [w h & living-cells]
  (vec (for [y (range w)]
         (vec (for [x (range h)]
                (if (contains? (first living-cells) [y x]) "X" " "))))))



A straightforward one-liner in Scala:

def cw (w:Int, h:Int, living:List[(Int,Int)]) = { List.tabulate [String] (w,h) ( (x,y) => if ( living.contains(x,y)) "X" else "  " )}

Note: create-world isn't a legit scala identifier 'cause the minus sign...so I've named it cw.

2. Create a 16 cell square with alives along the diagonals.



(create-world 4 4 #{[0 0], [1 1], [2 2], [3 3]}) =>
 [["X" " " " " " "] [" " "X" " " " "] [" " " " "X" " "] [" " " " " " "X"]]

cw (4, 4, List((0,0),(1,1),(2,2),(3,3)) )
List[List[String]] = List(List(X, " ", " ", " "), List(" ", X, " "," "), List(" ", " ", X, " "), List(" ", " ", " ", X))



3. Determine all the neighbors of a cell

(defn neighbours
  [[x y]]
  (for [dx [-1 0 1] dy [-1 0 1] :when (not= 0 dx dy)]
    [(+ dx x) (+ dy y)]))



def nbrs (x: Int,y: Int)=for (dx<-List(-1, 0, 1);dy<-List(-1,0,1) if (!(dx==0 && dy==0))) yield (x+dx, y +dy)

Note: If the above Scala syntax is unfamiliar, we're using sequence comprehensions with guards


What are all the neighbors of the location (1/1)? 
(neighbours [1 1]) => 
([0 0] [0 1] [0 2] [1 0] [1 2] [2 0] [2 1] [2 2])


nbrs(1,1)
List[(Int, Int)] = List((0,0), (0,1), (0,2), (1,0), (1,2), (2,0), (2,1),(2,2))



4.  Write a higher-order function that takes three arguments: A function to determine the neighbors, a predicate which determines whether a cell shall be given life, and one predicate that determines whether a cell survives to the next generation. 
It then returns a function which calculates the next generation according to the given rules. 

(defn stepper
  [neighbours birth? survive?]
  (fn [cells]
    (set (for [[loc n] (frequencies (mapcat neighbours cells))
               :when (if (cells loc) (survive? n) (birth? n))]
           loc))))



type nbr_type = (Int,Int)=>List[(Int,Int)] 
type birth_type = Int => Boolean 
type survive_type = Int => Boolean
type nextgen_type = List[(Int,Int)] => List[(Int,Int)] 


def stepper( nbrs:nbr_type, birth:birth_type, survive:survive_type): nextgen_type = {
    ((alive:List[(Int,Int)]) => {
    val dead = alive.map( c=> nbrs(c._1,c._2)).flatten.removeDuplicates.filterNot( x=> alive.contains(x))
    List(  (alive, survive), (dead, birth) ).map( x => 
    (x._1.map ( a=> (a,nbrs(a._1,a._2).intersect( alive ).size )).filter( a=> x._2(a._2)).map( a => a._1))).flatten
 })}


Usage: To get the  second state of a blinker pattern, we give our stepper the coordinates of the livings cells for the vertical position of the blinker ( --- )   

((stepper neighbours #{3} #{2 3}) #{[1 0] [1 1] [1 2]}) =>
 #{[2 1] [1 1] [0 1]}


stepper( nbrs  _, List(3).contains _, List(2,3).contains _)( List((1,0),(1,1),(1,2)))
res19: List[(Int, Int)] = List((1,1), (0,1), (2,1))



Voila !!! Identical results! And we're done!!!





But how does that piece of Scala code work ?!! 
Okay, this one's quite hard if you've never done this sort of thing before. So lemme dumb this down a bit: A higher-order function is a function that takes a function as input and/or returns a function as output. Most imperative languages either don't provide for higher-order functions, or do so in a convoluted fashion so nobody uses the facility. otoh, the hallmark of a functional language is the ease with which you can create higher-order functions.

Now, the stepper function is a higher order function takes three input functions and returns one output function.

1. The first input function must determine the neighbors of a cell, given the cell's x & y locations. The locations are in an Int tuple, and the neighbors would be a list of 8 Int tuples. 
So its type would be

type nbr_type = (Int,Int)=>List[(Int,Int)]

2. The second input function is a predicate. It must take in a list of neighbor counts, and based on the count, return a boolean if a cell comes alive. For example - Conway says "if a cell has three live neighbors it comes alive, as if by reproduction".
Given a neighbor count, the predicate would return true only if the count was equal to 3, by doing a check against the contains() function of List(3)

type birth_type = Int => Boolean


3. The third input function is a predicate. It must take in a list of neighbor counts, and based on the count, return a boolean if a cell survives another generation. For example - Conway says "Any live cell with 2 or 3 live neighbors lives on to the next generation".
Given a neighbor count, the predicate would return true only if the count was either 2 or 3, by doing a check against the contains() function of List(2,3) 

type survive_type = Int => Boolean


BTW: survive_type is identical to birth_type, so just use one of them, as I've done below.

4. The output function must calculate the next generation given the previous generation. A generation is simply a list of integer tuples. So the mapping is pretty straightforward.

type nextgen_type = List[(Int,Int)] => List[(Int,Int)]

BTW: You can omit nextgen_type because return types are inferred in Scala, and I've done so below.


So, how does the stepper function actually work ? 
Let's first define a dumb stepper. The dumb stepper simply returns the current generation unchanged as the next generation. The identity function, so to speak. 

def stepper( nbr:nbr_type, birth:birth_type, survive:survive_type): nextgen_type = {
    val nextgen = ((currgen:List[(Int,Int)]) => {currgen})
    nextgen
}


Now let's give it the desired smarts. 

Well, you give it a list of integer tuples that indicate the cells alive in the current generation ( alive )

 ((alive:List[(Int,Int)]) => 

The dead cells are the unique neighbors of the alive cells. We first get these neighbors, which is a list of lists, flatten it out & remove the duplicates and filter out the neighbors that happen to be alive.

val dead = alive.map( c=> nbrs(c._1,c._2)).flatten.removeDuplicates.filterNot( x=> alive.contains(x)) 

At this point you have a partition = two sets = the alive set & the dead set.

Given a live cell from the alive set, you find its live neighbors by doing an intersection of its neighbors-set with alive-set.
You count its live neighbors by invoking size.
You run this count by the survive predicate, and that tells you if this live cell survives another generation.

Given a dead cell from the dead set, you find its live neighbors by doing an intersection of its neighbors-set with alive-set.
You count its live neighbors by invoking size.
You run this count by the birth predicate, and that tells you if this dead cell is reborn in the next generation.

So you see, you do the same exact thing for both sets in the partition, except for one tiny difference. 
The alive set is checked against the survive predicate, and the dead cells against the birth predicate. 


So start out by bundling each set with its appropriate predicate.
 List(  (alive, survive), (dead, birth) ) 


Now iterate over this list, find the neighbors, do the intersect, count, count-check etc...
 List(  (alive, survive), (dead, birth) ).map( x => 
 (x._1.map ( a=> (a,nbrs(a._1,a._2).intersect( alive ).size )).filter( a=> x._2(a._2)).map( a => a._1))).flatten 


5. At this point, we're done! We could define yet another convenience wrapper to iterate over stepper...this last function just helps us to determine the evolution according to Conway's Game of Life by giving it the size of the world, an initial pattern and the number of the generation we want to see.

(defn conway
  "Generates world of given size with initial pattern in specified generation"
  [[w h] pattern iterations]
   (->> (iterate conway-stepper pattern)
        (drop iterations)
        first
        (create-world w h)
        (map println)))




The Scala version of this convenience wrapper is left as an exercise for the patient reader :)




in 17 lines of Clojure, courtesy Manuel Rotter :

  1. (defn create-world
  2.   [w h & living-cells]
  3.   (vec (for [y (range w)]
  4.          (vec (for [x (range w)]
  5.                 (if (contains? (first living-cells) [y x]) "X" " "))))))
  6.  
  7. (defn neighbours
  8.   [[x y]]
  9.   (for [dx [-1 0 1] dy [-1 0 1] :when (not= 0 dx dy)]
  10.     [(+ dx x) (+ dy y)]))
  11.  
  12. (defn stepper
  13.   [neighbours birth? survive?]
  14.   (fn [cells]
  15.     (set (for [[loc n] (frequencies (mapcat neighbours cells))
  16.                :when (if (cells loc) (survive? n) (birth? n))]
  17.            loc))))

in 10 lines of Scala, by Krishnan Raman

  1. def cw (w:Int, h:Int, curr:List[(Int,Int)]) = { List.tabulate [String] (w,h) ( (x,y) => if ( curr.contains(x,y)) "X" else "  " )}
  2. def nbrs (x: Int,y: Int)=for (dx<-List(-1, 0, 1);dy<-List(-1,0,1) if (!(dx==0 && dy==0))) yield (x+dx, y +dy)
  3. type nbr_type = (Int,Int)=>List[(Int,Int)] 
  4. type birth_type = Int => Boolean 

  5. def stepper( nbrs:nbr_type, birth:birth_type, survive:birth_type) = {
  6.   ((alive:List[(Int,Int)]) => {
  7.   val dead = alive.map( c=> nbrs(c._1,c._2)).flatten.removeDuplicates.filterNot( x=> alive.contains(x))
  8.   List(  (alive, survive), (dead, birth) ).map( x => 
  9.    (x._1.map ( a=> (a,nbrs(a._1,a._2).intersect( alive ).size )).filter( a=> x._2(a._2)).map( a => a._1))).flatten
  10.   })}
 
 
I invented/discovered/came-upon "Inverse Fizzbuzz"  while mucking about with Scala & Partial Functions.

While Fizzbuzz answers the question  "Can Shipper code  ? ", Inverse Fizzbuzz tells you "How well does Shipper code ?"

After posting the Hacker News piece,  Inverse Fizzbuzz was featured on a codegolf.

There were a dozen participants - with solutions in Ruby, Perl, Python, Scheme, PHP, Javascript, Groovy, and yes , Scala. After the golf ended, the results were somewhat bizarre. The Perl solution by tybalt89 won, with a score of 10,000.
Here's the winning entry - practically a one-liner.

#!perl -lp
s/\w{5}/E/g+y/F-z /F/d>8&&s/\B/ */g;$_=" F BF FB F E"x22~~/$_/&&1+"@-"." @+"

Now guess the solution that came in dead last ? Scala ! That bothered me a little bit. But what bothered me the most was the cheating :)

Cheating! What do I mean by that ?  Take a look at the test case
Here's what the outputs look like: look at the item in red.
9 72
3 3
3 12
3 45

6 10
5 6
3 96
12 102


12 15
5 5
3 93
15 177
3 27
5 33
9 115
9 48
9 87
15 18
9 30

9 180
6 9
15 15
9 135
12 18
12 96
10 15
3 55
-----

So if somebody told you that the largest fizzbuzz string mapped to 180, then you can cheat by simply generating all possibles upto/beyond 180, use memoization & simply compare the test input to what you've precomputed. That's what most of the entries had done.

So for example, the Perl solution has " F BF FB F E"x22 ie. 15 times 22 > 180
In the Python solution : see the number 182
In the Haskell solution : the number 171
In the Scala solutionSeq.tabulate(181)
In the Javascript solution : the number 182

I decided I wouldn't cheat. Instead I will solve Inverse Fizzbuzz from first principles, using a Finite State Machine => Regex  approach.
Note: I'm not interested so much in the numbers on the RHS as much as making sure if a sequence of strings actually corresponds to a valid "Inverse Fizzbuzz" . So the goal is to test language memberships. Number gymnastics can come later.

Inverse FizzBuzz - Given a list of strings , what's the shortest contiguous sequence of numbers that produces that list when you run fizbuzz ( over those numbers ) ?
( or in pictures, how do you go from stuff on the left to stuff on the right ? )

Say fizz = f, buzz = b, fizzbuzz = g

The finite state machine that produces the language of "Inverse Fizzbuzz" must have 8 states and must look like below

Its not too hard to see why.
"Fizz" qualifies - ergo, go from start state q0 via arc "f" to terminal state q1
"Buzz" qualifies - go from start state q0 via arc "b" to terminal state q2
"Fizz Buzz" qualifies -  go from start state q0 via arc "f" to  q1, then via arc "b" to terminal state q2.
"Fizz Buzz Fizz" qualifies - go from start state q0 via arc "f" to  q1, then via arc "b" to q2, then via "f" to terminal state q3.
Clearly "Buzz Buzz" does not qualify, so there's no way to produce "bb" via the machine below.
And so on and so forth... ( If you have a simpler FSM, please do email me )

Getting rid of the multiple terminal nodes will introduce lambda-transitions ( NFA-epsilons )
and you get the machine below:

Picture

Now you reform transitions - ie. put lambda-transitions between states with no transitions.
That produces the monster below:

Picture

Finally, the Collapse mode: get rid of the non-final and non-initial states systematically, and JFLAP leaves you with :

Picture

So then, lets turn to the trusted Scala REPL to try out the regex.

scala> val p = "f|b|fb|f|(b|fb)f|f|(f|(b|fb)f)f|b|(f|(f|(b|fb)f)f)b|f|(b|(f|(f|(b|fb)f)f)b)f|(g|(f|(b|(f|(f|(b|fb)f)f)b)f)g)(fbffbfg)*(f|fb|fbf|fbff|fbffb|fbffbf){0,1}".r
p: scala.util.matching.Regex = f|b|fb|f|(b|fb)f|f|(f|(b|fb)f)f|b|(f|(f|(b|fb)f)f)b|f|(b|(f|(f|(b|fb)f)f)b)f|(g|(f|(b|(f|(f|(b|fb)f)f)b)f)g)(fbffbfg)*f|fb|fbf|fbff|fbffb|fbffbf){0,1}

So p is the regex correponding to Inverse Fizzbuzz.
Lets get the matching engine out of p so we can try various language matches upon it.

scala> val m = p.pattern.matcher _
m: java.lang.CharSequence => java.util.regex.Matcher = <function1>

scala> m ("f") matches
res25: Boolean = true

scala> m ("b") matches
res26: Boolean = true

scala> m ("fb") matches
res27: Boolean = true

scala> m ("ff") matches
res28: Boolean = true

scala> m ("bf") matches
res29: Boolean = true

scala> m ("bb") matches
res30: Boolean = false

scala> m ("bff") matches
res31: Boolean = true

scala> m ("bfg") matches
res32: Boolean = true

scala> m ("g") matches
res33: Boolean = true

scala> m ("gb") matches
res34: Boolean = false

scala> m ("gf") matches
res35: Boolean = true

So that's that!
Inverse FizzBuzz as a regex =
"f|b|fb|f|(b|fb)f|f|(f|(b|fb)f)f|b|(f|(f|(b|fb)f)f)b|f|(b|(f|(f|(b|fb)f)f)b)f|(g|(f|(b|(f|(f|(b|fb)f)f)b)f)g)(fbffbfg)*(f|fb|fbf|fbff|fbffb|fbffbf){0,1}""

JFLAP FTW!
Picture
Postscript: The "Regular Expression Exploration" Tool produced by Microsoft Research consumes a RegEx and produces an automaton. When fed the Inverse Fizzbuzz Regex, it produced the alongside automaton with 81 states !!! ( Thanks to sureshv at HN

 
 
It is 2016, not 2012.

Nothing much has changed. 

After a series of embarassing public disasters involving OOP & mutable state, the world has finally realized the wisdom of John Hughes and switched to functional programming. Google has purchased Typesafe & Scala runs natively on Android. Not to be outdone, Apple has banned all iOS apps not coded in Haskell. The Windows ecosystem is now powered entirely by F#. 

Other than that, nothing much has changed.

The canonical Mr. Shipper graduates from UPenn and lands his first whiteboard interview at the big G.

Google: So Mr. Shipper, do you know FizzBuzz ?
Shipper: Are you kidding me ? This is 2016, not 2012. Everyone and their grandmother knows Fizzbuzz.
Google: OK, lets try a slightly modified fizzbuzz.

Problem: How do you map the list on the left to the list on the right ?
Shipper: So lets see. You are mapping some stuff on the left list to other stuff on the right list.
Google: Yeah
Shipper: But you aren't mapping every element, just the multiples of 3 or 5.
Google: Yeah 
Shipper: Looks like a partial function to me.
Google: Partial Function ? What's that ?
Shipper: Lets start with a predicate that defines who qualifies, 

scala> val isFizzBuzz = ((x:Int) => x%3 ==0 || x%5 ==0)

Google: Fine. So how do you use that ?
Shipper: I'll make another lambda that defines the mapping.

scala> val toFizzBuzz = ((x:Int) => (x%3,x%5) match {     
               case (0,0) =>   "fizzbuzz"
               case (0,_) => "fizz" 
               case _ => "buzz"     
          })


Google: OK. I see you know how to pattern match...
Shipper: Dude, this is 2016, not 2012 ! Everyone & their grandmother...
Google: Yeah, yeah. Got it. Now tell me what this partial function is ?
Shipper: The partial function just bundles together the predicate and the mapping. So

scala> val fizzbuzz = new PartialFunction[Int,String] {     
                  def isDefinedAt(x:Int) = isFizzBuzz(x)
                 def apply(x:Int) = toFizzBuzz(x)   
            }


Now I just collect the results over which this partial function is defined.

scala> (1 to 100).collect( fizzbuzz )

res1: scala.collection.immutable.IndexedSeq[String] = Vector(fizz, buzz, fizz,fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz,
buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizz,buzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz)

Google: Bravo! That is neat.
Shipper: Yeah!
Google: But tell me, Mr. Shipper, why do you need this partial function ? Can't you do something simpler ?
Shipper: You mean like map filter ?
Google: OK, lets try that...
Shipper: So I'd first filter with my predicate and then apply my mapping.

scala> (1 to 100).filter( isFizzBuzz ).map( toFizzBuzz )
res2: scala.collection.immutable.IndexedSeq[String] = Vector(fizz, buzz, fizz,fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz,
buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizz,buzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz) 

Google: Good! But there's yet another way to do this.
Shipper: Hmmm...
Google: Think Sets. How do you define a set in math ?
Shipper: You mean, like a list comprehension?
Google: You got it!
Shipper:

scala> for( x<- (1 to 100); if( isFizzBuzz(x))) yield toFizzBuzz(x)
res3: scala.collection.immutable.IndexedSeq[String] = Vector(fizz, buzz, fizz,fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz,
buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizz,buzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz) 


Google: Excellent! So you know how to do fizzbuzz atleast 3 different ways.
Shipper: Do I get the job ?
Google: Patience, Mr. Shipper. Patience! This is Google. We can't hire everybody who walks through the door and solves a fizzbuzz. This is 2016, after all. Everybody would get hired. 
Shipper: So..
Google: So we here at Google have come up with a new test. Its called the Inverse FizzBuzz.
Shipper: Inverse FizzBuzz !
Google: Good! Looks like you never heard of it.
Shipper: Never!
Google: Well, we just came up with it last week. But its not so hard. You know what an inverse is.
Shipper: Inverse...
Google: Yeah, like 5 plus what is 0 ?
Shipper: -5
Google: So minus 5 is ?
Shipper: Oh! Yeah. OK. Minus 5 is the additive inverse of 5.
Google: So what's the multiplicative inverse of 5 ?
Shipper: Hmmm. 0.2. Because 0.2 times 5 is 1.
Google: You got it! What's the inverse sine of a half ?
Shipper: 30 degrees, because sine(30) = 0.5
Google: Good. What's the inverse logarithm of a half ?
Shipper: The square root of e.
Google: Very Good! So lets talk about the inverse of a fizzbuzz.
Shipper: OK. How do you define Inverse FizzBuzz ?
Google: Very simple. Given a list, give me the shortest contiguous sequence of numbers that produces that list when you run fizbuzz 
Shipper : Huh ?!
Google: Inverse FizzBuzz - Given a list, what's the shortest contiguous sequence of numbers that produces that list when you run fizbuzz ?
Shipper: I don't follow.
Google: Lets do a few simple examples. What's the shortest sequence that produces { fizz }
Shipper: Hmm. { 1,2,3 }
Google: Come on Mr. Shipper. You can do better than that.
Shipper: OK. Its just {3}. 
Google: What's the shortest sequence that produces { buzz }
Shipper: { 5 }
Google: What's the shortest sequence that produces { fizz, buzz }
Shipper: { 3, 4, 5 }
Google: Nope!
Shipper: OK. Wait....its {9,10}
Google: Excellent! What's the shortest sequence that produces { buzz, fizz }
Shipper: { 5,6 }
Google: What's the shortest sequence that produces { fizz, buzz, fizz }
Shipper: { 3, 4, 5, 6 }
Google: What's the shortest sequence that produces { buzz, fizz, buzz }
Shipper: Hmmm. I don't think that list is valid...
Google: Good. So your program will have to detect that. Lets do a few more. What produces { fizz, fizz } ?
Shipper: { 6,7,8,9 }
Google: What produces {fizz, fizz, buzz }
Shipper: { 6,7,8,9,10}
Google: OK. So here's the examples we've seen so far. This is what your lambda must do.

Google: Now write a lambda for Inverse FizzBuzz !
Shipper: Well, the solution is not unique. The sequence {fizz} maps to { 3 }, but it could also map to {6}. Or {9} ...
Google: Yeah, well, I don't care about uniqueness. Just the shortest sequence.
Shipper: So what's the problem again ?
Google: Given a list, what's the shortest contiguous sequence of numbers that produces that list when you run fizbuzz ? 
Shipper: I honestly don't know.
Google: Well, what do you do when you don't know how to solve a problem ?
Shipper: Try everything. Brute force.
Google: Exactly!
Shipper: But brute force! That would be like an infinite number of input strings.
Google: Lets say we only care about the first 100 numbers. 1 to 100. Just like before. How many possible combinations on those ?
Shipper: Well, I could run fizzbuzz on 1. Then  {1,2} Then {1,2,3}, Then {1,2,3,4} etc. upto {1,2,....100}
Google: So that's a 100 combinations.Shipper: But then I could run fizzbuzz on 2. Then  {2,3} Then {2,3,4}, Then {2,3,4,5} etc. upto {2,....100} 
Google: So that's how many ?
Shipper: 99. Ok I got it. So its like 100 + 99 + 98 + ...1 = (101)*100/2 = 5050 combinations.
Google: So five thousand plus. That's a small number.
Shipper: Right. But we kill duplicates. And we only keep the mapping that produces the shortest sequence. 
Google: Yeah, so your final list will be much smaller than 5050.
Shipper: OK. I think I can do that quite easily.


scala> val all = (1 to 100).map(x=> (x to 100).map( y=> (x,y) )).flatten
all: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,1), (1,2),(1,3), (1,4), (1,5), (1,6), (1,7), (1,8), (1,9), (1,10), (1,11), (1,12), (1,13),
 (1,14), (1,15), (1,16), (1,17), (1,18), (1,19), (1,20), (1,21), (1,22), (1,23), (1,24), (1,25), (1,26), (1,27), (1,28), (1,29), (1,30), (1,31), (1,32), (1,33), (1,34), (1,35), (1,36), (1,37), (1,38), (1,39), (1,40), (1,41), (1,42), (1,43),......(2,2),(2,3),......

scala> all.size
res22: Int = 5050

Google: So we have the 5050 possible ranges.
Shipper: Yes. Now I just run fizzbuzz on every one of those 5050 ranges.


scala> val results = all.map( x=> (x._1 to x._2).collect( fizzbuzz ))

Shipper: Now I'll line up the inputs with the outputs.
scala> val io = all.zip( results )



Shipper: Now I'll group by the output. That should give me a mapping from the list of strings to a list of sequence of numbers.

scala> val inverses = io.groupBy( x=> x._2)


Google: Lets first see if you are on the right track so far. Count the keys in your mapping. It should be much smaller than 5050.


scala> inverses.keys.size
res36: Int = 302

Google: Good! Only 302 unique keys, not 5050. Now how would you find the smallest value the key maps to ?
Shipper: Lemme first test with "fizz".


scala> inverses(Vector("fizz"))
res28: scala.collection.immutable.IndexedSeq[((Int, Int), scala.collection.immutable.IndexedSeq[String])] = Vector(((1,3),Vector(fizz)), ((1,4),Vector(fizz)), ((2,3),Vector(fizz)), ((2,4),Vector(fizz)), ((3,3),Vector(fizz)), ((3,4),Vector(fizz)), ((6,6),Vector(fizz)), ((6,7),Vector(fizz)), ((6,8),Vector(fizz)), ((7,9),Vector(fizz)), ((8,9),Vector(fizz)), ((9,9),Vector(fizz)), ((11,12),Vector(fizz)), ((11,13),Vector(fizz)), ((11,14),Vector(fizz)), ((12,12),Vector(fizz)), ((12,13),Vector(fizz)), ((12,14),Vector(fizz)), ((16,18),Vector(fizz)), ((16,19),Vector(fizz)), ((17,18),Vector(fizz)), ((17,19),Vector(fizz)), ((18,18),Vector(fizz)), ((18,19),Vector(fizz)), ((21,21),Vector(fizz)), ((21,22),Vector(fizz)), ((21,23),Vector(fizz)), ((22,24),Vector(fizz)), ((23,24),Vector(fizz)), ((24,24),V...

Shipper: OK. That gives me all the ranges my key maps to. I just need to find the smallest range. Lets try minBy.
scala> inverses(Vector("fizz")).minBy(x=>x._1._2 - x._1._1)
res33: ((Int, Int), scala.collection.immutable.IndexedSeq[String]) = ((3,3),Vector(fizz))
Shipper: Hey, I think I got it!

scala> val answer = inverses.keys.map( k=> (k,inverses(k).minBy(x => x._1._2 - x._1._1)._1)).toMap

answer: scala.collection.immutable.Map[scala.collection.immutable.IndexedSeq[String],(Int, Int)] = Map(Vector(fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz) -> (3,33), Vector(buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz) -> (10,93), Vector(fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz, fizz, buzz, fizz, fizz, buzz, fizz, fizzbuzz) -> (15,75), Vector(fizz, buzz, fizz, fizzbuzz) -> (9,15), Vector(fizzbuzz, fizz, buzz, fizz, fizz, b...

Google: Lets check the answers to the 7 examples.
scala> answer(Vector("fizz"))
(3,3)
scala> answer(Vector("buzz"))
(5,5)
scala> answer(Vector("fizz","fizz","buzz"))
(6,10)
scala> answer(Vector("fizz","buzz"))
(9,10)
scala> answer(Vector("buzz","fizz"))
(5,6)
scala> answer(Vector("fizz","buzz","fizz"))
(3,6)
scala> answer(Vector("fizz","fizz"))
(6,9)

Shipper: Voila!
Google: Absolutely. Voila!

Shipper: So do I get the job now ?
Google: Well, you've used a brute force approach. Tell you what. Come up with a non-brute force answer & the job's yours.

Shipper: Thinking.....Thinking.....Thinking.....


PS: Its actually not that hard :)


So, to recap:
Question: Inverse FizzBuzz - Given a list, what's the shortest contiguous sequence of numbers that produces that list when you run fizbuzz ?

Brute force solution:
val all = (1 to 100).map(x=> (x to 100).map( y=> (x,y) )).flatten
val results = all.map( x=> (x._1 to x._2).collect( fizzbuzz ))
val io = all.zip( results )
val inverses = io.groupBy( x=> x._2) 
val answer = inverses.keys.map( k=> (k,inverses(k).minBy(x => x._1._2 - x._1._1)._1)).toMap