There was a whiteboard, a laptop & a notebook, so they could use whatever makes them comfortable.

-------

Tom's interview: The chief data scientist John Doe (JD) walked in.

JD. Lets talk about groceries.

Tom. Ok

JD. So you walk into a grocery store with a grocery bag and some cash, to buy groceries for a week.

Now, a couple problems -

1. your bag can hold ten pounds.

2. You have $100

3. You need about 2000 calories a day, so a weekly shopping trip is about 14,000 calories.

4. You must purchase at least 4 ounces of each grocery item.

Here's your dataset -

--- Calories Per Pound ---

Ham, 650 cals,

Lettuce, 70 cals

Cheese, 1670 cals

Tuna, 830 cals

Bread, 1300 cals

----

---- Price Per Pound ----

Ham, $4

Lettuce, $1.5

Cheese, $5

Tuna, $20

Bread, $1.20

----

Take your time, and list the number of ways you can buy your groceries.

JD walked out of the room.

Tom thought for a while.

Then he grabbed the laptop, opened up his favorite editor & wrote some code. When he was done -

JD: Very good! So how many solutions could you find ?

Tom: 164

JD: So of the 164 solutions, which is the most expensive way to shop ?

Tom: That's easy!

JD: Great! So you end up spending $82

Can you tell me what's the least fatty way to eat.

Tom: Hmm.. I guess I would cut down on the cheese,tuna & ham.

So that gives you 4.75 pounds of bread with roughly the same amount of cheese & only costs $35. Interesting!

Ok, so Tom.

Here's what I like about you.

You came in, you had absolutely no idea what I was going to ask.

I posed a problem.

You wrote a little bit of code & got actual results.

Your results were correct & interesting & you didn't fumble around with syntax or anything.

In about 45 minutes with some 45 lines of code, you actually have something interesting!

Tom: Thanks! I code every single day, so coding is never a problem.

This is just a for-comprehension, so it wasn't too hard.

JD: You code every single day ?

Tom: Yes. Always Be Coding.

JD: When do you get time to learn ? You know, math, machine learning, statistics, these things take a large block of free time to learn. If you are always coding, when do you learn new material ?

Tom: Uhhh... well, I am a self-taught developer. I learn by writing code. If I need to learn something, I will write some code. That's how I learn.

JD: So you can't pick up a math textbook & solve a few problems with paper & pen...

Tom: What is this, high school ? No, I generally write code.

JD: Sometimes we need to research. Read CS papers, mine the literature, most of that is just reading through the math, working out proofs

Tom: That's not a right fit for me personally. I like to code. If you give me a well-specified algorithm, I can code it up & make it efficient.

JD: Hmmm...ok. Say instead of ham,lettuce,bread,cheese,tuna, you know, instead of 5 items, you had a real grocery shop. Very conservatively, 1000 items.

Tom: ok

JD: OK ? How would you proceed in that case ?

Tom: What's wrong with my for-comp ?

JD: Well, you are brute-forcing the solution, that works well for 5 items but not 1000.

Tom: So it will just take a long time.

JD: That isn't a problem ?

Tom: No, we'll like, use map-reduce or something. Parallelize it.

JD: Are you seriously suggesting you can solve the brute-force version of this problem for 1000 items in some reasonable amount of time ?

Tom: You can't ? Why not ? Processors are quite fast.

JD: I don't think you quite understand. When you pick the ham, going from four ounces to ten pounds, how many choices do you make ?

Tom: Uhhh...

scala> (10-0.25)/0.25

res16: Double = 39.0

JD: So including the initial 0.25, 40 choices.

Tom: OK, so ?

JD: So with 40 choices per item, for 5 items, how many choices do you have ?

Tom: Hmmm...

scala> printf("%.2f", math.pow(40,5))

102400000.00

JD: See ? 102 million!

Tom: I see. So with 1000 items I'd have like...

scala> printf("%.2f", math.pow(40,1000))

Infinity

JD: Ha ha ha ha ha ha ha ha!

Tom: Oops! Did not see that coming. It actually said Infinity on the REPL!

JD: Ha ha ha! Yeah. So what will you do in this case ? Ha ha ha!

Tom: I don't know man. There's probably some fancy math to solve it. I see where you are going. But its not my cup of tea. I ship code. I'm not an R&D guy. I don't read papers or do serious math. Thank you.

JD: Thank you Tom. We'll ...uhh... get back to you.

Tom: Yeah whatever...

------

Dick's interview:

The chief data scientist John Doe (JD) walked in.

JD. Lets talk about groceries.

Dick. Why ?

JD. Uhhh...that's like what we do here. We are a grocery startup. So you walk into a grocery store with a grocery bag and some cash, to buy groceries for a week.

Now, a couple problems -

1. your bag can only hold ten pounds.

2. You only have $100

3. You need about 2000 calories a day, so...

Dick (interrupting...) I don't generally engage in algorithm pissing contests.

JD: I don't understand. These are the sort of problems we work on in this company...

Dick: Yeah, but lets talk about the real deal. Why am I here ?

JD: Real deal ?

Dick: You know, Hadoop. Data volume. Cluster size. Number of mappers & reducers. Setting up the data pipeline. You know, Thrift vs Protobufs. Tooling. Telemetry. Build systems. The dev process here. How do you guys write unit tests ? What's your logging system ? How do you do A/B tests ? Building Analytics Dashboards. Shipping. Code Reviews. You know, lets talk real life. Not some stupid grocery algorithm.

JD: Uh...Uhhh...You don't believe writing a little bit of code is a good idea ? Even pseudocode will do. I can help you get started...

Dick: Dude. Obviously there is some magic function that will spit out the right amount of groceries or whatever. It isn't something I can come up with in the span of this interview. It will take several man-months of coding and lots of unit tests to perfect.

JD: No, no, I assure you we can solve a toy version in this interview.

Dick: I don't play with toys. I work on distributed systems in 100+ node clusters. I'm a Cloudera certified data scientist. I've given keynotes at 3 different Big Data conferences...

JD: Ok Dick, we'll keep in touch. Thank you.

Dick: ???!!

-----------------

Harry's interview.

The chief data scientist John Doe (JD) walked in.

JD. Lets talk about groceries.

Harry. Suuuure.. I though this was data-analytics work.

JD. Yes, it is. We are a big data grocery startup. So you walk into a grocery store with a grocery bag and some cash, to buy groceries for a week.

Now, a couple problems -

1. your bag can hold ten pounds.

2. You have $100

3. You need about 2000 calories a day, so a weekly shopping trip is about 14,000 calories.

4. You must purchase at least 4 ounces of each grocery item.

Here's your dataset -

--- Calories Per Pound ---

Ham, 650 cals,

Lettuce, 70 cals

Cheese, 1670 cals

Tuna, 830 cals

Bread, 1300 cals

----

---- Price Per Pound ----

Ham, $4

Lettuce, $1.5

Cheese, $5

Tuna, $20

Bread, $1.20

----

Take your time, and list the number of ways you can buy your groceries.

Harry picked up the notebook & took out a pencil from behind his ear and quickly wrote some equations.

He showed JD his work -

X = (h l c t b ...) = 1xn vector

A = [ (1,650,4), (1,70,1.5), (1,1670,5),...] = nx3 matrix

B = (10 14000 100) = 1x3 vector

Solve XA = B

Now perturb A so that you get a solution vector at every single price point and weight.

JD: What's n ?

Harry: Obviously I'm assuming there will be you know, 1000s of items in the real-life grocery store. n is the number of grocery items.

JD: Very good! Ok, and why do I need to perturb A ?

Harry: Well, grocery prices will change every day. Sometimes even on the same day I've noticed people charge you more during the peak shopping hours. So the price column in A changes.

JD: Your insights about the real-life version of the problem are very good! But I'm afraid solving XA = B is the wrong approach. You won't get a list of ways to shop, just 1 way. Maybe not even that.

Harry: (thinking...)

OK, this matrix setup assumes exact solution ie. the weights add up to 10 pounds exactly and the prices add up to $100 exactly. In reality, the prices must be anything below $100 and the weights add up to anything below 10 pounds. So you are going to have a whole range of B's.

JD: True, but you don't know any of those B's ahead of time. So this approach won't get you anywhere.

Harry: Ok, ok, lets not do Gauss-Jordan. Lets do Dantzig.

JD: Good!

Harry: We'd need to buy an industrial strength QP solver. You know, a CPLEX or a GAMS. Or we could implement one inhouse using NAG.

JD: Yes, eventually. But right now, can you code up a solution using a Simplex solver of your choice ?

Harry: Sure. I'll use the open source Apache math library. That has a good Simplex Solver.

JD: Go ahead! There's some code on the laptop from a previous applicant. You can use that if you want.

JD left the room.

Harry took a full hour, and when he was done -

JD: So can you walk through the code real quick ?

Harry: Sure.

Basically, I construct a cartesian product of all the groceries using a for-comprehension, on line 25. That's a pretty large set.

I then apply ten constraints to that set.

The first constraint is that you only have $100. So that's your objective function on line 30.

The second constraint on line 31 simply says, don't look for negative weights, because, you know , you can't buy minus half pound of something.

The third & fourth constraints on line 32-33 restrict the calories to be about 14000.

The fifth constraint on line 34 says the sum of weights of all the groceries can be at most 10 pounds.

JD: What about the other five ?

Harry: Suppose you want to restrict ham to at least x pounds. Then you can build up a vector <1,0,0,0,....> where ham has a 1 and all the other groceries are a 0. So if you take the dot product of this vector with the weight vector <x,y,z,w,...>, the dot product is simply x. So you can build a constraint for the minimum amount of ham. I do the same for the other groceries. So that's lines 36-39.

JD: What do you do with these ten constraints ?

Harry: Well, you simply feed your constraints into a Simplex solver on line 43 & hail Mary. Hopefully it spits out a solution. Of course, there might not be any feasible solutions in many cases, so then it would throw an Exception, which we handle on line 44.

JD: I see. So how many solutions could you come up with ?

JD: Hmmm. 599 solutions. Seems much more reasonable. So you have a lot of experience working on quadratic programming ?

Harry: Not really. I don't like to program.

JD: What ?

Harry: I mostly read. I read a lot of ML literature....

JD: So you never code ?

Harry: Well, I write some code once in a while. To prove some results. But I don't, like, ship software, or use git, or whatever.

JD: So you don't know git

Harry: I know the name.

JD: Ha ha!

Harry: Well, I generally don't use a computer. I'm mostly a paper-pencil guy. Sometimes one needs to write code, and I get that. But its not my favorite part of the job.

JD: You understand that this is a programming role, though we do a lot of data science.

Harry: Yes, but data scientists aren't necessarily programmers. I don't particularly like to write code.

JD: Hmmm...

------------

Aftermath:

JD & the company's CEO had a huddle regarding the candidates -

JD: In a good economy, we would make offers to all 3.

CEO : All 3 ? Even Dick ?

JD: Well, Dick was, generally, being a dick :) But yes, if we were a later-stage company, we would need people like Dick who can build robust data pipelines and not worry so much about QP algorithms. But right now, we need somebody like Harry who can do the hard work of model building and perfect our secret sauce. And we need somebody like Tom, who can ship code really fast.

-------------

Conclusions: The company made a very generous offer to Harry. The company also asked Tom if he'd consider a junior role where he would be in a position to learn data science with a team of seasoned ML practitioners, while he helped them deploy & productionize their models.

Negotiations are in progress...