From Experiment Hater to Experiment Lover

Writing weekly records had worn me out, so from now on I will only write down my thoughts and inspirations here. Doing experiments always seemed like a nightmare to me. So what changed my view in the end?

Over the past few weeks, surprisingly, most of my time has gone into doing experiments. As I have become more and more familiar with the codebase, implementing an idea in the system has become faster and faster (now 1–2 days).

To be honest, I used to be an experiment hater (and not only in computer science). In chemistry, I was choked by the gas produced by potassium permanganate. In biology, I still remember failing to observe any cells under the microscope during an important exam. In physics, even though I followed every step carefully, I was still the last one to finish the experiment (after failing so many times). Experiments always seemed like a nightmare to me.

In computer science, for a long time I still preferred writing code to running experiments. Whenever my mentor asked me to run experiments, I felt weary deep down. “Come on, I don’t want to do that boring stuff.”

Then things began to change.

One day, Wail asked me to benchmark an idea I had implemented. “Fine, I’ll do it.” It sounded like an easy, boring task. I casually wrote several queries to serve as the benchmark and ran them on my Mac. The result was not bad: there was a minor improvement on a small dataset. After that, I got ambitious and wanted to try a larger dataset and more partitions. Surely the speed-up ratio would grow!

Several days later, I got access to the cluster (which is actually 4 small blue boxes sitting in Mike’s office). I couldn’t stop myself from ssh-ing into it immediately and getting everything ready. I distributed the data across 4 partitions, which meant 4 processes would run together.

Server check, data check, script check.

Then HERE WE GO!

Guess what: it was even slower than my Mac, despite the much better hardware. What’s worse, my new idea showed no improvement over the master branch. It was far from what I had expected, and I fell into a slump. But I believed there must be a reason for this weird phenomenon, and the drive to find that reason got me back on my feet.

After cooling down, I began to think about the cause. As long as the codebase is the same, different results must come from different settings. So what was the difference between my Mac and the cluster? It suddenly hit me: the number of partitions! On my Mac, I had put all the data into one partition, while on the cluster I had distributed it across four. I had a strong feeling that this was what made the difference. It was interesting: the very thing I trusted most turned out to be what betrayed me.

After more experiments, I finally found the reason for the weird phenomenon: multiple partitions cause IO contention, because the partitions share the same cache and memory budget. Since I was running on a hard disk, the workload was IO-bound. That’s why performance got worse with multiple partitions.
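A toy model helps me picture the shared-budget effect. The sketch below is hypothetical and is not the code of the system I benchmarked: it assumes an LRU page cache shared by all partitions, partitions whose page requests interleave round-robin, and made-up sizes, and it counts cache misses as a stand-in for disk reads.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical shared page cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # in this toy model, a miss stands in for a disk read
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page


def simulate(num_partitions, pages_per_partition, cache_capacity, passes):
    """Return total cache misses when all partitions share one cache (assumed setup)."""
    cache = LRUCache(cache_capacity)
    for _ in range(passes):
        # Partitions run concurrently, so their requests alternate in the shared cache.
        for offset in range(pages_per_partition):
            for p in range(num_partitions):
                cache.access((p, offset))
    return cache.misses


print(simulate(1, 100, 200, 10))  # one partition's pages fit in the cache
print(simulate(4, 100, 200, 10))  # four partitions' combined pages do not
```

With these made-up numbers, one partition misses only on its first pass (100 misses), while four partitions whose combined 400 pages exceed the 200-page cache evict each other’s pages before reuse and miss on every access (4000 misses). The point is only that what has to fit is the combined working set of everything sharing the budget.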

That’s why I always think computer science is beautiful: everything has a reason. No matter how strange the result is, and no matter how far it is from your expectation, there is a reason. Think about the experimental variables, think about the program’s bottleneck, and the answer is always there.

The happiest and hardest part of research is finding the reason through all the winding paths. Research is all about reasoning.

I have finally turned from an experiment hater into an experiment lover. Experiments confuse me at first, then inspire me, and finally enlighten me.