Performance is premature optimization


I will burn in hell, but performance is premature optimization nowadays. Although it is very interesting from an engineering perspective, from the practical point of view of someone who wants to follow the make-shit-happen startup mantra, my advice is not to worry much about it when it comes to choosing a programming language.

There are things that matter more than the technology stack you choose. In this post I will try to explain why; then you can vent your rage in the comments section.

Get Shit Done

Your project is not Twitter

It is not Facebook either, and it probably won't be. I am sorry.

Chances of your next project becoming popular are slim. Even if you are that lucky, your app will not be popular from day one. And even if it becomes popular enough, hardware is so cheap at that point that it can be considered free for all practical purposes (around one dollar per day for a 1 CPU/1 GB machine; compare that with our wages).

Your project will fail

Face it: you are not alone. Most projects fail, and there is nothing wrong with that. They fail before performance ever becomes an issue. I do not know of a single project that has failed solely due to a bad choice of programming language.

So I think that, as a rule of thumb, it is a good idea to choose the technology that allows you to try ideas and develop small components faster (nodejs, is that you?). You will have time to throw some of those components away and rebuild ultra-efficient alternatives from scratch in the unlikely case you need them.


You are not going to need performance; stop worrying and get shit done instead. I always keep a Moët et Chandon Dom Pérignon 1955 in the fridge to celebrate the day I face performance issues due to choosing X over Y.


Function parameters in Python, Java and Javascript


This is a short post about how these programming languages compare with each other when it comes to declaring functions with optional parameters and default values. Feel free to leave alternatives in other languages in the comments.

Python. The good.

Python is my favorite. Use your parameters in any order and define their default values as part of the function signature itself.

def foo(arg1, arg2="default"):
    print "arg1:", arg1, "arg2:", arg2

The price to pay is that you cannot define two methods with the same name in the same class; a second definition simply replaces the first.

def sum(a, b):
    return a + b

def sum(a, b, c):
    return a + b + c

I am not a Python expert, but it does not seem like such a big deal.

Java. The ugly.

Java is more verbose, but you have strong types and simple refactoring in exchange.

public void foo(String arg1) {
    foo(arg1, "default");
}

public void foo(String arg1, String arg2) {
    System.out.printf("arg1: %s arg2: %s", arg1, arg2);
}

Javascript. The bad.

Javascript is a little uglier.

function foo(arg1, arg2) {
    arg2 = arg2 || 'default';
    console.log('arg1 %s arg2 %s', arg1, arg2);
}
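One caveat with the `||` trick: it treats every falsy value (0, '', false, null) as a missing argument. A safer sketch, assuming you only want to default when the argument was really omitted:

```javascript
function foo(arg1, arg2) {
    // default only when the argument was actually omitted
    arg2 = (typeof arg2 === 'undefined') ? 'default' : arg2;
    return 'arg1 ' + arg1 + ' arg2 ' + arg2;
}

console.log(foo('a'));    // arg1 a arg2 default
console.log(foo('a', 0)); // arg1 a arg2 0 (the || version would say "default")
```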

This is real code we use in Instant Servers, to have an optional first parameter:

CloudAPI.prototype.getAccount = function (account, callback, noCache) {
    if (typeof (account) === 'function') {
        callback = account;
        account = this.account;
    }
    if (!callback || typeof (callback) !== 'function') {
        throw new TypeError('callback (function) required');
    }
    // ...
};

It is pure crap.


Give your configuration some REST


I have built a simple configuration server to expose your app’s configuration as a REST service. Its name is rest-confidence (github). In this post I will try to explain its basics and three use cases where it could be useful:

  1. To configure distributed services.
  2. As a foundation for A/B testing.
  3. As a simple service directory.

Install and run a basic rest-confidence configuration server

The first step is installing the configuration server:

git clone
cd rest-confidence
npm install

After that, you are ready to edit your config.json configuration file. For example:

{
  "mongodb": {
    "host": "localhost",
    "user": "root"
  },
  "redis": {
    "host": "redis-server",
    "port": 6379
  },
  "logging": {
    "appender": {
      "type": "file",
      "filename": "log_file.log",
      "maxSize": 10240
    }
  }
}
Launch the configuration server (npm start) and you are done. You are now ready to start retrieving the values associated with any key, in a hierarchical way:

# curl http://localhost:8000/logging/appender
{"type":"file","filename":"log_file.log","maxSize":10240}

# curl http://localhost:8000/logging/appender/maxSize
10240

Use case #1: Configure distributed services

In my last post I wrote about why I like nodejs, a great platform for building micro-service-based architectures. However, these kinds of architectures also come with their own drawbacks. One of them is that they are more difficult to deploy and configure.

Micro Service Architecture

Micro Service Architecture. Image courtesy of James Hughes

With a centralized configuration server such as rest-confidence everything becomes easier. Instead of configuring hundreds of settings on each component, you only need to configure the URL of your configuration server. Your service will go there to look up any configuration property it needs.

Use case #2: A/B testing

A/B testing is a simple way to test different changes to your application and determine which ones produce positive results.

As a simplistic example, imagine you want to test an alternative color for your blue sign-up button, and check how it affects the conversion rate. You can define a $filter with a $range limit in your configuration:

{
  "color": {
    "$filter": "random",
    "$range": [
      { "limit": 10, "value": "red" }
    ],
    "$default": "blue"
  }
}

So when you retrieve the “color” property using a random filtering criterion, you will get different colors depending on the ranges.

# curl http://localhost:8000/?random=5

And with a different filtering value out of the range you will get the default value.

# curl http://localhost:8000/?random=15

Use case #3: Simple service directory

You can use rest-confidence as a simple service directory, that is, a centralized server that facilitates dynamic location of other services’ endpoints, based on different criteria.

{
  "myservice": {
    "$filter": "env",
    "production": {
      "url": {
        "$filter": "country",
        "ES": "",
        "UK": "",
        "$default": ""
      }
    },
    "development": {
      "url": ""
    }
  }
}

With some criteria applied (for example, env=production and country=ES) you will get the proper service endpoint, or any other information you need:

# curl "http://localhost:8000/myservice?country=ES&env=production"

I hope you find it useful. There is also a nodejs client. Contributions are welcome.


Why is node.js so cool? (from a Java guy)


I confess: I am a Java guy

At least I used to be, until I met node.js. I still think the JVM is one of the greatest pieces of technology ever created, and I love the Spring Framework, the hundreds of Apache Java libraries and the over-six-hundred-page books about JEE patterns. It is great for big applications built by many developers, or applications that are made to last.


But many applications today are not made to last. Sometimes you just want to test something fast. Fail fast, fail cheap, keep it simple… the “be lean” mantra, you know.

Moreover, open source has completely changed the way we build applications, moving from developing tons of code in monolithic applications to assembling small programs that use third-party components as middlewares (nosql databases, queues, caches).

Second confession: I hate(d) Javascript

Yes, Internet Explorer 4 made me hate Javascript. So the first time I heard about node.js and server-side Javascript, I felt a shiver down my spine. It got worse when I started to play with the unfamiliar continuation-passing style; the asynchronous callback hell did not take long to appear.

Node is Asynchronous

A simple pattern: function(err, result) {}

But the absence of rules does not necessarily have to mean chaos. In fact, there is one pattern in node.js: your callbacks will have two arguments; the first argument will be an error object, the second one the result. This is your contract with the platform and, more importantly, with the community. Stick with it and you will be fine.
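A minimal sketch of the convention (the function here is made up for illustration):

```javascript
// The node.js contract: async callbacks receive (err, result)
function divide(a, b, callback) {
    if (b === 0) {
        // something went wrong: pass an Error as the first argument
        return callback(new Error('division by zero'));
    }
    // success: null error, then the result
    callback(null, a / b);
}

divide(10, 2, function (err, result) {
    if (err) throw err;             // first argument: the error, or null
    console.log('result:', result); // second argument: the value
});
```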

Using such a popular programming language plus this simple convention is what makes it so easy to start working with node.js. It makes building small modules that work together with other developers’ modules surprisingly easy. This is why we have more than 50K modules in the npm registry. Most of them are probably worthless, but natural selection also applies here, and this evolutionary process is much faster than the Java Community Process (JCP).

With node.js I feel like a productive anarchist. I get shit done.

You should also read “Broken Promises”, “Why is node.js becoming so popular” (quora), and watch Mikeal Rogers’ talk on why node is so successful (24 min).


Big teams are not agile in the digital world


Today's post is not so technical. I have been thinking about why many big corporations, with almost unlimited resources, are not able to deliver top-quality products and services, and why companies with a small fraction of those resources create new products faster.

I have found several sociopsychological causes, most of them related to one aspect of human activity: working in a team.

Diffusion of responsibility

Diffusion of responsibility is a sociopsychological phenomenon whereby a person is less likely to take responsibility for action or inaction when others are present. Considered a form of attribution, the individual assumes that others either are responsible for taking action or have already done so. The phenomenon tends to occur in groups of people above a certain critical size and when responsibility is not explicitly assigned. (wikipedia)

This is a harmful situation, where everybody’s responsibility becomes nobody’s responsibility and tasks are just words instead of real actions.

Analysis paralysis

Analysis paralysis is the state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken […] rather than try something and change if a major problem arises. (wikipedia)

Perfect is the enemy of good in most cases, and the opportunity cost of decision analysis tends to be higher than that of taking some risks and launching a sub-optimal product. LinkedIn founder Reid Hoffman said: “if you are not embarrassed by the first version of your product, you’ve launched too late”.

See also “Performance is premature optimization”.

Inertia and Groupthink

Inertia is the resistance of any physical object to any change in its motion (including a change in direction). In other words, it is the tendency of objects to keep moving in a straight line at constant linear velocity, or to keep still

Groupthink is a psychological phenomenon that occurs within a group of people, in which the desire for harmony or conformity in the group results in an incorrect or deviant decision-making outcome. Group members try to minimize conflict and reach a consensus decision without critical evaluation of alternative ideas or viewpoints, and by isolating themselves from outside influences. (wikipedia)

Do you remember the monkey banana and water spray experiment? It is hard to change the culture in a big corporation. It is not easy to innovate and disrupt when the main reason to keep doing something is that “we have always done it that way” or by coercion.

The Milgram experiment on obedience to authority figures was a series of social psychology experiments, which measured the willingness of study participants to obey an authority figure who instructed them to perform acts that conflicted with their personal conscience. (wikipedia)

Group intercommunication

The number of communication paths in a team of N people is N × (N − 1) / 2. This means that time spent communicating (this includes meetings) grows quadratically with team size, while total productivity only grows linearly.
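The quadratic growth is easy to see with a quick calculation:

```javascript
// communication paths in a team of n people: n * (n - 1) / 2
function paths(n) {
    return n * (n - 1) / 2;
}

console.log(paths(2));  // 1
console.log(paths(5));  // 10
console.log(paths(20)); // 190 — quadruple the team, ~20x the paths
```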

I like the idea of “two pizza teams” coined by Jeff Bezos: if you can’t feed a team with two pizzas, it’s too large.

When you’ve got a small group, you don’t need to constantly formalize things. You communicate and you know what’s going on. If you have a question about something, you ask someone. Formalized rules, deadlines, and documents start to seem silly. Everyone’s already on the same page anyway (37signals)

Fear of failure

Atychiphobia is the abnormal, unwarranted, and persistent fear of failure. As with many phobias, atychiphobia often leads to a constricted lifestyle, and is particularly devastating for its effects on a person’s willingness to attempt certain activities. (wikipedia)

I can think of at least four consequences of this fear of failure:

  • Overengineering: instead of keeping a solution simple, engineers tend to overcomplicate it with unneeded features, taking precautions to ensure they will not be blamed if something goes wrong (see also “scale later”).
  • Deliberate bad choices: “no one gets fired for buying IBM”. This applies to technological choices, and to the selection of partners and support contracts that are slow, expensive and of questionable usefulness.
  • Pessimistic attitude as a defense mechanism: if you put yourself in the worst-case scenario, from that point on everything can only get better.
  • Fear of saying no to authority figures.

Emotional contagion

Emotional contagion is a process in which a person or group influences the emotions or behavior of another person or group through the conscious or unconscious induction of emotion states and behavioral attitudes. (wikipedia)

A whiner is somebody who complains a lot. This attitude is really infectious; it spreads negative karma that is almost impossible to eradicate, and it diminishes passion and the chances of success: “whether you think that you can, or that you can’t, you are usually right”.

“Little Eichmanns” is a phrase used to describe persons who participate in society in a way that, while on an individual scale may seem relatively innocuous even to themselves, taken collectively create destructive and immoral systems in which they are actually complicit. (wikipedia)


Excessive hierarchy is also dangerous. Too many hierarchical levels can stop or slow down decisions. Even operational decisions that should take hours end up taking weeks.

Add more layers and employees will also stop identifying with the company. This is a kind of emotional detachment: workers do not think they can make significant contributions to the company, collective responsibility is lost, and the company's problems become someone else's problems.

Somebody Else’s Problem is a psychological effect where individuals/populations of individuals choose to dissociate themselves from an issue that may be in critical need of recognition. Such issues may be of large concern to the population as a whole but can easily be a choice of ignorance by an individual. (wikipedia)

When roles are too compartmentalized, some people stop being able to wear many hats. I think this is because they start feeling that doing certain tasks, or getting their hands dirty, would mean a step back in their professional careers, or simply discredit them. This is completely different in a small company, and it clearly makes a difference in terms of speed.

I like passionate, small, flat, focused teams that really embrace agile and self-organization. Bureaucracy can kill agility, and big groups of people can be destructive for innovation and adaptation if not properly managed. The problem is even worse when objectives are not aligned across the company, but I will write about that in another post.


Playing around with Meteor


I have been playing around with meteor, an open-source platform for building web apps. The result is a 200 LOC game ladder with a live demo.

The platform is built on top of nodejs, which is great. In my opinion it is not yet ready for production environments, but I am really impressed by how fast you can create simple web applications with live page updates, automatic data synchronization and many other niceties I have never seen before in any other web framework.

ELO algorithm

There is an open issue with the ranking algorithm. I am looking for a javascript implementation of the ELO algorithm. I am waiting for your pull requests!
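For the record, a minimal sketch of what such an implementation could look like (the K-factor of 32 and the function names are assumptions; real ladders tune them):

```javascript
// Elo rating update for a single game.
// score: 1 if player A wins, 0 if A loses, 0.5 for a draw.
var K = 32; // assumed K-factor

// probability that A beats B, given their current ratings
function expectedScore(ratingA, ratingB) {
    return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

function updateRatings(ratingA, ratingB, score) {
    var expected = expectedScore(ratingA, ratingB);
    var delta = Math.round(K * (score - expected));
    return { a: ratingA + delta, b: ratingB - delta };
}

// Equal players: the winner takes K/2 = 16 points
console.log(updateRatings(1500, 1500, 1)); // { a: 1516, b: 1484 }
```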


Deploy virtual machines on Instant Servers cloud with Java


Instant Servers is the infrastructure as a service (IaaS) system I have been working on over the last few months at Telefónica Digital.

The service offers a public REST API (Cloud API) that is super simple to use. However, in this post I will show you how to manage your infrastructure using a Java client, without dealing with HTTP requests.

Build the Cloud API client

Man does not live by nodejs alone. There is an instantservers project on github that you can easily clone and compile (pull requests are also welcome). In the future it will be published as a proper maven artifact, so this step can be skipped.

git clone
cd ./instantservers/instantservers-api-client
mvn install

That will generate an instantservers-api-client-1.0.0.M1.jar library you can use in your own applications.

Deploy your first virtual machine

To deploy a virtual machine on the Instant Servers cloud you only need to choose a name for the machine, a package that corresponds to the hardware configuration (cpu, mem, disk) you need, and a dataset that represents the image or template you want to use (e.g. ubuntu 12.04, mongodb, smartos, etc.).

Let the code speak.

package net.guidogarcia;

public class InstantServersExample {
    // there are several datacenters, I use Madrid "eu-mad" in this example
    private static final String CLOUDAPI_URL =

    public static void main(String[] args) throws Exception {
        CloudAPIClient client =
                new CloudAPIClient("username", "password", CLOUDAPI_URL);

        Machine machine = new Machine();
        // set the machine name, package and dataset here

        Machine deployed = client.createMachine(machine);
        System.out.printf("Machine id is %s", deployed.getId());
    }
}

You will notice that virtual machines are up and running in a matter of seconds. This is due to the fact that the virtualization is based on rock solid Solaris zones.

You will need a username and a password to authenticate API calls; you can sign up for Instant Servers for free (machines are not free, but you can try them for something like 6 cents per hour).

If anyone is interested in other API operations or about cloud computing in general, leave a comment and I will be happy to write more posts about it.


Node.js running on my Raspberry Pi. A benchmark.


A few weeks ago I could not resist the temptation to buy a Raspberry Pi, the super-cheap $35 computer that comes with 256MB of RAM and an ARM CPU running at 700MHz, and fits in your pocket (more information in wikipedia).

Raspberry Pi (wikipedia)

See how nice it looks. I am more of a software guy, so the first thing I did was install node.js (v0.6.19), develop the simplest web server you can create in node (5 lines; it simply returns a 200 HTTP response code without any contents) and put the beast to work.

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200);
  res.end();
}).listen(8000);

The benchmark

I was interested in testing the number of requests per second the application was able to handle running on the Raspberry in the most optimistic scenario. After having some problems running httperf and autobench on Mac OS, I finally went with apachebench (ab), which can be used to do simple load testing.

These are the results of sending 5120 requests to the node web server, at different concurrency levels, using the following command:

ab -n 5120 -c <concurrency>

raspberry benchmark results

Additional information: Each concurrency level has been executed three times from my laptop and using a wifi connection; the graph shows the average value. The Raspberry Pi was running the Raspbian “wheezy” image (downloads).

Open points

Almost 200 requests per second in this non-real-world application that does nothing. Not bad; enough to develop and try the ideas I have in mind. To be honest, I still do not know why the performance drops so much at a concurrency of 512, or which part (my laptop or the raspberry) is the bottleneck, and why. Any ideas?

I still have to measure other aspects like CPU and memory usage. At a quick glance, it seems the CPU quickly goes over 90% usage even at small concurrency levels. I still appreciate this piece of hardware, but in the future I will try to overclock the processor. Memory stayed under 10%, which is not surprising in such a simple application.

I am also waiting for the Java Virtual Machine, which is supposed to be included in the default file system in future releases, to repeat the benchmarks (and probably watch it eat the memory).

It seems interesting, from a research point of view, to build a cluster and see how it scales. Donations for this purpose are highly appreciated :)


Analysis of variance (ANOVA) applied to fraud detection


Fraud detection is a topic applicable to many sectors (financial, insurance, etc). The method explained in this post is applied in the market research field by Gather Precision (a great market research tool developed by Gather Estudios, BTW), as an early signal to detect frauds in opinion polls.

Imagine you have four field workers (Peter, John, Mary, Ann) taking surveys on the street. They spend different amounts of time gathering the data, and we want to discover whether there are significant differences in the average times. That would mean that at least one of them is taking too little or too much time completing the surveys.

Field worker = { Times in seconds for each survey he completes }
Peter = { 150, 200, 180, 230, 220, 250, 230, 300 }
John  = { 200, 240, 220, 250, 210, 190, 240 }
Mary  = { 100, 130, 150, 180, 140, 200, 110, 120 }
Ann   = { 200, 230, 150, 220, 210 }

This is one case where ANOVA comes to the rescue. According to wikipedia, in its simplest form, “ANOVA provides a statistical test of whether or not the means of several groups are all equal”, and that is exactly what we are looking for.

We can use Apache Commons Math to perform some statistical tests. It is an interesting Java library, not really focused on statistics, but pretty easy to use and that luckily contains ANOVA.

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.math.MathException;
import org.apache.commons.math.stat.inference.OneWayAnova;
import org.apache.commons.math.stat.inference.OneWayAnovaImpl;

public class FraudDetector {
    private static final double SIGNIFICANCE_LEVEL = 0.001; // 99.9%

    public static void main(String[] args) throws MathException {
        double[][] observations = {
           { 150.0, 200.0, 180.0, 230.0, 220.0, 250.0, 230.0, 300.0 },
           { 200.0, 240.0, 220.0, 250.0, 210.0, 190.0, 240.0 },
           { 100.0, 130.0, 150.0, 180.0, 140.0, 200.0, 110.0, 120.0 },
           { 200.0, 230.0, 150.0, 220.0, 210.0 }
        };

        final List<double[]> classes = new ArrayList<double[]>();
        for (int i = 0; i < observations.length; i++) {
            classes.add(observations[i]);
        }

        OneWayAnova anova = new OneWayAnovaImpl();
        boolean rejectNullHypothesis =
                anova.anovaTest(classes, SIGNIFICANCE_LEVEL);

        if (rejectNullHypothesis) {
            System.out.println("Significant differences were found");
        }
    }
}
The question I asked in stackoverflow includes some discussion and additional information about how to determine the rare cases.

One final thought: although I am not an expert in this field, I definitely think we should study more statistics at university.


My experience in the 2nd Tuenti Programming Challenge


This week I have been participating in the 2nd edition of the Tuenti Programming Challenge. I felt a little rusty on my return to top-level competition, but even though I started three days late, I was able to reach level 14 (stats). Not bad.

The problems I prefer are those with an obvious brute-force solution that can nevertheless take advantage of a particular algorithm or data structure. Most of the problems in this edition belong to this category except, perhaps, challenge 12, which I did not like because I found it too tricky.

The crazy croupier

The challenge I enjoyed the most was number 13, the crazy croupier. It is a classical problem with minor variations, where you have to determine how many shuffles of a deck of N cards are needed to come back to the original order if you cut it at position L.

The easy –brute force– solution is to iterate and count until the cards are back in their original positions, which can be time-consuming when the number of cards is big (up to 10^6 in this case).

The second approach is to turn it into a permutations problem. Once you know where each card 1..N is located after the first shuffle, you can determine the number of shuffles each individual card needs to come back to its position. The least common multiple of the individual results is the total number of shuffles needed.

Show me the code

Here it is (download). I do not know how many participants chose Java to solve the problems. What I have seen so far are solutions in Python (here, here), which seem more compact, and PHP (here). It is a pity that the official ranking does not show execution times :)

/**
 * Crazy Croupier - 2nd Tuenti Challenge - 12
 * Example:
 * Number of cards: N = 10
 * Number of cards in the first set: L = 3
 * Cards: 10 1 4 2 8 5 6 7 3 9
 * First set: 10 1 4
 * Second set: 2 8 5 6 7 3 9
 * Shuffled set: 4 9 1 3 10 7 6 5 8 2
 */
public static void main(String[] args) {
  Scanner scanner = new Scanner(System.in);

  // read number of cases
  int cases = Integer.parseInt(scanner.nextLine());

  for (int i = 1; i <= cases; i++) {
    // read N L, for example 10 6
    String line = scanner.nextLine();
    String[] arguments = line.split(" ");

    // N = cards in the deck
    int N = Integer.parseInt(arguments[0]);

    // L cards in the first bunch (where to cut the deck)
    int L = Integer.parseInt(arguments[1]);

    long result = processCase(N, L);
    System.out.printf("Case #%d: %d\n", i, result);
  }
}

/**
 * Returns the number of shuffles required to return the deck
 * to its original order.
 * The algorithm calculates the number of iterations that
 * each individual card needs to come back to its position. The
 * solution is the least common multiple (lcm) of the
 * individual results.
 */
private static long processCase(int n, int cut) {
  int[] deck = new int[n]; // first deck shuffling result

  shuffleDeck(n, cut, deck);

  // cache source -> target positions for O(1) access
  final Map<Integer, Integer> permutations = new HashMap<Integer, Integer>();
  for (int i = 0; i < deck.length; i++) {
    permutations.put(deck[i], i + 1);
  }

  // cache to avoid repeating the lcm calculation for the same number
  Set<Long> calculatedLcm = new HashSet<Long>();
  long lcm = 0;
  for (int i = 0; i < deck.length; i++) {
    long numberPermutations = 1; // we already did the first shuffling
    int currentPosition = i + 1;

    // still not at the original position
    while (currentPosition != deck[i]) {
      currentPosition = permutations.get(currentPosition);
      numberPermutations++;
    }

    if (calculatedLcm.contains(numberPermutations) == false) {
      lcm = lcm(lcm, numberPermutations);
      calculatedLcm.add(numberPermutations);
    }
  }

  return lcm;
}

/**
 * Shuffles the deck one time according to the algorithm.
 */
private static void shuffleDeck(int n, int cut, int[] deck) {
  int min = Math.min(cut, n - cut); // number of cards shuffled

  // shuffle two bunches of cards
  for (int i = 0; i < min; i++) {
    deck[2 * i] = cut - i;   // 1st bunch
    deck[2 * i + 1] = n - i; // 2nd bunch
  }

  // put the rest of the cards in proper order, at the end
  for (int i = 0; i < n - 2 * min; i++) {
    if (cut >= n / 2) { // first bunch is bigger
      deck[n - 1 - i] = i + 1;
    } else { // second bunch is bigger or equal
      deck[2 * min + i] = n - min - i;
    }
  }
}

/**
 * Least common multiple. Probably a more efficient approach can be
 * found but it is good enough.
 */
private static long lcm(long a, long b) {
  if (a == 0 || b == 0) {
    return Math.max(a, b);
  }
  return a * b / gcd(a, b);
}

/**
 * Greatest common divisor, see also {@link BigInteger#gcd(BigInteger)}
 */
private static long gcd(long a, long b) {
  long mod = a % b;
  return mod == 0 ? b : gcd(b, mod);
}

What do you think about it?

Talent is out there

I like these competitions (Google Code Jam, ACM ICPC, etc) a lot and I think it is a great (and cheap) opportunity for technological companies to attract and recruit talent. There are a lot of great coders hidden out there.
