
Is Go Good for Parallel Programming?

Go is a language designed for better concurrency. Although concurrency is not the same thing as parallelism, Go's approach also makes it suitable for parallel programming. But just how suitable is it in terms of performance? Out of curiosity, I've run two simple experiments to assess the performance of parallel code in Go.

How parallel programming in Go works

Before getting to the experiments, let's take a look at Go's approach to parallelism. Go has Goroutines: concurrently executing functions multiplexed across system threads. By default, a Go program uses only one system thread for all of its Goroutines, but that number can easily be changed by setting GOMAXPROCS.
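
For illustration, here's a minimal sketch of spreading Goroutines across multiple threads (the Goroutine bodies are placeholders of my own):

package main

import (
  "fmt"
  "runtime"
  "sync"
)

func main() {
  runtime.GOMAXPROCS(4) // multiplex Goroutines across up to 4 OS threads

  var wg sync.WaitGroup
  for i := 0; i < 4; i++ {
    wg.Add(1)
    go func(id int) { // each Goroutine may run on a different thread
      defer wg.Done()
      fmt.Println("Goroutine", id, "done")
    }(i)
  }
  wg.Wait() // block until every Goroutine has finished
}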

Goroutines synchronize with each other via channels, a concept adopted from Communicating Sequential Processes (CSP). This resembles message passing in frameworks like MPI and is the key to parallel programming in Go. However, unlike MPI, which follows a distributed-memory model, all Goroutines of a program share the same memory space. This makes channel communication more lightweight, but it also makes the language less suitable for distributed environments.
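
As a minimal example of this style of synchronization (the value sent is just a placeholder):

package main

import "fmt"

func main() {
  ch := make(chan int) // unbuffered channel: send and receive synchronize

  go func() {
    ch <- 42 // blocks until the main Goroutine is ready to receive
  }()

  fmt.Println(<-ch) // receive the value sent by the other Goroutine
}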

1st Experiment: Partitioning

This experiment sets out to measure the speedup obtained by adding more processors to a parallel computation in Go. We'll work on a simple theoretical problem: counting from 0 to N, where N is a large number. We're not interested in the actual execution time but in the speedup with respect to the number of processors, i.e. S(p) = T(1)/T(p), the single-processor time divided by the time on p processors. The code is as follows:

...
// count counts the numbers from start (inclusive) to end (exclusive).
func count(start uint64, end uint64) uint64 {...}

func main() {
  ...
  runtime.GOMAXPROCS(np)
  c := make(chan uint64, np) // buffered so workers never block when sending results
  var result uint64 = 0
  ...// start timing
  // Partition [0, N) evenly across np-1 worker Goroutines.
  for i := 0; i < np-1; i++ {
    var start uint64 = uint64(i) * N / uint64(np)
    var end uint64 = uint64(i+1) * N / uint64(np)
    go func(c chan uint64, start uint64, end uint64) {
      c <- count(start, end) // send the partial count back
    }(c, start, end)
  }

  // The main Goroutine handles the last partition and collects the results.
  result += count(uint64(np-1)*N/uint64(np), N)
  for i := 0; i < np-1; i++ {
    result += <-c
  }
  ...// end timing
}
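
The body of count is elided above; a minimal sketch of what I assume it does, simply iterating over its half-open range [start, end):

func count(start uint64, end uint64) uint64 {
  var n uint64 = 0
  for i := start; i < end; i++ {
    n++ // visit every number in the range
  }
  return n
}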

The code was run on a 2.8 GHz Intel processor with 16 CPUs. I measured the speedup up to 8 CPUs, for various problem sizes. Below is the result:

[Figure: speedup vs. number of CPUs for various problem sizes]

This shows that, given a large enough problem size to partition, the Go runtime can make full use of the available processors and achieve nearly perfect speedup.

2nd Experiment: Communication

This experiment compares the communication overhead of Go with that of MPI. The program simulates a master-slave architecture in which one process sends out computation tasks to its workers. This model is prominent in systems where the computation is not known beforehand (e.g. a server handling real-time requests), and its performance depends on how well the master keeps up as the number of parallel workers increases. In the implementation, the total amount of work is kept constant as the number of processors increases, and performance is evaluated based on the speedup attained.

Go:

...
func main() {
  ...
  runtime.GOMAXPROCS(np)
  work_c, done_c, finish := make(chan int, np-1), make(chan int, np-1), make(chan int)
  n := 100 // amount of work to be sent out

  // Master Goroutine: hands out tasks and counts completions
  go func() {
    ...// start timing
    for i := 0; i < np-1; i++ {
      work_c <- 1 // one initial task per slave
    }
    for n > 0 {
      <-done_c // wait for any slave to finish a task
      n--
      if n > 0 {
        work_c <- 1 // send out the next task
      }
    }
    ...// end timing
    finish <- 1 // all work done; let main exit
  }()

  // Slave Goroutines: receive a task, "compute", report back
  for i := 0; i < np-1; i++ {
    go func(id int) {
      for {
        <-work_c
        time.Sleep(100 * time.Millisecond) // simulate computation
        done_c <- 1
      }
    }(i)
  }

  <-finish
}

MPI:

...
int main(int argc, char* argv[]) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int n = 100, buf = 1;

  if (rank == 0) {
    // Master process: hands out tasks and counts completions
    ...// start timing
    for (int i = 1; i < size; i++) {
      MPI_Send(&buf, 1, MPI_INT, i, WORK_TAG, MPI_COMM_WORLD); // one initial task per slave
    }
    while (1) {
      MPI_Status status;
      MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, WORK_TAG, MPI_COMM_WORLD, &status);
      int slave_id = status.MPI_SOURCE;
      n--;
      if (n > 0) {
        MPI_Send(&buf, 1, MPI_INT, slave_id, WORK_TAG, MPI_COMM_WORLD); // send out the next task
      } else {
        break;
      }
    }
    ...// end timing
    // Send termination signal to slaves
    for (int i = 1; i < size; i++) {
      MPI_Send(&buf, 0, MPI_INT, i, DIE_TAG, MPI_COMM_WORLD);
    }
  } else {
    // Slave process: receive a task, "compute", report back
    while (1) {
      MPI_Status status;
      MPI_Recv(&buf, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
      if (status.MPI_TAG == WORK_TAG) {
        struct timespec tim, tim2;
        tim.tv_sec = 0;
        tim.tv_nsec = 100 * 1000000L; // simulate 100ms of computation
        nanosleep(&tim, &tim2);
        MPI_Send(&buf, 1, MPI_INT, 0, WORK_TAG, MPI_COMM_WORLD);
      } else {
        break; // DIE_TAG: no more work
      }
    }
  }

  MPI_Finalize();
  return 0;
}

The result obtained by both implementations can be seen in the following chart:

[Figure: speedup of the Go and MPI implementations vs. number of processors]

According to the measurements, Go achieves much better speedup than MPI. This may be because Goroutines and channel communication are more lightweight than message passing in MPI (just a hypothesis).

Conclusion

The two experiments, although simple, show that Go is a strong candidate for parallel computing. The Go runtime allocates OS threads efficiently enough to achieve near-perfect speedup in traditional partitioning problems. Not only that, Go also performs well in systems that require heavy interaction among processes. Therefore, for people about to learn parallel programming, Go can be a very good starting point. If you're interested in more complex parallel problems in Go, this is one of the papers on the topic I've found.
