# [tuning] Run more than 100k threads



## Mak-Di (May 30, 2010)

Hi all!

```
uname -a
FreeBSD domain.local 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
```


```
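#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>   /* sleep() */

/* Each thread only sleeps, so it stays alive while the others are created. */
void *thread_body(void *param) {
  (void)param;
  sleep(10);
  return NULL;
}

int main(int argc, char *argv[]) {
  pthread_t thread;
  int i;

  /* Try to start 120000 threads; the return code is not checked here. */
  for (i = 0; i < 120000; i++) {
    pthread_create(&thread, NULL, thread_body, NULL);
  }

  sleep(20);
  return (EXIT_SUCCESS);
}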
```
Compiled with `gcc -pthread -o main main.c`.

When I run `./main` I get only 1500 threads. After raising `kern.threads.max_threads_per_proc` (up to 512000) I get 100k threads and no more.

What should I change to have more than 100k threads?

Thanks!


----------



## SirDice (May 30, 2010)

Why on earth do you want to run that many threads?


----------



## Mak-Di (May 30, 2010)

100k threads use 60-70% of the CPU, and that's not effective


----------



## qsecofr (May 30, 2010)

What's the return code from pthread_create() when you hit your max? It might indicate a reason.

I've written a couple of pthread apps on a non-Unix platform. Its implementation of pthread_create() has no hard limit except for available memory. Maybe that's something you're running into?

Also, that other platform has a macro, PTHREAD_THREADS_MAX, which gives a theoretical maximum. I don't know whether it is also implemented on Unix platforms - YMMV.
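A minimal sketch of how the return code could be captured, assuming the test program is restructured around a few helper functions. The names `spawn_threads`, `release_threads` and `thread_limit_hint` are hypothetical and not from the original post. `pthread_create()` returns the error number directly, so it can be passed to `strerror()`, and `sysconf(_SC_THREAD_THREADS_MAX)` is the POSIX runtime query that corresponds to `PTHREAD_THREADS_MAX`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Gate that keeps every started thread alive until release_threads(). */
static pthread_mutex_t gate_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int gate_open = 0;

static void *thread_body(void *param) {
    (void)param;
    pthread_mutex_lock(&gate_lock);
    while (!gate_open)
        pthread_cond_wait(&gate_cond, &gate_lock);
    pthread_mutex_unlock(&gate_lock);
    return NULL;
}

/* Hypothetical helper: start up to `wanted` threads, stopping at the
 * first failure. Returns the number started; *err receives the error
 * number of the failed pthread_create() call, or 0 if none failed. */
static int spawn_threads(pthread_t *tids, int wanted, int *err) {
    int started = 0;

    assert(tids != NULL && err != NULL);
    *err = 0;
    while (started < wanted) {
        /* The error number is the return value, not errno. */
        int rc = pthread_create(&tids[started], NULL, thread_body, NULL);
        if (rc != 0) {
            *err = rc;
            fprintf(stderr, "pthread_create failed after %d threads: %d '%s'\n",
                    started, rc, strerror(rc));
            break;
        }
        started++;
    }
    return started;
}

/* Hypothetical helper: open the gate and join every started thread. */
static void release_threads(pthread_t *tids, int started) {
    int i;

    pthread_mutex_lock(&gate_lock);
    gate_open = 1;
    pthread_cond_broadcast(&gate_cond);
    pthread_mutex_unlock(&gate_lock);
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
}

/* System-reported thread limit, or -1 if the system reports none. */
static long thread_limit_hint(void) {
    return sysconf(_SC_THREAD_THREADS_MAX);
}
```

Calling `spawn_threads()` with a large `wanted` value and then printing the returned count and `*err` would show both where creation stops and why.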


----------



## Mak-Di (May 30, 2010)

qsecofr said:

> what's the return code

It's
```
35 'Resource temporarily unavailable'
```

qsecofr, thank you!


----------



## Alt (May 31, 2010)

Mak-Di said:

> 100k threads use 60-70% of the CPU, and that's not effective

With sleep() that's normal: the thread gives the CPU to other processes/threads.
If you put some math into your thread_body(), you will nearly lock up the system with that many threads.


----------



## SirDice (May 31, 2010)

I'm guessing the sheer number of context switches will eat up most of the CPU cycles.
Which leads me to the same question again: Why do you need that many threads?

http://en.wikipedia.org/wiki/Context_switch


----------



## Mak-Di (Jun 1, 2010)

For a large number of simple arithmetic operations.

PS: see MAX_THREADS in /usr/src/lib/libthr/thread/thr_list.c


----------



## SirDice (Jun 1, 2010)

Creating more and more threads will not increase the speed. There's a point where the context switching will actually slow things down. The trick is to tune the number of threads so your CPU isn't overloaded. Having 100K threads all doing computationally intensive tasks may actually be slower than using 100 threads. It all depends on your hardware (CPU, more than one core, etc.).


----------



## Mak-Di (Jun 1, 2010)

SirDice said:

> It all depends on your hardware (CPU, more than one core, etc.).

Sure, 2 x AMD Phenom II X6 1090T

Thanks a lot!


----------



## SirDice (Jun 1, 2010)

Tuning doesn't mean making things run fast. Tuning is the art of getting your application to make the most efficient use of your hardware.

It all really depends on hardware and what your application does. You may get the best results by having 4 threads each doing 100 calculations. On a different machine (different specs) this may turn out to be with 8 threads each doing 50 calculations.
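As a rough illustration of splitting a fixed amount of arithmetic across a small, tunable number of threads, here is a sketch that sums the squares of 1..n. The function name `parallel_sum_squares` and the choice of workload are assumptions made for the example, not something from this thread. The thread count is a parameter, so the same 400 calculations can be run as 4 threads x 100 or 8 threads x 50 and timed on the machine in question.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One contiguous slice of the work, plus the slot for its result. */
struct chunk {
    uint64_t first;    /* first value in the slice (inclusive) */
    uint64_t last;     /* last value in the slice (inclusive) */
    uint64_t partial;  /* sum of squares over the slice */
};

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    uint64_t i, acc = 0;

    for (i = c->first; i <= c->last; i++)
        acc += i * i;
    c->partial = acc;
    return NULL;
}

/* Hypothetical helper: sum i*i for i in 1..n using `nthreads` workers.
 * Returns 0 on success, ENOMEM, or the pthread_create() error number. */
static int parallel_sum_squares(uint64_t n, int nthreads, uint64_t *out) {
    pthread_t *tids;
    struct chunk *chunks;
    uint64_t per, rem, next = 1, total = 0;
    int t, started = 0, rc = 0;

    assert(nthreads > 0 && out != NULL);
    tids = malloc((size_t)nthreads * sizeof(*tids));
    chunks = malloc((size_t)nthreads * sizeof(*chunks));
    if (tids == NULL || chunks == NULL) {
        free(tids);
        free(chunks);
        return ENOMEM;
    }

    per = n / (uint64_t)nthreads;   /* base slice size */
    rem = n % (uint64_t)nthreads;   /* first `rem` slices get one extra */
    for (t = 0; t < nthreads; t++) {
        uint64_t size = per + ((uint64_t)t < rem ? 1 : 0);

        chunks[t].first = next;
        chunks[t].last = next + size - 1;  /* empty slice: last < first */
        chunks[t].partial = 0;
        next += size;
        rc = pthread_create(&tids[t], NULL, sum_chunk, &chunks[t]);
        if (rc != 0)
            break;
        started++;
    }

    /* Join whatever was started, even if a later create failed. */
    for (t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += chunks[t].partial;
    }
    free(tids);
    free(chunks);
    if (rc == 0)
        *out = total;
    return rc;
}
```

Timing calls such as `parallel_sum_squares(n, 4, &r)` and `parallel_sum_squares(n, 8, &r)` for a realistic `n` is one way to find the thread count that suits a given machine.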


----------



## qsecofr (Jun 1, 2010)

In addition to hardware, consider the nature of the application.  A socket server program handling communication data for hundreds/thousands of clients lends itself well to a more highly-threaded program - each client's communication data is separate.  But 1:1 thread to client is likely overkill too.  Pooling threads works nicely.  You may need to test anticipated processing load against various pool levels to observe the point of inflection.

At any point in time, each CPU/core will be executing just one set of instructions for the duration of the timeslice (or until yielded).

If the objective is to process math calculations from first to last as fast as possible, then, as SirDice suggests, fewer threads will likely be better. Don't forget to include the time you spend coding, testing and maintaining the programs in your targets. The learning curve for managing thread-specific data, mutexes and such can be non-trivial.

If the objective is to test the limits of your machine (and possibly your patience), then by all means proceed.
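The pooling idea mentioned above can be sketched as a fixed set of workers that pull items from a shared index instead of one thread per item. This is only an illustration under assumptions: `run_pool`, `pool_worker`, `handle_item` and `POOL_MAX` are hypothetical names, and `handle_item` stands in for whatever per-client work a real server would do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX 64   /* arbitrary upper bound for this sketch */

/* State shared by all workers in one run of the pool. */
struct pool_job {
    pthread_mutex_t lock;
    size_t next;        /* index of the next unclaimed item */
    size_t count;       /* total number of items */
    const int *input;
    int *output;
};

/* Placeholder for the real per-item (per-client) work. */
static int handle_item(int x) {
    return x * 2 + 1;
}

static void *pool_worker(void *arg) {
    struct pool_job *job = arg;

    for (;;) {
        size_t i;

        /* Claim the next item under the lock, then work without it. */
        pthread_mutex_lock(&job->lock);
        i = job->next;
        if (i < job->count)
            job->next++;
        pthread_mutex_unlock(&job->lock);

        if (i >= job->count)
            break;      /* nothing left to claim */
        job->output[i] = handle_item(job->input[i]);
    }
    return NULL;
}

/* Hypothetical helper: process `count` items with `workers` threads.
 * Returns 0 if at least one worker ran (the survivors drain all items),
 * otherwise the pthread_create() error number. */
static int run_pool(const int *input, int *output, size_t count, int workers) {
    struct pool_job job;
    pthread_t tids[POOL_MAX];
    int t, started = 0, rc = 0;

    assert(workers > 0 && workers <= POOL_MAX);
    pthread_mutex_init(&job.lock, NULL);
    job.next = 0;
    job.count = count;
    job.input = input;
    job.output = output;

    for (t = 0; t < workers; t++) {
        rc = pthread_create(&tids[t], NULL, pool_worker, &job);
        if (rc != 0)
            break;
        started++;
    }
    for (t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&job.lock);
    return started > 0 ? 0 : rc;
}
```

Running the same input through `run_pool()` with different `workers` values and measuring the elapsed time is one way to look for the point of inflection described above.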


----------



## dennylin93 (Jun 2, 2010)

A similar case would be parallel builds (the -j flag of make). Increasing the number does speed things up, but there's a limit. When you go over it, things just stay the same or slow down dramatically.


----------

