Parallel Streams in Java 8
Java 8 introduced parallel streams for parallel processing. Because modern machines commonly have several CPU cores, parallel processing can make some operations run faster.
import java.util.Arrays;
import java.util.stream.IntStream;

public class Test {
    public static void main(String[] args) {
        System.out.println("=================================");
        System.out.println("Using Sequential Stream");
        System.out.println("=================================");
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        IntStream intArrStream = Arrays.stream(array);
        // Print each element together with the thread that processed it
        intArrStream.forEach(s -> System.out.println(s + " " + Thread.currentThread().getName()));

        System.out.println("=================================");
        System.out.println("Using Parallel Stream");
        System.out.println("=================================");
        IntStream intParallelStream = Arrays.stream(array).parallel();
        intParallelStream.forEach(s -> System.out.println(s + " " + Thread.currentThread().getName()));
    }
}
In the output, the main thread does all the work for the sequential stream. It finishes processing one element before it starts on the next.
With the parallel stream, the elements are processed by several threads at once (4 in this run; the number depends on the machine), and the calling thread takes part in the work as well. Internally, parallel streams run on a fork/join pool to create and manage threads. By default they do not create a pool of their own; they use the shared common pool returned by the static ForkJoinPool.commonPool() method.
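The pool behind a parallel stream can be inspected directly. The following is a minimal sketch, not part of the original example: the class name CommonPoolInfo and the helper workerNames are made up for illustration, and the values printed will differ from machine to machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolInfo {

    // Hypothetical helper: records the name of every thread that
    // processes an element of a parallel range of the given size.
    public static Set<String> workerNames(int size) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, size)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        // Typically lists "main" plus some ForkJoinPool.commonPool-worker-N threads
        System.out.println("Threads used:            " + workerNames(1_000));
    }
}
```

By default the common pool's parallelism is usually one less than the number of available processors, because the thread that starts the stream also does part of the work.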
A parallel stream can take advantage of all available CPU cores and process tasks in parallel. If there are more tasks than cores, the remaining tasks wait until a currently running task completes.
When to use a parallel stream
It is easy to convert a sequential stream to a parallel stream just by adding .parallel(), but that doesn't mean you should always use it.
There are many factors to consider when using parallel streams; otherwise you will suffer from their negative effects.
A parallel stream has much higher overhead than a sequential stream, and coordinating between threads takes a good amount of time.
Consider a parallel stream only if:
You have a large dataset to process.
Your source is easy to split. Java uses ForkJoinPool to achieve parallelism: the source is split into parts, and the parts are submitted for execution.
For example:
An ArrayList is very easy to split, because the middle element can be found by its index, but a LinkedList is very hard to split and does not perform well in most cases.
You are actually suffering from performance issues.
You have made sure that all resources shared between threads are properly synchronized; otherwise the stream might produce unexpected results.
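The point about shared resources can be illustrated with a small sketch. The class and method names below (SharedStateDemo, unsafeSum, safeSum, evens) are made up for this illustration. The unsafe version updates one unsynchronized variable from several threads; the safe versions let the stream combine the partial results itself, which is usually simpler than adding explicit synchronization.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SharedStateDemo {

    // Unsafe: all worker threads update the same unsynchronized counter,
    // so updates can be lost and the result may vary between runs.
    public static long unsafeSum(int n) {
        long[] total = {0};
        IntStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Safe: a reduction has no shared mutable state; the stream
    // merges the partial sums computed by each thread.
    public static long safeSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().asLongStream().sum();
    }

    // Safe: collect() builds per-thread lists and merges them,
    // instead of adding to one shared list from forEach.
    public static List<Integer> evens(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .filter(i -> i % 2 == 0)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("unsafe sum: " + unsafeSum(100_000)); // may be wrong
        System.out.println("safe sum:   " + safeSum(100_000));   // 5000050000
        System.out.println("evens:      " + evens(10));          // [2, 4, 6, 8, 10]
    }
}
```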
The simplest formula for judging whether parallelism is likely to pay off is the "NQ" model, presented by Brian Goetz in one of his talks.
NQ Model:
N x Q > 10000
where,
N = number of items in dataset
Q = amount of work per item
This means that if you have a large number of items and little work per item (for example, a sum), parallelism might help the program run faster. The reverse is also true: if you have fewer items but more work per item (some heavy computation), parallelism might also help you get results faster.
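As a rough worked example of the rule, the sketch below applies N x Q > 10000 to a few cases. The class NqModel, the method worthTryingParallel and the Q values are assumptions chosen for illustration: Q is only a relative estimate of the cost per item, and the result is a hint to go and measure, not a guarantee of a speedup.

```java
public class NqModel {

    // Threshold taken from the N x Q > 10000 rule of thumb
    static final long THRESHOLD = 10_000;

    // n = number of items, q = rough estimate of the work per item
    public static boolean worthTryingParallel(long n, long q) {
        return n * q > THRESHOLD;
    }

    public static void main(String[] args) {
        // Many items, trivial work per item (such as a sum): 1,000,000 x 1
        System.out.println(worthTryingParallel(1_000_000, 1)); // true
        // Few items, trivial work per item: 100 x 1
        System.out.println(worthTryingParallel(100, 1));       // false
        // Few items, heavy work per item: 50 x 500
        System.out.println(worthTryingParallel(50, 500));      // true
    }
}
```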