Streams and Collectors

The Streams API (since Java 8) provides a functional, declarative way to process collections. Instead of writing imperative loops, you build pipelines of operations that filter, transform, and aggregate data.

Quick start

List<String> names = List.of("Alice", "Bob", "Charlie", "Diana", "Eve");

List<String> result = names.stream()
    .filter(name -> name.length() > 3)
    .map(String::toUpperCase)
    .sorted()
    .toList();

// result: [ALICE, CHARLIE, DIANA]

Every stream pipeline has three parts:

Source -- a collection, array, generator, or I/O channel
Intermediate operations -- lazy transformations (filter, map, sorted, distinct, ...)
Terminal operation -- triggers execution and produces a result (collect, forEach, reduce, count, ...)

Creating streams

// From a collection
List<String> list = List.of("a", "b", "c");
Stream<String> s1 = list.stream();

// From an array
String[] array = {"a", "b", "c"};
Stream<String> s2 = Arrays.stream(array);

// From static values
Stream<String> s3 = Stream.of("a", "b", "c");

// Empty stream
Stream<String> s4 = Stream.empty();

// Infinite stream (use with limit!)
Stream<Double> randoms = Stream.generate(Math::random).limit(5);

// Iterate (seed + unary operator)
Stream<Integer> counting = Stream.iterate(0, n -> n + 2).limit(10);
// 0, 2, 4, 6, 8, 10, 12, 14, 16, 18

// Primitive streams (avoid boxing)
IntStream ints = IntStream.range(1, 100);      // 1..99
LongStream longs = LongStream.rangeClosed(1, 100); // 1..100
DoubleStream doubles = DoubleStream.of(1.0, 2.5, 3.7);

Intermediate operations

All intermediate operations are lazy -- they do not execute until a terminal operation is called.

filter

Keep elements matching a predicate:

List<Integer> evens = IntStream.rangeClosed(1, 20)
    .filter(n -> n % 2 == 0)
    .boxed()
    .toList();
// [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

map

Transform each element:

List<Integer> lengths = List.of("hello", "world", "!")
    .stream()
    .map(String::length)
    .toList();
// [5, 5, 1]

flatMap

Flatten nested structures (one-to-many transformation):

List<List<String>> nested = List.of(
    List.of("a", "b"),
    List.of("c", "d"),
    List.of("e")
);

List<String> flat = nested.stream()
    .flatMap(Collection::stream)
    .toList();
// [a, b, c, d, e]

Practical example -- splitting sentences into words:

List<String> sentences = List.of("Hello world", "Java streams are powerful");

List<String> words = sentences.stream()
    .flatMap(s -> Arrays.stream(s.split("\\s+")))
    .toList();
// [Hello, world, Java, streams, are, powerful]

distinct, sorted, peek

List<Integer> numbers = List.of(3, 1, 4, 1, 5, 9, 2, 6, 5);

List<Integer> result = numbers.stream()
    .distinct()                          // remove duplicates
    .sorted()                            // natural order
    .peek(n -> System.out.print(n + " ")) // debug: 1 2 3 4 5 6 9
    .toList();
// [1, 2, 3, 4, 5, 6, 9]

limit and skip

List<Integer> page = IntStream.rangeClosed(1, 100)
    .boxed()
    .skip(20)     // skip first 20
    .limit(10)    // take next 10
    .toList();
// [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

mapToInt, mapToLong, mapToDouble

Avoid boxing overhead by switching to primitive streams:

double average = List.of("hello", "world", "hi")
    .stream()
    .mapToInt(String::length)
    .average()
    .orElse(0.0);
// 3.666...

Terminal operations

collect

The most versatile terminal operation. Used with Collectors to produce lists, maps, strings, and custom aggregations.

// To list (Java 16+ shorthand)
List<String> list = stream.toList();

// To list (mutable, pre-Java 16)
List<String> mutableList = stream.collect(Collectors.toList());

// To set
Set<String> set = stream.collect(Collectors.toSet());

// To specific collection
TreeSet<String> treeSet = stream.collect(Collectors.toCollection(TreeSet::new));

reduce

Combine all elements into a single value:

// Sum
int sum = IntStream.rangeClosed(1, 10).reduce(0, Integer::sum);
// 55

// Product
int product = IntStream.rangeClosed(1, 5).reduce(1, (a, b) -> a * b);
// 120

// Longest string
Optional<String> longest = List.of("short", "medium-ish", "a")
    .stream()
    .reduce((a, b) -> a.length() >= b.length() ? a : b);
// Optional[medium-ish]

forEach

Side effects (printing, logging, sending):

names.stream()
    .filter(n -> n.startsWith("A"))
    .forEach(System.out::println);

Avoid forEach for anything other than side effects. If you are building a collection, use collect or toList().

count, min, max, findFirst, findAny, anyMatch, allMatch, noneMatch

List<Integer> nums = List.of(3, 7, 2, 9, 4);

long count = nums.stream().filter(n -> n > 5).count();       // 2
Optional<Integer> min = nums.stream().min(Integer::compareTo); // Optional[2]
Optional<Integer> max = nums.stream().max(Integer::compareTo); // Optional[9]
Optional<Integer> first = nums.stream().findFirst();           // Optional[3]

boolean anyBig = nums.stream().anyMatch(n -> n > 8);   // true
boolean allPositive = nums.stream().allMatch(n -> n > 0); // true
boolean noneNeg = nums.stream().noneMatch(n -> n < 0);  // true

Collectors deep dive

toMap

record Person(String name, int age) {}

List<Person> people = List.of(
    new Person("Alice", 30),
    new Person("Bob", 25),
    new Person("Charlie", 35)
);

// Name -> Age map
Map<String, Integer> nameToAge = people.stream()
    .collect(Collectors.toMap(Person::name, Person::age));
// {Alice=30, Bob=25, Charlie=35}

Handling duplicate keys:

// With merge function (keep the older person)
Map<String, Integer> map = people.stream()
    .collect(Collectors.toMap(
        Person::name,
        Person::age,
        (existing, replacement) -> existing  // merge function
    ));

groupingBy

Group elements by a classifier:

Map<Integer, List<Person>> byAge = people.stream()
    .collect(Collectors.groupingBy(Person::age));

// Group by age decade
Map<Integer, List<Person>> byDecade = people.stream()
    .collect(Collectors.groupingBy(p -> p.age() / 10 * 10));
// {30=[Alice, Charlie], 20=[Bob]}

Downstream collectors:

// Count per group
Map<Integer, Long> countByDecade = people.stream()
    .collect(Collectors.groupingBy(
        p -> p.age() / 10 * 10,
        Collectors.counting()
    ));
// {30=2, 20=1}

// Names per group
Map<Integer, List<String>> namesByDecade = people.stream()
    .collect(Collectors.groupingBy(
        p -> p.age() / 10 * 10,
        Collectors.mapping(Person::name, Collectors.toList())
    ));
// {30=[Alice, Charlie], 20=[Bob]}

// Average age per group
Map<String, Double> avgAgeByFirstLetter = people.stream()
    .collect(Collectors.groupingBy(
        p -> p.name().substring(0, 1),
        Collectors.averagingInt(Person::age)
    ));

partitioningBy

Split into two groups (true/false):

Map<Boolean, List<Person>> partitioned = people.stream()
    .collect(Collectors.partitioningBy(p -> p.age() >= 30));
// {true=[Alice, Charlie], false=[Bob]}

joining

Concatenate strings:

String csv = people.stream()
    .map(Person::name)
    .collect(Collectors.joining(", "));
// "Alice, Bob, Charlie"

String withBrackets = people.stream()
    .map(Person::name)
    .collect(Collectors.joining(", ", "[", "]"));
// "[Alice, Bob, Charlie]"

summarizingInt / summarizingDouble

Get all stats at once:

IntSummaryStatistics stats = people.stream()
    .collect(Collectors.summarizingInt(Person::age));

stats.getCount();   // 3
stats.getSum();     // 90
stats.getMin();     // 25
stats.getMax();     // 35
stats.getAverage(); // 30.0

Parallel streams

Parallel streams split the workload across multiple CPU cores using the common ForkJoinPool:

long count = IntStream.rangeClosed(1, 10_000_000)
    .parallel()
    .filter(n -> isPrime(n))
    .count();

When parallel streams help

Scenario	Parallel benefit
Large dataset (100k+ elements)	Likely beneficial
CPU-intensive computation per element	Beneficial
Simple operations (filter, map) on small lists	No benefit (overhead > gain)
I/O-bound operations	No benefit (use async instead)
Order-dependent operations	Harmful (forces sequential coordination)

When to avoid

// BAD: mutating shared state in parallel
List<Integer> results = new ArrayList<>(); // NOT thread-safe!
numbers.parallelStream().forEach(results::add); // Race condition!

// GOOD: let the collector handle accumulation
List<Integer> results = numbers.parallelStream()
    .filter(n -> n > 0)
    .toList();

Practical examples

Word frequency counter

String text = "the quick brown fox jumps over the lazy dog the fox";

Map<String, Long> wordFreq = Arrays.stream(text.split("\\s+"))
    .collect(Collectors.groupingBy(
        String::toLowerCase,
        Collectors.counting()
    ));
// {the=3, quick=1, brown=1, fox=2, jumps=1, over=1, lazy=1, dog=1}

CSV line parsing

record Product(String name, double price, String category) {}

List<Product> products = Files.lines(Path.of("products.csv"))
    .skip(1) // skip header
    .map(line -> line.split(","))
    .map(parts -> new Product(parts[0].trim(), Double.parseDouble(parts[1].trim()), parts[2].trim()))
    .toList();

// Most expensive per category
Map<String, Optional<Product>> mostExpensive = products.stream()
    .collect(Collectors.groupingBy(
        Product::category,
        Collectors.maxBy(Comparator.comparingDouble(Product::price))
    ));

Flattening nested objects

record Order(String customer, List<String> items) {}

List<Order> orders = List.of(
    new Order("Alice", List.of("Book", "Pen")),
    new Order("Bob", List.of("Laptop", "Mouse", "Keyboard")),
    new Order("Alice", List.of("Notebook"))
);

// All unique items across all orders
Set<String> allItems = orders.stream()
    .flatMap(order -> order.items().stream())
    .collect(Collectors.toSet());
// [Book, Pen, Laptop, Mouse, Keyboard, Notebook]

// Items per customer
Map<String, List<String>> itemsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customer,
        Collectors.flatMapping(o -> o.items().stream(), Collectors.toList())
    ));
// {Alice=[Book, Pen, Notebook], Bob=[Laptop, Mouse, Keyboard]}

Common pitfalls

Pitfall	Problem	Fix
Reusing a stream	Streams can only be consumed once; second terminal op throws `IllegalStateException`	Create a new stream from the source
Mutating state in `forEach`	Race conditions with parallel streams; violates functional contract	Use `collect` or `reduce` instead
`Stream.of(array)` with primitives	`Stream.of(new int[]{1,2,3})` creates `Stream<int[]>`, not `Stream<Integer>`	Use `Arrays.stream(array)` or `IntStream.of(1,2,3)`
Forgetting terminal operation	Intermediate ops are lazy; nothing happens without a terminal op	Always end with `collect`, `forEach`, `count`, etc.
Infinite streams without `limit`	`Stream.generate(...)` or `Stream.iterate(...)` runs forever	Always add `.limit(n)`
`peek` for business logic	`peek` is for debugging; it may not execute for short-circuiting ops	Use `map` + side effect, or process after collecting
`Optional.get()` without check	Throws `NoSuchElementException` if empty	Use `orElse`, `orElseThrow`, or `ifPresent`
Parallel stream on small data	Thread coordination overhead exceeds the computation benefit	Only use parallel for large datasets with CPU-bound work

Quick start​

Creating streams​

Intermediate operations​

filter​

map​

flatMap​

distinct, sorted, peek​

limit and skip​

mapToInt, mapToLong, mapToDouble​

Terminal operations​

collect​

reduce​

forEach​

count, min, max, findFirst, findAny, anyMatch, allMatch, noneMatch​

Collectors deep dive​

toMap​

groupingBy​

partitioningBy​

joining​

summarizingInt / summarizingDouble​

Parallel streams​

When parallel streams help​

When to avoid​

Practical examples​

Word frequency counter​

CSV line parsing​

Flattening nested objects​

Common pitfalls​

See also​