Learn Java Stream API Practically

Amirhosein Gharaati · Published in Stackademic · 10 min read · Aug 15, 2023

Streams were one of the major features added in Java 8. This tutorial is an introduction to the many functionalities supported by streams, with a focus on simple, practical examples.

Let’s talk about functional programming first.

What is functional programming?

Functional programming is a declarative programming style as opposed to imperative.

The fundamental aim of this approach is to create code that is more concise, less intricate, more predictable, and easier to test when compared to traditional coding practices.

A distinctive feature of functional languages is the tendency to avoid shared states, with some languages even enforcing immutability of states.

Functional programming vs Purely Functional programming:

Pure functional programming languages don’t allow any mutability at all.

A functional-style language provides higher-order functions but often permits mutability, at the risk of us failing to do the right thing; the burden is on us rather than the language protecting us.

So, in general, we can say that a language that provides higher-order functions is a functional-style language, and a language that additionally limits mutability is a purely functional language.

Java is a functional-style language.

What is a stream?

A sequence of elements supporting sequential and parallel aggregate operations.

In a human-friendly definition:

Accomplish tasks using declarative operations

We take a data source, perform zero or more intermediate operations, and obtain a result from a terminal operation.

So with the Stream API we can process data in a functional style.

Stream API Advantages

Laziness

The most significant advantage of streams over loops is laziness. Until we call a terminal operation on a stream, no work is done; the pipeline runs only when we actually need the result.

And not just the building of the pipeline is lazy. Most intermediate operations are lazy, too. Elements are only consumed as they’re needed.
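To see laziness in action, here is a minimal, self-contained sketch (the class name and values are illustrative): building the pipeline runs nothing, and the short-circuiting findFirst consumes only one element.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyDemo {
    // Counts how many times map runs when only the first element is needed.
    static int countMapCallsForFindFirst() {
        AtomicInteger mapCalls = new AtomicInteger();

        // Building the pipeline does no work yet: map has not run a single time.
        Stream<Integer> pipeline = List.of(1, 2, 3, 4, 5).stream()
                .map(n -> {
                    mapCalls.incrementAndGet();
                    return n * n;
                });

        // findFirst is a short-circuiting terminal operation:
        // only the first element flows through map, not all five.
        pipeline.findFirst();
        return mapCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countMapCallsForFindFirst()); // 1
    }
}
```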

(Mostly) stateless

One of the main pillars of functional programming is immutable state. Most intermediate operations are stateless, except for distinct(), sorted(...), limit(...), and skip(...).

Even though Java allows the building of stateful lambdas, we should always strive to design them to be stateless. Any state can have severe impacts on safety and performance and might introduce unintended side effects.
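A small illustrative sketch of the difference (the names are made up): the stateful version mutates a shared ArrayList from inside a parallel pipeline, which is unsafe; the stateless version lets the terminal operation accumulate the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class StatelessDemo {
    // Risky: a stateful lambda mutating shared state from inside the pipeline.
    // With parallel(), ordering is not guaranteed and ArrayList is not thread-safe.
    static List<Integer> stateful() {
        List<Integer> shared = new ArrayList<>();
        IntStream.rangeClosed(1, 5).parallel().forEach(shared::add); // side effect!
        return shared;
    }

    // Safe: no shared state; the terminal operation builds the result,
    // and encounter order is preserved even for parallel streams.
    static List<Integer> stateless() {
        return IntStream.rangeClosed(1, 5).parallel().boxed().toList();
    }

    public static void main(String[] args) {
        System.out.println(stateless()); // [1, 2, 3, 4, 5]
    }
}
```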

Optimizations included

Thanks to being (mostly) stateless, streams can optimize themselves quite efficiently. Stateless intermediate operations can be fused together to a combined consumer. Redundant operations might be removed. And some pipeline paths might be short-circuited.

The JVM will optimize traditional loops, too. But streams are an easier target thanks to their multi-operation design and mostly stateless nature.

Non-reusable

Being just a dumb pipeline, streams can’t be reused. But they don’t change the original data source — we can always create another stream from the source.
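A minimal sketch (illustrative names) showing both points: a second terminal operation on the same stream throws IllegalStateException, while the untouched source can hand out fresh streams.

```java
import java.util.List;
import java.util.stream.Stream;

public class ReuseDemo {
    // A stream can be consumed only once; a second terminal operation fails.
    static boolean secondUseFails() {
        Stream<String> stream = List.of("a", "b", "c").stream();
        stream.count(); // first terminal operation: the stream is now consumed
        try {
            stream.count(); // second use throws
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c");
        System.out.println(secondUseFails()); // true
        // The source itself is untouched; just open a fresh stream from it.
        System.out.println(source.stream().count()); // 3
    }
}
```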

Less boilerplate

Streams are often easier to read and comprehend.

This is a simple data-processing example with a for loop:

List<Album> albums = ...;
List<String> result = new ArrayList<>();

for (Album album : albums) {
    if (album.getYear() != 1999) {
        continue;
    }
    if (album.getGenre() != Genre.ALTERNATIVE) {
        continue;
    }
    result.add(album.getName());
    if (result.size() == 5) {
        break;
    }
}
Collections.sort(result);

This code is equivalent to:

List<String> result =
    albums.stream()
        .filter(album -> album.getYear() == 1999)
        .filter(album -> album.getGenre() == Genre.ALTERNATIVE)
        .limit(5)
        .map(Album::getName)
        .sorted()
        .toList();

We have a shorter code block, clearer operations, no loop boilerplate, and no extra temporary variables. All nicely packaged in a fluent API. This way, our code reflects the what, and we no longer need to care about the actual iteration process, the how.

Parallelization

Concurrency is hard to do right and easy to do wrong. Streams support parallel execution (via the Fork/Join framework) and remove much of the overhead of doing it ourselves.

A stream can be parallelized by calling the intermediate operation parallel() and turned back to sequential by calling sequential(). But not every stream pipeline is a good match for parallel processing.
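A minimal illustrative sketch of a pipeline that is a good match: summing a range is an associative, stateless reduction, so sequential and parallel execution agree.

```java
import java.util.stream.LongStream;

public class ParallelDemo {
    static long sum(boolean parallel) {
        LongStream stream = LongStream.rangeClosed(1, 1_000_000);
        if (parallel) {
            stream = stream.parallel(); // runs on the common Fork/Join pool
        }
        // sum() is an associative, stateless reduction: safe to parallelize.
        return stream.sum();
    }

    public static void main(String[] args) {
        // Same result either way, because the pipeline has no shared mutable state.
        System.out.println(sum(false) == sum(true)); // true
    }
}
```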

Learning Stream API

Feel free to explore the code within this repository:

https://github.com/AmirHosein-Gharaati/learnstreamapi

We will go through some examples using the operations that the Stream API provides.

We use JUnit 5 to write the different cases and check the results. We also use a User model and some mock data for the sake of simplicity.

User Model

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.Setter;

import java.util.List;

@Getter
@Setter
@AllArgsConstructor
public class User {
    private Long id;
    private String firstName;
    private String lastName;
    private String email;
    private Integer age;
    private List<String> interests;
}

Mock Data

private static List<User> users;

@BeforeAll
public static void init() {
    users = List.of(
            new User(1L, "Amirhosein", "Gharaati", "amirgh1380@gmail.com", 22, List.of("computer", "board games")),
            new User(2L, "Mohammad", "Shoja", "rezajsh@yahoo.com", 26, List.of("computer", "guitar")),
            new User(3L, "Babak", "Ahmadi", "babakahmadi@gmail.com", 33, List.of("shopping")),
            new User(2L, "Robin", "Eklund", "robin.eklund@twitter.com", 28, List.of("reading")),
            new User(5L, "Amir", "Tavakoli", "amirtvkli@gmail.com", 30, List.of("reading", "computer", "cooking")),
            new User(5L, "Farhad", "Kiani", "farhadkiani@focalpay.se", 28, List.of())
    );
}

Let’s learn some basic operations!

filter

produces a new stream that contains elements of the original stream that pass a given test (specified by a Predicate).

filter users with gmail account with age greater than equal 25

@Test
void filter_users_with_gmail_with_age_greater_than_equal_25() {
    List<User> filteredUsers = users.stream()
            .filter(this::hasGmailAccount)
            .filter(this::ageIsGreaterThanEqual25)
            .toList();
    int expectedNumOfUsers = 2;

    assertEquals(expectedNumOfUsers, filteredUsers.size());
}

private boolean ageIsGreaterThanEqual25(User user) {
    return user.getAge() >= 25;
}

private boolean hasGmailAccount(User user) {
    return user.getEmail().endsWith("gmail.com");
}

map

produces a new stream after applying a function to each element of the original stream. The new stream can be of a different type.

generate users fullName

@Test
void generate_users_fullname() {
    List<String> fullNames = users.stream()
            .map(this::createFullName) // better syntax
            .toList();

    String expectedFirstPersonFullName = "Amirhosein Gharaati";
    assertEquals(expectedFirstPersonFullName, fullNames.get(0));
}

private String createFullName(User user) {
    return "%s %s".formatted(user.getFirstName(), user.getLastName());
}

findFirst

returns an Optional for the first entry in the stream; the Optional can, of course, be empty.

find the first person whose age is greater than equal to 25 (lazy evaluation)

@Test
void first_person_with_age_greater_than_equal_25() {
    Optional<User> user = users.stream()
            .filter(this::ageIsGreaterThanEqual25)
            .findFirst();

    Integer expectedAge = 26;
    String expectedFirstName = "Mohammad";

    assertEquals(expectedAge, user.get().getAge());
    assertEquals(expectedFirstName, user.get().getFirstName());
}

private boolean ageIsGreaterThanEqual25(User user) {
    return user.getAge() >= 25;
}

flatMap

flattens the data structure to simplify further operations

count computer interests in data

@Test
void count_computer_interest() {
    long numberOfComputerInterest = users.stream()
            .map(User::getInterests)
            .flatMap(Collection::stream)
            .filter(interest -> interest.equals("computer"))
            .count();
    int expectedCount = 3;

    assertEquals(expectedCount, numberOfComputerInterest);
}

collect: toSet / toList / toMap / ...

the common ways to get results out of the stream once we are done with all the processing.

collect all unique ids of users

@Test
void collect_unique_ids() {
    Set<Long> uniqueIds = users.stream()
            .map(User::getId)
            .collect(Collectors.toSet());

    Set<Long> expectedUniqueIds = Set.of(1L, 2L, 3L, 5L);

    assertEquals(expectedUniqueIds, uniqueIds);
}
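toMap is listed above but not demonstrated, so here is a small self-contained sketch (a simplified User record, not the article's full model). Note the merge function: without it, toMap throws on duplicate keys, and our mock data does contain duplicate ids.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapDemo {
    record User(Long id, String firstName) {}

    static Map<Long, String> idToFirstName(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(
                        User::id,                      // key mapper
                        User::firstName,               // value mapper
                        (first, second) -> first));    // merge function: keep the first on duplicate keys
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1L, "Amirhosein"),
                new User(2L, "Mohammad"),
                new User(2L, "Robin")); // duplicate id, as in the mock data
        System.out.println(idToFirstName(users)); // {1=Amirhosein, 2=Mohammad} (order may vary)
    }
}
```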

collect: groupingBy

groups the elements of the stream by a classifier function into a Map. Unlike partitioning, which splits the stream into exactly two groups, it can produce any number of groups.

Group users by email and count each email provider.

@Test
void group_users_by_email() {
    Map<String, Long> emailToCount = users.stream()
            .collect(
                    Collectors.groupingBy(
                            this::getEmailProvider, Collectors.counting()
                    )
            );

    Long numberOfUsersWithGmailAccount = 3L;

    assertEquals(numberOfUsersWithGmailAccount, emailToCount.get("gmail.com"));
}

private String getEmailProvider(User user) {
    return user.getEmail().split("@")[1];
}

Problem Solving

Now that we know some basic operations, let’s try to solve some problems.

Extract all interests that users have

@Test
void extract_all_interests() {
    Set<String> interests = null; // TODO
    int numberOfDistinctInterests = 6;

    assertEquals(numberOfDistinctInterests, interests.size());
}

Answer

@Test
void extract_all_interests() {
    Set<String> interests = users.stream()
            .map(User::getInterests)
            .flatMap(Collection::stream)
            .collect(Collectors.toSet());

    int numberOfDistinctInterests = 6;

    assertEquals(numberOfDistinctInterests, interests.size());
}

Given some ids, find if a user exists with that id. Collect the user if found.

@Test
void find_users_by_id_if_exists() {
    List<Long> ids = List.of(1L, 2L, 7L);

    List<User> users = null; // TODO
    int expectedNumOfFoundUsers = 2;

    assertEquals(expectedNumOfFoundUsers, users.size());
}

Answer (Traditional Way)

@Test
void traditional_find_users_by_id_if_exists() {
    List<Long> ids = List.of(1L, 2L, 7L);

    List<User> users = new ArrayList<>();
    for (Long id : ids) {
        Optional<User> optionalUser = findById(id);
        if (optionalUser.isPresent()) {
            users.add(optionalUser.get());
        }
    }

    int expectedNumOfFoundUsers = 2;

    assertEquals(expectedNumOfFoundUsers, users.size());
}

private Optional<User> findById(Long id) {
    return users.stream()
            .filter(user -> user.getId().equals(id))
            .findFirst();
}

Answer (Stream API)

@Test
void find_users_by_id_if_exists() {
    List<Long> ids = List.of(1L, 2L, 7L);

    List<User> users = ids.stream()
            .map(this::findById)
            .flatMap(Optional::stream)
            .toList();
    int expectedNumOfFoundUsers = 2;

    assertEquals(expectedNumOfFoundUsers, users.size());
}

private Optional<User> findById(Long id) {
    return users.stream()
            .filter(user -> user.getId().equals(id))
            .findFirst();
}

Collect duplicated ids of the users

@Test
void extract_duplicated_user_ids() {
    Set<Long> duplicatedUsersIds = null; // TODO

    Set<Long> expectedIds = Set.of(2L, 5L);

    assertEquals(expectedIds, duplicatedUsersIds);
}

Hint: you can use these operators and terminals:

collect: groupingBy | collect: toSet | filter | map

Answer (Traditional Way)

@Test
void traditional_extract_duplicated_users_based_on_user_id() {
    Set<Long> duplicatedUsersIds = new HashSet<>();

    Map<Long, Long> idToCount = generateIdToCount();

    for (Map.Entry<Long, Long> entry : idToCount.entrySet()) {
        if (entry.getValue() > 1) {
            duplicatedUsersIds.add(entry.getKey());
        }
    }

    Set<Long> expectedIds = Set.of(2L, 5L);

    assertEquals(expectedIds, duplicatedUsersIds);
}

private Map<Long, Long> generateIdToCount() {
    Map<Long, Long> idToCount = new HashMap<>();

    for (User user : users) {
        if (idToCount.containsKey(user.getId())) {
            Long amount = idToCount.get(user.getId());
            idToCount.put(user.getId(), amount + 1);
        } else {
            idToCount.put(user.getId(), 1L);
        }
    }

    return idToCount;
}

Answer (Stream API)

@Test
void extract_duplicated_users_based_on_user_id() {
    Set<Long> duplicatedUsersIds = users.stream()
            .collect(Collectors.groupingBy(User::getId, Collectors.counting()))
            .entrySet()
            .stream()
            .filter(entry -> entry.getValue() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());

    Set<Long> expectedIds = Set.of(2L, 5L);

    assertEquals(expectedIds, duplicatedUsersIds);
}

Best Practices

Smaller operations

Lambdas can be simple one-liners or huge code blocks if wrapped in curly braces. To retain simplicity and conciseness, we should restrict ourselves to these two use cases for operations:

  • One-line expressions
    e.g., .filter(album -> album.getYear() > 2000)
  • Method references
    e.g., .filter(this::myFilterCriteria)

By using method references, we can have more complex operations, reuse operational logic, and even unit test it more easily.

Method references

The bytecode of a lambda and a method reference differs slightly, with the method reference generating less. A lambda might be translated into an anonymous class that calls the body, creating more code than needed.

Also, by using method references, we lose the visual noise of the lambda:

source.stream()
      .map(s -> s.length())
      .collect(Collectors.toList());

// VS

source.stream()
      .map(String::length)
      .collect(Collectors.toList());

Code formatting

By putting each pipeline step into a new line, we can improve readability:

List<String> result = albums.stream().filter(album -> album.getReleaseYear() == 1999)
        .filter(album -> album.getGenre() == Genre.ALTERNATIVE).limit(5)
        .map(Album::getName).sorted().collect(Collectors.toList());

// VS.
List<String> result =
    albums.stream()
        .filter(album -> album.getReleaseYear() == 1999)
        .filter(album -> album.getGenre() == Genre.ALTERNATIVE)
        .limit(5)
        .map(Album::getName)
        .sorted()
        .collect(Collectors.toList());

It also makes it easier to set breakpoints at the correct pipeline step.

Order of operations

Think of a simple stream:

Stream.of("ananas", "oranges", "apple", "pear", "banana")
      .map(String::toUpperCase)         // 1. Process
      .sorted(String::compareTo)        // 2. Sort
      .filter(s -> s.startsWith("A"))   // 3. Filter
      .forEach(System.out::println);

This code will run map five times, sorted eight times, filter five times, and forEach two times. This means a total of 20 operations to output two values.

If we reorder the pipeline parts, we can reduce the total operations count significantly without changing the actual outcome:

Stream.of("ananas", "oranges", "apple", "pear", "banana")
      .filter(s -> s.startsWith("a"))   // 1. Filter first
      .map(String::toUpperCase)         // 2. Process
      .sorted(String::compareTo)        // 3. Sort
      .forEach(System.out::println);

By filtering first, we’re going to restrict the other operations to a minimum: filter five times, map two times, sort one time, and forEach two times, which saves us 10 operations in total.

Real World Project Example

An Excel file is given, which is the exported data of a company system. You are going to process it to automate your application: validate the data, ignore bad data, and generate a report.

You can use a plain Java application or a back-end service like Spring Boot. Feel free to choose any framework that you want.

For reading an Excel file you need to add Apache POI to your project:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

Download the spreadsheet as an Excel file: excel file

You are going to implement 4 different phases in this project:

  1. Read excel file
  2. Validate data
  3. Generate report
  4. Extract valid data

Reading Excel file

Using the Apache POI dependency, read the Excel file and extract the rows into Java objects.

Apache POI provides several methods for reading the Excel file.

There are some things to consider:

  • Use the takeWhile operation in the Stream API to detect the last row. (Don’t rely on the last row index Apache POI gives you.)
  • Create an object for every row. It is better for each field to be a String.
  • Return a list of row objects as the result.

You may use these operators and terminals:

skip | takeWhile | map | toList
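As a rough sketch of this pipeline's shape (this simulates rows as lists of strings; real code would pull them from Apache POI's Sheet and Row types, and the column layout here is hypothetical):

```java
import java.util.List;

public class ReadRowsDemo {
    // Simulates reading rows: in the real project these would come from
    // Apache POI; here each row is just a list of cell values.
    static List<String> firstNames(List<List<String>> rows) {
        return rows.stream()
                .skip(1)                                  // skip the header row
                .takeWhile(row -> !row.get(0).isBlank())  // stop at the first empty row, not at the sheet's last index
                .map(row -> row.get(1))                   // extract a column value per row object
                .toList();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("id", "firstName"),
                List.of("1", "Amirhosein"),
                List.of("2", "Mohammad"),
                List.of("", ""));                  // trailing blank row
        System.out.println(firstNames(rows));      // [Amirhosein, Mohammad]
    }
}
```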

Validating data

The data may not be valid or may not be in the correct format. We can have some policies for bad data; for example, we reject rows that contain bad data.

  • The user id should be numeric and greater than or equal to 1, and there shouldn’t be any duplicated ids.
  • Price values should be numeric strings.
  • Phone numbers should be in a valid phone format.
  • Emails should be in a valid email format.
  • Age shouldn’t be more than 120.
  • There shouldn’t be any empty cells.

Extract all bad data based on user id and create an Error object for each problematic column, for further usage.

You may use these operators and terminals:

map | filter | collect: toMap | collect: groupingBy | collect: toSet
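One possible shape for the validation step, sketched with made-up Row and Error types and only two of the rules above (numeric id, age at most 120):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ValidateDemo {
    record Row(String id, String age) {}
    record Error(String id, String column) {}

    static boolean isNumeric(String s) {
        return !s.isBlank() && s.chars().allMatch(Character::isDigit);
    }

    static boolean validAge(String age) {
        return isNumeric(age) && Integer.parseInt(age) <= 120;
    }

    // Emit one Error per invalid column, then group the errors by user id.
    static Map<String, List<Error>> errorsById(List<Row> rows) {
        return rows.stream()
                .flatMap(row -> Stream.concat(
                        isNumeric(row.id()) ? Stream.<Error>empty() : Stream.of(new Error(row.id(), "id")),
                        validAge(row.age()) ? Stream.<Error>empty() : Stream.of(new Error(row.id(), "age"))))
                .collect(Collectors.groupingBy(Error::id));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("1", "22"),   // valid
                new Row("2", "130"),  // age out of range
                new Row("x", ""));    // bad id and empty age
        System.out.println(errorsById(rows).keySet()); // ids 2 and x, in some order
    }
}
```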

Generate report

Generate a report based on the extracted bad data. This can be, for example, another Excel file with two columns:

First column: the user ID

Second column: the name(s) of the column(s) in which the data is not valid. You can separate column names with commas.

You may use these operators and terminals:

forEach | filter | IntStream ...
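A sketch of this step with hypothetical types: given the invalid column names per user id, build one report line per user with the column names joined by commas (a TreeMap is used here only to make the order deterministic).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReportDemo {
    // First column: the user id; second column: the invalid column names,
    // joined by commas. Writing the lines to an Excel file is left out.
    static List<String> reportLines(Map<String, List<String>> badColumnsById) {
        return badColumnsById.entrySet().stream()
                .map(e -> e.getKey() + " | " + String.join(",", e.getValue()))
                .toList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> bad = new TreeMap<>();
        bad.put("2", List.of("age"));
        bad.put("7", List.of("email", "phone"));
        reportLines(bad).forEach(System.out::println);
        // 2 | age
        // 7 | email,phone
    }
}
```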

Extract valid data

Based on the data extracted from the Excel file, filter the rows that have no invalid data.

We assume there could be further processing of the data after filtering, but for simplicity and our learning purposes, we leave it here.

Conclusion

In this article we covered the basics of the Stream API, worked through some examples and best practices, and outlined a project for practicing development with the Stream API.

The Stream API is a powerful tool with a clean syntax that can keep your project maintainable, although in some cases the old way might still be a better fit.

Feel free to share your opinion about the use cases or some other stuff about stream API!

Resources:

https://www.baeldung.com/java-8-streams

