Stream Operations in Java


heartbeat.. · Published 2025-11-22 20:01:50

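I. Introduction

In Java 8 and later, Stream is a tool for processing collections (Collection), arrays, and other data sources. It builds on functional programming ideas and offers a declarative (rather than imperative) way to process data. Its main strengths are that it is concise, efficient, and parallelizable.
A Stream is essentially a data "pipeline": data flows in from a source (a collection, an array, and so on), is "processed" by a series of intermediate operations (filtering, mapping, sorting, and so on), and finally leaves through a terminal operation (counting, collecting, iterating, and so on) that produces a result (or no result at all).
A food-processing production line in a factory is a handy analogy for this "pipeline", because each stage maps directly onto a Stream operation: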

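Think of a Stream as an automated food-processing production line:

- "Data source" = the line's "raw-material warehouse": a pile of fresh vegetables (the original data in a collection or array) is where processing starts;
- "Intermediate operations" = the line's "processing stations": for example "sorting (discard rotten vegetables)", "cutting (chop into pieces)", "washing (remove dirt)", and "seasoning (add spices)". Each station does only its own job and does not produce the final product (lazy evaluation); it passes the processed material on to the next station;
- "Terminal operation" = the line's "packaging / shipping": for example "boxing", "weighing and counting", or "shipping straight to the supermarket". Only when this step starts do the earlier stations actually run (the computation is triggered), producing the finished goods (the Stream's result) or completing delivery (forEach, for instance, is like "handing the food out to customers" and returns nothing).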

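Two more details that match how Streams behave:

- Raw material can be used only once: after a batch of vegetables (one Stream) has gone through the line (a terminal operation) it has become finished goods and cannot go back to the warehouse to be processed again (a Stream is consumed once);
- Optional "parallel processing": an ordinary line is a single conveyor (a sequential stream). When there is a lot of raw material, several conveyors can run side by side (a parallel stream), each handling a different batch, and the finished goods are merged at the end, which can raise throughput.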

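II. Characteristics

- Lazy evaluation: intermediate operations do not run immediately; the whole pipeline is evaluated only when a terminal operation is called (which avoids unnecessary work).
- Single use: a Stream can be used only once. After a terminal operation it counts as consumed, and using it again throws IllegalStateException.
- Stateless / stateful operations:
  - Stateless: each element is processed independently of the others (e.g. filter, map);
  - Stateful: processing an element depends on other elements (e.g. sorted for natural ordering, distinct for de-duplication, limit for truncation), so the operation may need to buffer data.
- Parallel support: parallelStream() or stream.parallel() switches to parallel processing (built on the Fork/Join framework).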
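
The first two characteristics can be observed directly. The sketch below is only an illustration: the class name `LazySingleUseDemo`, the `log` list, and both method names are hypothetical, and the side effect inside `filter` exists purely to make the evaluation order visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazySingleUseDemo {

    // Records the order in which things happen (hypothetical helper for this demo).
    static final List<String> log = new ArrayList<>();

    // Builds a pipeline, logs a marker, and only then runs the terminal operation.
    static List<String> run() {
        log.clear();
        Stream<String> pipeline = Stream.of("a", "bb", "ccc")
                .filter(s -> {
                    log.add("filter " + s); // side effect only to expose laziness
                    return s.length() > 1;
                });
        // No terminal operation has been called yet, so the filter has not run.
        log.add("before terminal op");
        return pipeline.collect(Collectors.toList());
    }

    // Returns true if a second terminal operation on the same Stream is rejected.
    static boolean reuseFails() {
        Stream<String> s = Stream.of("a", "b");
        s.count(); // first terminal operation consumes the stream
        try {
            s.count(); // second use of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());        // [bb, ccc]
        System.out.println(log);          // marker first, then the three filter calls
        System.out.println(reuseFails()); // true
    }
}
```

The marker "before terminal op" lands in the log ahead of every "filter ..." entry, which is what lazy evaluation predicts.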

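III. Operations

A complete Stream pipeline has 3 steps:

- Create the Stream: obtain a Stream from a data source (collection, array, generator, and so on);
- Intermediate operations: filter, map, sort, and otherwise process the data (several intermediate operations can be chained);
- Terminal operation: trigger the computation and produce a result (or perform a side effect such as iteration); the Stream's life cycle ends here.

Flow:

Collection/array → create Stream → intermediate op 1 → intermediate op 2 → ... → terminal op → result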
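
As a rough sketch of those three steps in one chain (the class `PipelineDemo`, the method `shortNamesUpper`, and the sample names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PipelineDemo {

    // Keeps names of at most 4 characters, upper-cases them, and sorts them.
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()                    // step 1: create the Stream
                .filter(n -> n.length() <= 4)    // step 2: intermediate operations
                .map(String::toUpperCase)
                .sorted()
                .collect(Collectors.toList());   // step 3: terminal operation
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("lily", "alexander", "bob", "eve");
        System.out.println(shortNamesUpper(names)); // [BOB, EVE, LILY]
    }
}
```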

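IV. Details

1. Creating a Stream (data source)

(1) From a collection (most common)

// All snippets below assume: import java.util.*; import java.util.stream.*;
List<String> list = Arrays.asList("a", "b", "c");
Stream<String> stream = list.stream(); // sequential stream
Stream<String> parallelStream = list.parallelStream(); // parallel stream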

(2) From an array

String[] arr = {"x", "y", "z"};
Stream<String> stream = Arrays.stream(arr);

// Part of the array (index 1 inclusive to index 3 exclusive, i.e. "y" and "z")
Stream<String> partStream = Arrays.stream(arr, 1, 3);

(3) With Stream.of()

Stream<Integer> stream = Stream.of(1, 2, 3, 4);
Stream<Object> emptyStream = Stream.empty(); // empty stream

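(4) Infinite streams (generator / iterator)

Useful when data has to be generated on the fly. Bound the length with limit (or another short-circuiting operation); otherwise the terminal operation never finishes:

// 1. generate: produces constant or random values (Supplier interface)
Stream<Double> randomStream = Stream.generate(Math::random).limit(5); // 5 random numbers

// 2. iterate: produces values by iteration (seed -> next element, UnaryOperator interface)
Stream<Integer> numStream = Stream.iterate(0, n -> n + 2).limit(5); // 0,2,4,6,8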

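2. Intermediate operations (processing the data)

An intermediate operation returns a new Stream, so calls can be chained. Common operations:

Operation	Description	Example (with List<Integer> list = [1,2,3,4,5,6])
`filter(Predicate)`	Keeps the elements that satisfy the predicate	`list.stream().filter(n -> n % 2 == 0)` → [2,4,6]
`map(Function)`	Transforms each element (possibly into another type)	`list.stream().map(n -> n * 2)` → [2,4,6,8,10,12]
`flatMap(Function)`	Maps each element to a stream and flattens the results into a single stream	`List<List<Integer>> list2 = [[1,2],[3,4]]; list2.stream().flatMap(Collection::stream)` → [1,2,3,4]
`distinct()`	Removes duplicates (based on `equals()`, so `hashCode()` must be consistent with it)	`Stream.of(1,2,2,3).distinct()` → [1,2,3]
`sorted()`	Natural ordering (elements must implement `Comparable`)	`Stream.of(3,1,2).sorted()` → [1,2,3]
`sorted(Comparator)`	Custom ordering	`list.stream().sorted(Comparator.reverseOrder())` → [6,5,4,3,2,1]
`limit(long)`	Keeps only the first N elements	`list.stream().limit(3)` → [1,2,3]
`skip(long)`	Skips the first N elements	`list.stream().skip(3)` → [4,5,6]
`peek(Consumer)`	Observes each element without changing the stream (mainly for debugging)	`list.stream().peek(System.out::println).collect(Collectors.toList())` → prints each element, then returns the list

Note: since Java 9, count() may compute the size directly from the source and skip the pipeline, so a peek placed in front of count() may print nothing. A subtraction comparator such as `(a,b) -> b - a` also sorts in descending order here, but it can overflow for large values, which is why the table uses Comparator.reverseOrder().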
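
The table rows can be combined freely. Here is a small runnable sketch that chains several of them; the class `IntermediateOpsDemo` and its three methods are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class IntermediateOpsDemo {

    // filter + map + sorted: squares of the even numbers, largest first.
    static List<Integer> evenSquaresDescending(List<Integer> nums) {
        return nums.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    // flatMap: turn a list of lists into one flat list.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // distinct + skip + limit: a simple zero-based "page" over the unique values.
    static List<Integer> page(List<Integer> nums, int pageIndex, int pageSize) {
        return nums.stream()
                .distinct()
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(evenSquaresDescending(list));                                        // [36, 16, 4]
        System.out.println(flatten(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4))));   // [1, 2, 3, 4]
        System.out.println(page(Arrays.asList(1, 1, 2, 3, 3, 4, 5), 1, 2));                     // [3, 4]
    }
}
```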

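3. Terminal operations (triggering the computation)

A terminal operation consumes the Stream and returns a non-Stream result (or nothing). Common operations:

(1) Iteration and consumption

forEach(Consumer): iterates over the elements (the most common choice; used for its side effects);

forEachOrdered(Consumer): preserves encounter order even on a parallel stream (on a sequential stream it behaves the same as forEach).

List<String> list = Arrays.asList("a", "b", "c");
list.stream().forEach(System.out::println); // prints a b c (on a parallel stream the order may vary)
list.parallelStream().forEachOrdered(System.out::println); // always prints a b c in order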

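(2) Aggregation (returns a single value)

Operation	Description	Example
`count()`	Returns the number of elements (as a `long`)	`Stream.of(1,2,3).count()` → 3
`max(Comparator)`	Returns the maximum (as an `Optional`, because the stream may be empty)	`Stream.of(1,2,3).max(Integer::compare)` → Optional[3]
`min(Comparator)`	Returns the minimum (as an `Optional`)	`Stream.of(1,2,3).min(Integer::compare)` → Optional[1]
`findFirst()`	Returns the first element (`Optional`; efficient on sequential streams)	`Stream.of(1,2,3).findFirst()` → Optional[1]
`findAny()`	Returns any one element (efficient on parallel streams)	`Stream.of(1,2,3).parallel().findAny()` → may be 1, 2, or 3
`anyMatch(Predicate)`	Whether at least one element matches (short-circuiting)	`Stream.of(1,2,3).anyMatch(n -> n > 2)` → true
`allMatch(Predicate)`	Whether every element matches (short-circuiting)	`Stream.of(1,2,3).allMatch(n -> n > 0)` → true
`noneMatch(Predicate)`	Whether no element matches (short-circuiting)	`Stream.of(1,2,3).noneMatch(n -> n < 0)` → true

Short-circuiting: evaluation stops as soon as the answer is known (for example, anyMatch returns at the first matching element without visiting the rest).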
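
Short-circuiting can be made visible by counting how many elements the pipeline actually touches. The following is a minimal sketch for a sequential stream; `ShortCircuitDemo` and its method names are hypothetical, and the `peek` counter is there only for observation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class ShortCircuitDemo {

    // Counts how many elements a sequential anyMatch inspects before it stops.
    static int inspectedByAnyMatch() {
        AtomicInteger inspected = new AtomicInteger();
        boolean found = Stream.of(1, 2, 3, 4, 5)
                .peek(n -> inspected.incrementAndGet()) // observation only
                .anyMatch(n -> n > 2);
        return found ? inspected.get() : -1;
    }

    // Short-circuiting lets a terminal operation finish on an infinite stream.
    static int firstPowerOfTwoAbove(int limit) {
        return Stream.iterate(1, n -> n * 2)
                .filter(n -> n > limit)
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        System.out.println(inspectedByAnyMatch());      // 3: elements 4 and 5 are never visited
        System.out.println(firstPowerOfTwoAbove(1000)); // 1024
    }
}
```

The second method also shows why limit is not the only way to tame an infinite stream: any short-circuiting terminal operation that eventually finds its answer will do.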

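(3) Collecting results (`collect(Collector)`)

Gathers the Stream's elements into a collection, an array, a Map, and so on. The Collectors utility class supplies many predefined collectors (arrays are produced with toArray rather than collect).

① Collecting into a collection

List<Integer> list = Stream.of(1,2,3).collect(Collectors.toList()); // an ArrayList in current JDKs, but the concrete type is not guaranteed
List<Integer> linkedList = Stream.of(1,2,3).collect(Collectors.toCollection(LinkedList::new)); // choose the collection type explicitly
Set<Integer> set = Stream.of(1,2,2).collect(Collectors.toSet()); // a Set, so duplicates are dropped

② Collecting into an array

Integer[] arr = Stream.of(1,2,3).toArray(Integer[]::new); // preferred (typed array)
Object[] objArr = Stream.of(1,2,3).toArray(); // not recommended (elements need casting)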

③ Collecting into a Map (keys must be unique)

List<String> userList = Arrays.asList("id1:Zhang San", "id2:Li Si", "id3:Wang Wu");
// key: the id part, value: the name part
Map<String, String> userMap = userList.stream()
    .map(user -> user.split(":")) // each element becomes a String[]
    .collect(Collectors.toMap(
        arr -> arr[0], // key mapper
        arr -> arr[1]  // value mapper
    ));
// If keys may repeat, the two-argument toMap throws IllegalStateException;
// pass a merge function to resolve the conflict (here: keep the first value)
Map<String, String> userMapSafe = userList.stream()
    .map(user -> user.split(":"))
    .collect(Collectors.toMap(
        arr -> arr[0],
        arr -> arr[1],
        (v1, v2) -> v1 // on conflict keep the first value
    ));
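
To see the duplicate-key behaviour concretely, here is a small sketch. The class `ToMapDemo`, both method names, and the sample entries are hypothetical; the four-argument `toMap` with `LinkedHashMap::new` is used only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ToMapDemo {

    // Two-argument toMap: throws IllegalStateException on a duplicate key.
    static Map<String, String> strict(List<String> entries) {
        return entries.stream()
                .map(e -> e.split(":", 2))
                .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    }

    // Four-argument toMap: keeps the first value and preserves insertion order.
    static Map<String, String> keepFirst(List<String> entries) {
        return entries.stream()
                .map(e -> e.split(":", 2))
                .collect(Collectors.toMap(
                        p -> p[0],
                        p -> p[1],
                        (first, second) -> first,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> withDuplicate = Arrays.asList("id1:Zhang San", "id2:Li Si", "id1:Wang Wu");
        System.out.println(keepFirst(withDuplicate)); // {id1=Zhang San, id2=Li Si}
        try {
            strict(withDuplicate);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
    }
}
```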

④ Aggregation (sum, average, and so on)

List<Integer> numList = Arrays.asList(1,2,3,4,5);
// Sum as an int
int sum = numList.stream().collect(Collectors.summingInt(Integer::intValue));
// Sum as a long (avoids int overflow)
long sumLong = numList.stream().collect(Collectors.summingLong(Integer::longValue));
// Average (returns a Double)
double avg = numList.stream().collect(Collectors.averagingInt(Integer::intValue));
// Summary statistics (sum, average, count, min, max)
IntSummaryStatistics stats = numList.stream().collect(Collectors.summarizingInt(Integer::intValue));
System.out.println("Sum: " + stats.getSum());
System.out.println("Average: " + stats.getAverage());
System.out.println("Max: " + stats.getMax());

⑤ Grouping (`groupingBy`)

Groups elements by a classifier and returns Map<group key, List<elements>>:

List<String> fruitList = Arrays.asList("apple", "banana", "orange", "apricot", "avocado");
// Group by first character
Map<Character, List<String>> groupByFirstChar = fruitList.stream()
    .collect(Collectors.groupingBy(fruit -> fruit.charAt(0)));
// Result: {a=[apple, apricot, avocado], b=[banana], o=[orange]} (key order is not guaranteed)

// Count the elements in each group (returns Map<group key, count>)
Map<Character, Long> countByGroup = fruitList.stream()
    .collect(Collectors.groupingBy(
        fruit -> fruit.charAt(0),
        Collectors.counting() // downstream collector: counts the elements
    ));
// Result: {a=3, b=1, o=1}

⑥ Joining strings (`joining`)

List<String> strList = Arrays.asList("a", "b", "c");
String join1 = strList.stream().collect(Collectors.joining()); // "abc"
String join2 = strList.stream().collect(Collectors.joining(",")); // "a,b,c"
String join3 = strList.stream().collect(Collectors.joining(",", "[", "]")); // "[a,b,c]"

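V. Parallelism

Streams have built-in support for parallel execution: turn a sequential stream into a parallel one and the library manages the threads for you:

- Get a parallel stream directly from a collection: list.parallelStream();
- Turn a sequential stream into a parallel one: stream.parallel().

Notes:

- Thread safety: if an operation modifies external state (such as a shared collection), that access must be made thread-safe (prefer stateless operations without side effects);
- Ordering: forEach on a parallel stream may visit elements in any order; use forEachOrdered when order matters (at some cost to parallel efficiency);
- Performance: for small inputs, the cost of splitting the work and coordinating threads can outweigh the gain, and the benefit grows with the amount of data. A rough rule of thumb is to consider parallel streams above about 1000 elements, but measure on your own workload rather than relying on a fixed threshold;
- Data source: ArrayList is not thread-safe in general, but reading it as a stream source is safe as long as it is not modified while the pipeline runs. Arrays and ArrayList split into chunks cheaply, whereas LinkedList splits poorly and parallelizes badly, so prefer arrays or ArrayList as the source of a parallel stream.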
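
The thread-safety note is the one that bites most often, so here is a sketch of the safe pattern next to the unsafe one (the unsafe variant appears only as a comment and is never run). `ParallelCollectDemo` and `doubled` are illustrative names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelCollectDemo {

    // Unsafe pattern (comment only, do not use):
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel().forEach(x -> out.add(x * 2));
    // ArrayList is not thread-safe, so elements may be lost or an exception thrown.

    // Safe pattern: collect() fills separate containers per task and merges them,
    // and for an ordered source the result keeps the encounter order.
    static List<Integer> doubled(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .map(x -> x * 2)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> result = doubled(100_000);
        System.out.println(result.size());                  // 100000
        System.out.println(result.get(0));                  // 2
        System.out.println(result.get(result.size() - 1));  // 200000
    }
}
```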

Example:

List<Integer> numList = IntStream.range(1, 1000000).boxed().collect(Collectors.toList());

// Sequential sum
long serialSum = numList.stream().mapToLong(Integer::longValue).sum();

// Parallel sum (can be faster for large inputs on multi-core machines; measure to confirm)
long parallelSum = numList.parallelStream().mapToLong(Integer::longValue).sum();

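VI. Pitfalls

- Forgetting the terminal operation: with only intermediate operations and no terminal operation, nothing runs;
- Reusing a Stream: a Stream is consumed by its terminal operation, and using it again throws IllegalStateException;
- Side effects in parallel streams: modifying an external, non-thread-safe variable (such as an ArrayList) from a parallel stream corrupts the data;
- Misusing peek: peek is meant for debugging; using it to modify elements is discouraged (even though the syntax allows it);
- Ignoring empty Optionals: max, findFirst, and similar operations return an Optional, and calling get() on an empty one throws NoSuchElementException. Handle the empty case with orElse() or ifPresent():

// Wrong: throws NoSuchElementException when there are no elements
Integer max = Stream.<Integer>empty().max(Integer::compare).get();
// Right: returns the default value 0 when there are no elements
Integer maxSafe = Stream.<Integer>empty().max(Integer::compare).orElse(0);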
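
Beyond orElse(), an Optional can be transformed before a fallback is applied. The sketch below shows two ways of handling a possibly empty result; `OptionalDemo` and its methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class OptionalDemo {

    // Returns the maximum, or the given fallback when the list is empty.
    static int maxOrDefault(List<Integer> nums, int fallback) {
        return nums.stream().max(Comparator.naturalOrder()).orElse(fallback);
    }

    // Maps the Optional's value first and supplies a message for the empty case.
    static String describeMax(List<Integer> nums) {
        return nums.stream()
                .max(Comparator.naturalOrder())
                .map(m -> "max=" + m)
                .orElse("empty");
    }

    public static void main(String[] args) {
        System.out.println(maxOrDefault(Arrays.asList(3, 7, 5), 0));          // 7
        System.out.println(maxOrDefault(Collections.<Integer>emptyList(), 0)); // 0
        System.out.println(describeMax(Arrays.asList(3, 7, 5)));              // max=7
        System.out.println(describeMax(Collections.<Integer>emptyList()));     // empty
    }
}
```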