# distspmv_balanced

**Repository Path**: qmsg/distspmv_balanced

## Basic Information

- **Project Name**: distspmv_balanced
- **Description**: 分布式稀疏矩阵-向量乘法（SpMV）负载均衡算法的复现实现。
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-15
- **Last Updated**: 2026-06-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DistSpMV_Balanced

复现论文 *Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication* (CCGrid 2023)。通过 METIS 图划分重排序、对角块扩展、Flag 去重通信和远程非零元均衡，实现分布式 SpMV 的计算与通信双重负载均衡。

## 环境依赖

```bash
sudo apt install libopenmpi-dev openmpi-bin libmetis-dev
```

## 快速开始

```bash
# 编译
make

# 计算并保存结果
mpirun -np 4 ./compute test/cant/cant.mtx 4

# 验证正确性
./verify test/cant/cant.mtx
```

## 运行

```bash
# 计算（完整分布式 SpMV，保存结果到 result/）
mpirun -np <进程数> ./compute <矩阵.mtx> <线程数>

# 验证（串行 SpMV，与 compute 结果比较）
./verify <矩阵.mtx>

# 性能对比
bash benchmark/run_benchmark.sh [进程数] [线程数] [矩阵名]

# 直接运行 benchmark 二进制
mpirun -np <进程数> benchmark/naive     <矩阵.mtx> <线程数>
mpirun -np <进程数> benchmark/reordered <矩阵.mtx> <线程数>
mpirun -np <进程数> benchmark/ablation  <矩阵.mtx> <线程数> [--no-metis] [--no-dedup] [--no-nnz-balance]
```

## 算法流程（论文完整实现）

**预处理阶段** (rank 0 执行，按需分发给各进程)
- 1.1 METIS 图划分 → 行列重排序 → 向量重排
- 1.2 统计对角块非零数 → 确定 lower_bound
- 1.3 Algorithm 1: 扫描线扩展对角块右边界
- 1.4 本地矩阵按行均分 → 每进程提取 A_local（对角块行）
- 1.5 远程矩阵按非零元数均分 → 每进程提取 A_remote（按 remote_nnzbnd）
- 1.6 Algorithm 2: 从 A_remote 计算通信计划
- 1.7 数据分发: MPI_Send/Recv 将 A_local + A_remote + 元数据发送给各进程

**运行时阶段** (每个进程独立执行)
- 2.1 Algorithm 3: MPI_Alltoall 交换计数 + Isend/Irecv 交换 x 分量 → 扩展本地向量
- 2.2 Algorithm 4: A_local × x（对角块计算）+ A_remote × x（远程计算），OpenMP 并行
- 2.3 远程 y 中间结果 Send 给行主人进程，行主人 += 累加到本地 y

**结果恢复阶段**
- 3.1 MPI_Bcast 行数信息 + MPI_Gatherv 收集子向量到 rank 0
- 3.2 逆置换恢复原始行序，保存到 result/y.txt

## 项目结构

```
distspmv_balanced/
├── compute              # 计算可执行文件（MPI+OpenMP）
├── verify               # 验证可执行文件（串行）
├── Makefile
├── include/             # 头文件
│   ├── config.h         # 编译期配置 (ITERATIONS)
│   ├── csr.h            # CSR 稀疏矩阵
│   ├── mm_reader.h      # Matrix Market 读取
│   ├── metis_wrapper.h  # METIS 图划分 + 重排序
│   ├── partition.h      # 对角块扩展与本地/远程提取
│   ├── comm.h           # MPI 通信计划
│   └── spmv.h           # 并行 SpMV 计算
├── src/                 # 核心源码
│   ├── compute.c        # 主流程 (预处理→分发→通信→SpMV→Reduce→保存)
│   ├── test.c           # 串行验证（读取 result/y.txt 与参考值比较）
│   ├── csr.c            # CSR 构造/转换
│   ├── mm_reader.c      # .mtx 解析、对称补全、重复合并
│   ├── metis_wrapper.c  # METIS_PartGraphKway + 行列重排序
│   ├── partition.c      # Algorithm 1 + 本地/远程提取 + 远程nnz均衡
│   ├── comm.c           # Algorithm 2+3: 通信计划 + Isend/Irecv + x扩展
│   └── spmv.c           # Algorithm 4: OpenMP 单循环 SpMV
├── benchmark/           # 性能对比
│   ├── main_naive.c     # DistSpMV: 朴素分布式 (无重排/去重/均衡)
│   ├── main_reordered.c # DistSpMV_Reordered: 仅 METIS 重排
│   ├── main_balanced_ablation.c  # 带消融开关
│   └── run_benchmark.sh # 方法对比脚本
├── result/              # 计算结果输出目录
└── test/
    ├── run_test.sh      # 正确性测试
    └── cant/            # 62,451x62,451, nnz=4,007,383 (结构力学)
```

## 核心差异：与 Distribution SpMV_Balanced 论文一致

| 方面 | 当前实现 | 说明 |
|------|---------|------|
| 数据分布 | MPI_Send/Recv 按需分发 | 每进程只拿到自己的 A_local + A_remote |
| 内存/通信复杂度 | O(nnz) | 非 O(nnz × np) |
| 本地矩阵 | 每进程仅对角块行 (subrowadd) | 通过 `partition_extract_local()` 提取 |
| 远程矩阵 | 按 nnz 均分 (remote_nnzbnd) | 独立 CSR，通过 `partition_extract_remote()` 提取 |
| x 通信 | Algorithm 3: MPI_Alltoall + Isend/Irecv | 仅交换远程列所需的 x 分量 |
| y 汇总 | 远程 y Send 给行主人 += 累加，再 MPI_Gatherv 收集 | 与论文 Section II-B 步骤 4-5 一致 |
| 结果保存 | 逆置换后写入 result/y.txt | 可用 verify 验证 |

## 性能对比

| 二进制 | 对应论文 | 说明 |
|--------|----------|------|
| `benchmark/naive` | DistSpMV | 行均分，无任何优化 |
| `benchmark/reordered` | DistSpMV_Reordered | METIS 重排 + 均匀行列切分 |
| `compute` | DistSpMV_Balanced | 完整论文算法 |
| `benchmark/ablation` | — | 独立实现，支持消融开关 |

Ablation flags:

| Flag | 效果 |
|------|------|
| `--no-metis` | 均匀分块替代 METIS 划分 |
| `--no-dedup` | 通信不去重 |
| `--no-nnz-balance` | 远程按行均分而非按 nnz |

## 时间测量

| 指标 | 测量范围 | 对应论文 |
|------|----------|----------|
| Preprocessing time | 预处理阶段（1.1-1.7） | Figure 7 |
| Avg time per iteration | 50 次迭代平均（通信 + 计算） | Section IV-A |

迭代次数通过 `include/config.h` 中 `#define ITERATIONS 50` 控制。

## 添加测试矩阵

从 [SuiteSparse Matrix Collection](https://sparse.tamu.edu/) 下载 Matrix Market (.mtx) 格式矩阵，解压到 `test/<矩阵名>/` 目录。两种使用方式：

```bash
# 方式一：第三个参数直接指定矩阵名
bash test/run_test.sh 2 1 cant
bash benchmark/run_benchmark.sh 4 2 cant

# 方式二：修改脚本中的 MATRIX 变量
# 编辑 test/run_test.sh 或 benchmark/run_benchmark.sh 中的 MATRIX 行
```

输入的 .mtx 文件格式：

```
%%MatrixMarket matrix coordinate real general
M N NNZ
row col value
...
```

代码自动处理：行列索引 1→0 转换、对称矩阵补全、重复条目合并求和。