2025
|
Matcha: A Language and Compiler for Backtracking-based Subgraph Matching.
Yihua Wei, Lihan Hu, and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2025
|
A Memory-efficient and Computation-balanced Lossy Compressor on Wafer-scale Engine.
Shihui Song, Robert Underwood, Sheng Di, Yafan Huang, Peng Jiang, and Franck Cappello.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2025
|
Improving Accuracy and Efficiency of Graph Embedding Training with Fine-grained Parameter Management.
Lihan Hu and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2024
|
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.
Yihua Wei and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2024
|
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.
Lihan Hu, Jing Li, and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2024
|
CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.
Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu, Weijian Zheng, Sheng Di, Qinglei Cao, Yunhe Feng, Zhen Xie, and Franck Cappello.
International Symposium on High-Performance Parallel and Distributed Computing (HPDC).
|
2023
|
PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework.
Jiya Su, Peng Jiang, and Rujia Wang.
CoRR.
|
2023
|
End-to-End LU Factorization of Large Matrices on GPUs.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP).
|
2022
|
STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations.
Yihua Wei and Peng Jiang.
International Conference on High Performance Computing, Networking, Storage and Analysis (SC).
|
2022
|
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.
Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, and Bo Wu.
International Conference on Parallel Architectures and Compilation Techniques (PACT).
|
2022
|
Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training.
Peng Jiang, Lihan Hu, and Shihui Song.
Advances in Neural Information Processing Systems (NeurIPS).
|
2022
|
Scaling and Selecting GPU Methods for All Pairs Shortest Paths Computations.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2022
|
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs.
Shihui Song and Peng Jiang.
International Conference on Supercomputing (ICS).
|
2021
|
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
IEEE International Parallel and Distributed Processing Symposium (IPDPS).
|
2021
|
Exploring PIM Architecture for High-Performance Graph Pattern Mining.
Jiya Su, Linfeng He, Peng Jiang, and Rujia Wang.
IEEE Computer Architecture Letters 20(2).
|
2021
|
Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks.
Peng Jiang and Masuma Akter Rumi.
CoRR.
|
2020
|
Scaling Out Speculative Execution of Finite-State Machines with Parallel Merge.
Yang Xia, Peng Jiang, and Gagan Agrawal.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
|
2020
|
A Novel Data Transformation and Execution Strategy for Accelerating Sparse Matrix Multiplication on GPUs.
Peng Jiang, Changwan Hong, and Gagan Agrawal.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
|
2020
|
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning.
Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, and Peng Jiang.
International Conference on Parallel Architectures and Compilation Techniques (PACT).
|
2020
|
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning.
Peng Jiang and Gagan Agrawal.
CoRR.
|
2019
|
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction.
Gangyi Zhu, Peng Jiang, and Gagan Agrawal.
International Conference on Parallel Architectures and Compilation Techniques (PACT).
|
2019
|
Enabling Prefix Sum Parallelism Pattern for Recurrences with Principled Function Reconstruction.
Yang Xia, Peng Jiang, and Gagan Agrawal.
International Conference on Compiler Construction (CC).
|
2018
|
Revealing Parallel Scans and Reductions in Recurrences through Function Reconstruction.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
International Conference on Parallel Architectures and Compilation Techniques (PACT).
|
2018
|
Conflict-free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances.
Peng Jiang and Gagan Agrawal.
International Symposium on Code Generation and Optimization (CGO).
|
2018
|
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication.
Peng Jiang and Gagan Agrawal.
Advances in Neural Information Processing Systems (NeurIPS).
|
2017
|
Efficient SIMD and MIMD Parallelization of Hash-based Aggregation by Conflict Mitigation.
Peng Jiang and Gagan Agrawal.
International Conference on Supercomputing (ICS).
|
2017
|
Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation.
Peng Jiang and Gagan Agrawal.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
|
2016
|
Exploiting Recent SIMD Architectural Advances for Irregular Applications.
Linchuan Chen, Peng Jiang, and Gagan Agrawal.
International Symposium on Code Generation and Optimization (CGO).
|
2016
|
Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
International Conference on Supercomputing (ICS).
|