Main Venue Conference Schedule

Morning Schedule, August 13

Time | Invited Keynote | Speaker | Session Chair
08:40-09:00 | Opening Ceremony | | Chenggang Wu, Professor, Institute of Computing Technology, CAS
09:00-09:40 | Research and Reflections on New Computing Systems Integrating Storage and Computation | Weimin Zheng, Academician, Tsinghua University |
09:40-10:20 | Data flow under the von Neumann Computer Architecture | Xian-He Sun, Professor, Illinois Institute of Technology |
10:20-10:30 | Break | |
10:30-11:10 | Independence and Compatibility of Instruction Set Architectures | Weiwu Hu, Professor, Institute of Computing Technology, CAS | Tao Li, Professor, Nankai University
11:10-11:50 | Hardware-Software Co-Designed System Security Research | Haibo Chen, Professor, Shanghai Jiao Tong University |
11:50-12:00 | Discussion | |

Morning Schedule, August 14

Time | Invited Keynote | Speaker | Session Chair
09:00-09:40 | A Preliminary Exploration of Photonic Computers | Ninghui Sun, Academician, Institute of Computing Technology, CAS | Dongsheng Li, Professor, National University of Defense Technology
09:40-10:20 | Efficient Deep Learning at Scale | Yiran Chen, Professor, Duke University |
10:20-10:30 | Break | |
10:30-11:10 | Versal: The Xilinx Adaptive Compute Acceleration Platform (ACAP) | Kees Vissers, Xilinx Fellow | Chao Li, Professor, Shanghai Jiao Tong University
11:10-11:50 | Balanced Design Philosophy in Computer Architecture | Yong Dou, Professor, National University of Defense Technology |
11:50-12:10 | Award Ceremony | |

Invited Keynotes

Weimin Zheng, Academician

Title: Research and Reflections on New Computing Systems Integrating Storage and Computation

Abstract: Technologies that integrate storage and computation are widely regarded as having great potential to break through the traditional von Neumann bottleneck, and represent a disruptive approach to next-generation high-performance computer architecture. Memristor-based storage-computation integration has been making important progress and is developing rapidly. Starting from several emerging storage-computation integration technologies, this talk focuses on research progress in memristor-based theory and implementation, including recent work applying it to deep learning, brain-inspired computing, and certain general-purpose computing domains, and analyzes the problems that remain. It then looks ahead to the development of next-generation storage-computation integrated computer systems in computing theory, architecture, and system software, and introduces the work of the Department of Computer Science at Tsinghua University in these areas.
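The core idea behind memristor-based storage-computation integration can be illustrated with a toy model (our own sketch, not material from the talk): a memristor crossbar performs an analog matrix-vector multiply in a single step, because each cell's conductance encodes a weight and, by Ohm's and Kirchhoff's laws, the row currents sum to a dot product where the data is stored.

```python
import numpy as np

def crossbar_mvm(conductance, voltages):
    """Ideal (noise-free) crossbar matrix-vector multiply.

    conductance[i][j] encodes weight w[i][j]; applying column voltages V
    yields row currents I = G @ V without moving the weights off-chip.
    """
    conductance = np.asarray(conductance, dtype=float)
    voltages = np.asarray(voltages, dtype=float)
    return conductance @ voltages  # one output current per row line

# Example: a 2x3 weight matrix applied to a 3-element input vector.
G = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
V = [1.0, 2.0, 3.0]
I = crossbar_mvm(G, V)  # row currents [1.4, 3.2]
```

Real devices add non-idealities (wire resistance, conductance drift, limited precision) that this ideal model omits; much of the research the talk surveys addresses exactly that gap.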

Bio: Academician of the Chinese Academy of Engineering and Professor in the Department of Computer Science at Tsinghua University. He graduated from Tsinghua University in 1970 and joined its faculty, receiving his master's degree in 1982. He served as the tenth President of the China Computer Federation (CCF) and is currently Editor-in-Chief of the journals Big Data (《大数据》) and Big Data Mining and Analytics. His long-term research covers high-performance computer architecture, parallel algorithms, and systems. In high-performance storage, he led the development of China's first domestically developed network storage system with independent intellectual property rights. In high-performance computer architecture, he pioneered the development and successful deployment of cluster-based high-performance computers in China. In large-scale parallel algorithms and applications, an extreme-scale weather forecasting application developed under his direction on the domestic Sunway TaihuLight won the ACM Gordon Bell Prize. He has received one First Prize and two Second Prizes of the National Science and Technology Progress Award, one Second Prize of the National Technology Invention Award, and the 2016 Ho Leung Ho Lee Foundation Prize for Scientific and Technological Progress. With his collaborators he has published more than 530 papers and 10 books. He has long taught the computer architecture course, which was named a national-level model course in 2008, and has written and published 10 textbooks and monographs on computer architecture.

Xian-He Sun, Professor

Title: Data flow under the von Neumann Computer Architecture

Abstract: Big data applications have changed the landscape of computing. A most fundamental question today is how to design a data-centric computer architecture for the big data era. The traditional von Neumann architecture is computing-centric. The known dataflow architecture is also computing-centric, where data flow is arranged to maximize parallel computation. In this study, we first reexamine computer architectures from a data-centric point of view. We find that von Neumann preserves a neutrality between computing and memory; the traditional compute-centric reading of von Neumann is due to historical reasons. We next reexamine the memory systems of von Neumann machines from a data-centric point of view and introduce several metrics and methods to model and optimize memory systems. We then extend the memory-system analysis to computing and to the trade-off between computing and data movement. Finally, we propose the data-flow-under-von-Neumann approach, denoted dataflowV. DataflowV determines where and how to conduct computing with consideration of data-movement cost. DataflowV is not a new computer architecture; it is a data-centric implementation of the von Neumann computer architecture. We model memory-system performance using four different types of memory cycles. Through an in-depth, hierarchical analysis of these four types of memory cycles and computing cost, we have formulated the trade-off between computing and data movement. While dataflowV is a challenging task, any of its "point solutions" can benefit current computing systems immediately. In this talk we will present the concept, modeling, and design of dataflowV and some of its implementation results. We will also discuss research issues related to dataflowV and memory systems in general.

Bio: Dr. Xian-He Sun is a University Distinguished Professor of Computer Science in the Department of Computer Science at the Illinois Institute of Technology (IIT). Before joining IIT, he worked at the DoE Ames National Laboratory; at ICASE, NASA Langley Research Center; and at Louisiana State University, Baton Rouge, and was an ASEE fellow at the Navy Research Laboratories. Dr. Sun is an IEEE Fellow and is known for his memory-bounded speedup model, also called Sun-Ni's Law, for scalable computing. His research interests include high-performance computing, memory and I/O systems, and performance evaluation and optimization. He has over 250 publications and 6 patents in these areas. He is Associate Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems, a Golden Core member of the IEEE Computer Society, and a past chair of the Computer Science Department at IIT. More information about Dr. Sun can be found at his web site www.cs.iit.edu/~sun/.
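The memory-bounded speedup model (Sun-Ni's Law) mentioned above has a compact closed form, sketched here in Python as a reading aid (our own illustration, not the speaker's code): with sequential fraction f and a workload that grows by a factor G(n) as memory capacity scales with n processors, the speedup interpolates between Amdahl's law (G(n) = 1, fixed size) and Gustafson's law (G(n) = n, memory-scaled size).

```python
def memory_bounded_speedup(f_serial, n, g):
    """Sun-Ni memory-bounded speedup on n processors.

    f_serial: fraction of the work that is sequential.
    g: factor G(n) by which the parallel workload grows when memory
       scales with n. g == 1 recovers Amdahl's law; g == n recovers
       Gustafson's law.
    """
    parallel = 1.0 - f_serial
    return (f_serial + parallel * g) / (f_serial + parallel * g / n)

# 10% sequential work on 16 processors:
amdahl = memory_bounded_speedup(0.1, 16, g=1)      # fixed-size bound, 6.4
gustafson = memory_bounded_speedup(0.1, 16, g=16)  # memory-scaled, 14.5
```

Intermediate growth functions (1 < G(n) < n) model the common case where memory, not problem size or time, bounds how far an application can scale.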

Weiwu Hu, Professor

Title: Independence and Compatibility of Instruction Set Architectures

Abstract: Whether China's homegrown CPUs should adopt an independently designed instruction set or remain compatible with mainstream instruction sets has long been debated in both academia and industry. Today's domestic CPUs use a wide variety of instruction sets, which greatly hampers the growth of a software ecosystem. This talk presents a new approach developed by Loongson through long-term practice: an independent instruction set with compatibility as a key feature. On one hand, independent design gives control over the instruction set's evolution and improves system efficiency; on the other, hardware-software co-designed binary translation provides efficient compatibility with existing mainstream instruction sets, easing the difficulties a new ecosystem faces.
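At its simplest, the binary translation mentioned above rewrites guest-ISA instructions into host-ISA instructions. The toy below is entirely hypothetical (invented opcode names, not the actual Loongson mechanism or any real ISA encoding) and only illustrates the shape of the idea; production systems translate at the machine-code level and add hardware support for hot paths.

```python
# Hypothetical opcode mapping from a "guest" ISA to a "host" ISA.
GUEST_TO_HOST = {
    "ADD": "add.w",   # invented names for illustration only
    "SUB": "sub.w",
    "MOV": "or",      # a move expressed via the host's OR instruction
}

def translate(guest_program):
    """Rule-based translation of a list of guest instructions."""
    host_program = []
    for instr in guest_program:
        opcode, _, operands = instr.partition(" ")
        host_op = GUEST_TO_HOST.get(opcode)
        if host_op is None:
            # No direct mapping: fall back to a (hypothetical) helper call,
            # the slow path a real translator would also need.
            host_program.append(f"call interp_{opcode.lower()}")
        else:
            host_program.append(f"{host_op} {operands}".strip())
    return host_program

out = translate(["ADD r1, r2", "MUL r3, r4"])
```

The hardware-software co-design the talk describes targets exactly the expensive cases this sketch hides, such as guest flag semantics and memory-ordering differences.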

Bio: Weiwu Hu was born in November 1968 in Yongkang, Zhejiang. He graduated from Yongkang No. 1 High School in 1986 and from the Department of Computer Science at the University of Science and Technology of China in July 1991, after which he was admitted directly into the doctoral program at the Institute of Computing Technology (ICT), Chinese Academy of Sciences, under the renowned computer scientist Academician Xia Peisu. He received his Ph.D. in engineering in March 1996, and his dissertation was selected as one of the National Top 100 Outstanding Doctoral Dissertations. He is currently Chairman and General Manager of Loongson Technology Corporation; Chief Engineer, Professor, and doctoral supervisor at ICT; a standing committee member of the 11th All-China Youth Federation; a deputy to the 11th National People's Congress; and a delegate to the 18th and 19th CPC National Congresses. Since 2001 he has devoted himself to the development of Loongson processors, leading the development of Loongson 1, China's first general-purpose processor; Loongson 2, its first 64-bit general-purpose processor; and Loongson 3, its first quad-core processor, bringing China's processor development to an internationally advanced level. Loongson processors now form a family of products widely used in defense, government office systems, industrial control, and embedded applications, contributing to national security and the development of an independent information industry.

Haibo Chen, Professor

Title: Hardware-Software Co-Designed System Security Research

Bio: Haibo Chen, Ph.D., is a Professor at Shanghai Jiao Tong University, Director of the Institute of Parallel and Distributed Systems, Director of the Ministry of Education Engineering Research Center for Domain-Specific Operating Systems, a recipient of the National Science Fund for Distinguished Young Scholars, and an ACM Distinguished Scientist. His main research areas are operating systems and system security. His honors include the First Prize of the Ministry of Education Technology Invention Award (as first contributor), the CCF Young Scientist Award, and the National Outstanding Doctoral Dissertation Award. He currently serves as Chair of ACM SIGOPS ChinaSys, Vice Chair of the CCF Technical Committee on System Software, the first editorial board member from China and Co-Chair of Special Sections for ACM's flagship magazine Communications of the ACM, and an editorial board member of ACM Transactions on Storage. He was General Co-Chair of ACM SOSP 2017, systems security area chair of ACM CCS 2018, and a member of the ACM SIGSAC Awards Committee. According to csrankings.org, over the past five years (2015-2019) he published more papers at the top operating systems venues (SOSP/OSDI, EuroSys, USENIX ATC, and FAST) than any other researcher worldwide.

Ninghui Sun, Academician

Title: A Preliminary Exploration of Photonic Computers

Abstract: This talk introduces the bottlenecks facing electronic computing and existing work on photonic computing, focuses on our preliminary exploration of photonic computing chips in the CAS Strategic Priority Research Program (Category B) "Prototype Verification System for Large-Scale Photonic Integrated Chips", and discusses possible future directions for photonic computing.

Bio: Ninghui Sun, born in March 1968, graduated from Peking University in 1989 and received his Ph.D. from the Institute of Computing Technology (ICT), Chinese Academy of Sciences, in 1999. He is a Professor and the Director of ICT, Director of the State Key Laboratory of Computer Architecture, a doctoral supervisor, and an Academician of the Chinese Academy of Engineering. He led the development of the Dawning 2000, 3000, 4000, 5000, and 6000 series of high-performance computers, and has repeatedly received First and Second Prizes of the National Science and Technology Progress Award, as well as the China Youth Science and Technology Award, the CAS Outstanding Achievement Award, recognition as one of China's Ten Outstanding Young Persons, and the NSFC Distinguished Young Scholars award. His main research areas are high-performance computers and computer architecture.

Yiran Chen, Professor

Title: Efficient Deep Learning at Scale

Abstract: The efficiency of deep learning at scale involves inference efficiency on resource-constrained devices and training efficiency on multiple accelerators. Although state-of-the-art (SOTA) deep neural networks (DNNs) achieve outstanding performance in various domains, their high computation demand and massive number of parameters make it difficult to deploy these SOTA DNNs onto resource-constrained devices. In our recent work PENNI, a DNN model-compression framework, we achieve model compactness and hardware efficiency simultaneously by enabling kernel sharing in convolution layers via a small number of basis kernels and by alternately adjusting bases and coefficients under sparsity constraints. Depth is a key component of DNNs; however, designing depth is heuristic and requires substantial human effort. Another of our recent works, AutoGrow, automates depth discovery in DNNs: starting from a shallow seed architecture, AutoGrow adds new layers while the growth improves accuracy and stops growing once accuracy no longer improves. With data forwarding, error backpropagation, and gradient calculation, DNN training is a complicated process with high computation and communication intensity. A promising approach is to exploit coarse-grained parallelism among multiple performance-bounded accelerators to support DNN training. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. Our recent work AccPar is a principled and systematic method for determining the tensor partitioning among heterogeneous accelerator arrays. Compared to prior empirical methods, AccPar considers the complete tensor-partition space and can reveal previously unknown parallelism configurations. The enhanced flexibility of tensor partitioning in AccPar allows a flexible ratio of computations to be distributed among accelerators with different performance bounds.
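The kernel-sharing idea described for PENNI can be sketched numerically (this is our reading of the abstract, not the authors' code): each convolution kernel in a layer is stored as a linear combination of a few shared basis kernels, so the layer keeps a small basis plus per-kernel coefficient vectors instead of the full kernel bank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_kernels, n_basis, k = 64, 4, 3   # 64 3x3 kernels, 4 shared bases

basis = rng.standard_normal((n_basis, k, k))        # shared basis kernels
coeffs = rng.standard_normal((n_kernels, n_basis))  # per-kernel coefficients

# Reconstruct the full kernel bank from the compact factors:
# kernels[c] = sum_b coeffs[c, b] * basis[b]
kernels = np.einsum("cb,bij->cij", coeffs, basis)

dense_params = n_kernels * k * k                        # 64 * 9  = 576
compact_params = n_basis * k * k + n_kernels * n_basis  # 36 + 256 = 292
```

Sparsity constraints on the coefficients (which the abstract mentions but this sketch omits) shrink the compact form further and prune whole kernels.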

Bio: Yiran Chen received his B.S. and M.S. degrees from Tsinghua University and his Ph.D. from Purdue University in 2005. After five years in industry, he joined the University of Pittsburgh in 2010 as an Assistant Professor and was promoted to Associate Professor with tenure in 2014, holding the Bicentennial Alumni Faculty Fellowship. He is now a Professor in the Department of Electrical and Computer Engineering at Duke University, serving as director of the NSF Industry-University Cooperative Research Center (IUCRC) for Alternative Sustainable and Intelligent Computing (ASIC) and co-director of the Duke University Center for Computational Evolutionary Intelligence (CEI), focusing on research into new memory and storage systems, machine learning and neuromorphic computing, and mobile computing systems. Dr. Chen has published one book and more than 400 technical publications and has been granted 94 US patents. He serves or has served as an associate editor of several IEEE and ACM transactions and journals and has served on the technical and organizing committees of more than 50 international conferences. He is now serving as Editor-in-Chief of the IEEE Circuits and Systems Magazine. He has received 6 best paper awards, 1 best poster award, and 14 best paper nominations from international conferences and workshops. He is a recipient of the NSF CAREER Award, the ACM SIGDA Outstanding New Faculty Award, the Humboldt Research Fellowship for Experienced Researchers, and the IEEE SYSC/CEDA TCCPS Mid-Career Award. He is a Fellow of the IEEE, a Distinguished Member of the ACM, and a distinguished lecturer of IEEE CEDA.

Kees Vissers, Xilinx Fellow

Title: Versal: The Xilinx Adaptive Compute Acceleration Platform (ACAP)

Abstract: In this presentation I will present the Xilinx Versal platform. I will show the overall system architecture of the family of devices, including the Arm cores (Scalar Engines), the programmable logic (Adaptable Engines), and the new vector processor cores (AI Engines). I will focus on the new AI Engines in more detail and show some application domains, including machine learning and 5G wireless applications. The first device in this family contains 400 of these vector processor cores. These devices are supported by an integrated programming environment. The commercial application in 5G processing is showing promising results.

Bio: Kees Vissers graduated from Delft University in the Netherlands. He worked at Philips Research in Eindhoven, the Netherlands, for many years, on digital video system design, HW-SW co-design, VLIW processor design, and dedicated video processors. He was a visiting industrial fellow at Carnegie Mellon University, where he worked on early high-level synthesis tools, and a visiting industrial fellow at UC Berkeley, where he worked on several models of computation and dataflow computing. He was a director of architecture at Trimedia and CTO at Chameleon Systems. For more than a decade he has headed a team of researchers at Xilinx, including a significant part of the Xilinx European Laboratories. Research topics include next-generation programming environments for processors and FPGA fabric, high-performance video systems, machine learning applications and architectures, wireless applications, and new datacenter applications. He has been instrumental in the high-level synthesis technology and is one of the technical leads in the architecture of the AI Engines technology. He is a Fellow at Xilinx.

Yong Dou, Professor

Title: Balanced Design Philosophy in Computer Architecture

Abstract: This talk first introduces the idea of balanced design in computer architecture methodology, then traces the lineage of dataflow architectures through the history of computer architecture, from the origins of dataflow machines to the dataflow ideas embodied in CPUs and GPUs. It analyzes the advantages of custom accelerators for intelligent algorithms, and finally looks at how advances in AI algorithms will influence computer architecture.

Bio: Yong Dou is a doctoral supervisor, a Professor at the Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, and a CCF Fellow. His main research areas are high-performance computing (parallel computing, reconfigurable computing, etc.) and intelligent computing (machine learning, deep learning, etc.). He is a recipient of the National Science Fund for Distinguished Young Scholars and the Gold Prize of the Military Talent Cultivation Award, and has repeatedly served as a program committee member or chair of international conferences. He has led or participated in more than 10 national-level research projects, including National Natural Science Foundation projects, focusing on fundamental research in high-performance embedded computing, heterogeneous parallel architectures for algorithm accelerators, and intelligent analysis of remote-sensing satellite imagery. He has published more than 100 papers in journals and conferences such as TOC, AAAI, IJCAI, and FPGA, and has supervised more than 100 Ph.D. and master's students.