NPU Operator Development
Description
Enflame designs and implements neural network processing units (NPUs) and the corresponding software to enable continuous innovation in AI applications.
The Enflame software department is responsible for the software stack (including drivers, SDK, distributed training/inference framework, etc.) that supports NPUs in cloud AI servers.
In this role you will contribute to the distributed framework required for training and inference of AI applications on high-performance NPUs in the cloud environment. We need our engineers to be versatile, innovative, and enthusiastic about tackling new problems across the full stack as we continue to push technology forward.
Responsibilities:
Neural network operator design and development with C/C++/Assembly.
Operator verification on simulation/emulation platform and real hardware.
Operator performance benchmarking.
Operator performance profiling and optimization.
Minimum Qualifications:
MS in CS/EE with 3+ years of related work experience.
Strong coding skills in C/C++ and Python.
Solid math foundation.
Experienced in DSP/x86/GPU code performance optimization.
Familiar with popular frameworks such as TensorFlow, Caffe, and MXNet.
Familiar with development/build utilities (such as git, CMake, Bazel) and shell scripting (such as bash).
Good communication skills and technical leadership.
Preferred Qualifications:
Familiar with popular CNN and RNN models such as ResNet50, GoogLeNet, and VGG16.
Experience using hardware emulation platforms such as Palladium, ZeBu, or FPGA.
Experience in instruction pipeline optimization.
Experience in DMA performance optimization.
Experience in CUDA C programming and performance tuning.
NPU Software Performance Optimization
Description
Enflame designs and implements neural network processing units (NPUs) and the corresponding software to enable continuous innovation in AI applications.
The Enflame software department is responsible for the software stack (including drivers, SDK, distributed training/inference framework, etc.) that supports NPUs in cloud AI servers.
In this role you will contribute to the distributed framework required for training and inference of AI applications on high-performance NPUs in the cloud environment. We need our engineers to be versatile, innovative, and enthusiastic about tackling new problems across the full stack as we continue to push technology forward.
Responsibilities:
Training/inference performance requirement analysis
Training/inference performance modelling and mini-benchmark design
Benchmark application performance profiling and optimization
Support hardware designers for hardware performance goal
Minimum Qualifications:
MS in CS/EE with 3+ years of related work experience.
Familiar with popular frameworks such as TensorFlow, Caffe, and MXNet.
Familiar with popular CNN and RNN models such as ResNet50, GoogLeNet, and VGG16.
Experienced in NPU/GPGPU/HPC performance profiling and tuning.
Strong coding skills in C/C++ and Python.
Familiar with development/build utilities (such as git, CMake, Bazel) and shell scripting (such as bash).
Good communication skills and technical leadership.
Preferred Qualifications:
Experience using hardware emulation platforms such as Palladium, ZeBu, or FPGA.
Experience with ResNet50 training performance tuning.
Experience in NPU training performance profiling and optimization.
Experience in CUDA/cuDNN development and performance tuning.
Experience in GPU driver performance tuning.
AI Software Performance Optimization
Deep learning training/inference performance tuning
GPU Compute/OpenCL/CUDA benchmark performance tuning
Responsibilities:
Design SW performance model for a given deep learning HW platform
Co-design HW architecture for performance goal
Deep learning benchmark performance tuning
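Senior Engineer, Neural Network Algorithms and Implementation
Job Description and Responsibilities:
Enflame is a high-tech company that designs neural network integrated circuits for the cloud. We provide highly configurable hardware solutions with the best performance-to-cost ratio, together with the corresponding software stack, for a wide range of cloud neural network applications. This position connects cutting-edge neural network research with hardware and software architecture design. The main responsibilities include:
Track the latest neural network techniques in research and in applications, and reproduce the results reported in papers and public code.
Build algorithm development frameworks and test platforms.
Analyze the interface for hardware/software partitioning.
Provide and optimize algorithm prototypes for hardware implementation.
Predict algorithm-level performance.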
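Qualifications:
Solid mathematical foundation in calculus, linear algebra, stochastic theory, and optimization techniques.
Master's degree or above, with at least 5 years of experience in neural network research and applications.
Able to communicate fluently using the technical vocabulary of the neural network field.
Proficient in C++ (C++11 or later) and Python (2.x and 3.x).
Proficient in the Linux environment and script development.
Know how to modify Caffe (at least Caffe 1.0, preferably 2.0) to add custom layers and algorithms.
Experience using CUDA to accelerate neural network computation.
Knowledge of TensorFlow, MXNet, and other mainstream neural network frameworks is a plus.
Experience with vision, audio, or general machine intelligence applications is a plus.
Knowledge of or experience with integrated circuits or driver development is a plus.
Matlab skills are a plus.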
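Software Engineer
In this role you will implement a cloud-based distributed training and inference framework built on high-performance neural network chips. We expect you to be versatile, creative, and eager to solve whatever software problems arise, continually raising the team's software capabilities.
Minimum Qualifications:
● Bachelor's degree in computer science or electronic engineering.
● Understanding of distributed computing concepts and architectures and the related software stack (Linux, RPC, Docker, etc.).
● Proficient in C/C++ and Python programming.
● Familiar with driver software development tools and scripting languages (e.g., Makefiles, CMake, Bazel, bash).
Preferred Qualifications:
● Master's degree in computer science or electronic engineering, or 3+ years of related work experience.
● Experience developing neural network applications with TensorFlow.
● Development experience with Docker and Kubernetes.
● Experience developing GPU drivers or applications (CUDA libraries).
● Experience designing and implementing software development tools for specific hardware (compiler, debugger, profiler, etc.).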
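Responsibilities:
This position is responsible for developing and integrating neural network chip drivers, including one or more of the following:
● Design and implement the NPU chip driver and its integration with the TensorFlow backend.
● Build Docker-based neural network servers.
● Build a Kubernetes-based distributed neural network training/inference framework.
● Analyze and optimize the performance of the distributed neural network framework.
● Develop methods and tools for testing the distributed neural network framework.
● Design and implement an NPU simulator.
● Design and implement a simulator for the distributed neural network framework.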
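Middleware Software Engineer
In this role you will implement a cloud-based distributed training and inference framework built on high-performance neural network chips. We expect you to be a fast learner, creative, and passionate about solving problems, continually raising the team's software capabilities.
Responsibilities:
This position is responsible for developing and integrating neural network chip drivers, including one or more of the following:
● Extend the backends of AI frameworks such as TensorFlow, Caffe, and PyTorch to support new neural network computing devices.
● Analyze and optimize the performance of AI frameworks.
● Design and tune AI framework scheduling algorithms and modules for neural network computing devices.
● Design and implement device simulators that support neural network frameworks.
● Develop methods, tools, and benchmarks for testing and evaluating AI frameworks.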
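Minimum Qualifications:
● Bachelor's degree in computer science or a software-related discipline.
● Understanding of AI frameworks and common neural network models.
● Proficient in C/C++ and Python programming.
● Familiar with software development tools and scripting languages (e.g., git, CMake, Bazel, bash).
● Familiar with software development, release, and management processes (e.g., agile development, defect management, CI/CD concepts).
Preferred Qualifications:
● Master's degree in computer science or electrical engineering, or 3+ years of related work experience.
● Experience developing and tuning neural network applications with TensorFlow.
● Experience developing GPU drivers or applications (CUDA and cuDNN libraries).
● LLVM-related development experience.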
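Neural Network Chip Compiler Software Engineer
Minimum Qualifications:
● Bachelor's degree in computer science or electronic engineering.
● Understanding of computer processor concepts and architecture and the related software stack (Linux, drivers, compilers, etc.).
● Proficient in C/C++ and Python programming.
● Familiar with compiler theory and development.
● Familiar with compiler software development tools and scripting languages (e.g., Makefiles, Bazel, bash).
● Familiar with software development, verification, release, and management processes (e.g., agile development, defect management, CI/CD concepts).
Preferred Qualifications:
● Master's degree in computer science or electronic engineering, or 3+ years of related work experience.
● Experience with compiler frameworks such as LLVM and XLA.
● Experience designing and implementing software development tools for specific hardware (compiler, linker, debugger, profiler, etc.).
● Experience developing and optimizing CUDA, cuDNN, or OpenCL applications.
● Development experience with GCC or shader compilers.
● Experience developing compiler toolchains for heterogeneous computing platforms such as DSPs and GPUs.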
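Responsibilities:
This position is responsible for developing and integrating neural network chip drivers, including one or more of the following:
● Implement AI operators in assembly for the NPU chip and integrate them with the TensorFlow backend.
● Develop the NPU compiler toolchain (compiler, code generator, assembler/disassembler, etc.).
● Verify and tune the NPU compiler toolchain using benchmarks.
● Participate in NPU architecture design and optimization.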
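Neural Network Chip Driver Software Engineer
Minimum Qualifications:
● Bachelor's degree in computer science or electronic engineering.
● Understanding of computer processor concepts and architecture and the related software stack (Linux, compilers, etc.).
● Proficient in C/C++ programming.
● Proficient in developing Linux kernel drivers or user-mode drivers, and in the related toolchains.
Preferred Qualifications:
● Master's degree in computer science or electronic engineering, or a bachelor's degree with 2+ years of related work experience.
● GPU or NPU kernel driver development experience.
● Docker development experience.
● Driver development experience with OpenGL, DirectX, Vulkan, OpenCL, or CUDA.
● Experience designing and implementing software development tools for specific hardware (debugger, profiler, etc.).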
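Responsibilities:
This position is responsible for developing and integrating neural network chip drivers, including:
● Design and implement Linux kernel and user-mode drivers for the NPU chip, targeting neural network use cases.
● Develop methods and tools for testing the driver software.
● Analyze and optimize driver performance.
● Possibly participate in the design and implementation of the NPU simulator.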