Breaking boundaries: can a unified hardware abstraction layer simplify transformer deployments on edge devices?

Mehrdad Zakershahrak*, Samira Ghodratnama

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

Deploying transformer models on edge devices such as smartphones and tablets is pivotal for bringing the benefits of machine learning to real-world scenarios. However, it raises challenges in hardware compatibility, memory efficiency, energy efficiency, and real-time performance. We introduce a versatile Hardware Abstraction Layer (HAL) that (1) bridges pre-trained transformer models and the target hardware for optimized deployment, and (2) incorporates an intermediate representation (IR) as a core element. The IR enables seamless execution of models across diverse hardware backends, enhancing privacy, security, and functionality, especially in regions with limited internet connectivity. With configurable parameters, dynamic model optimizations, and a modular design, our HAL caters to varied performance objectives, offering a unified layer that eases deployment of the IR while prioritizing user-specified performance goals. The main contribution of this work is the integration of the IR into the HAL framework, advancing edge-device machine learning deployments that target latency, energy efficiency, or memory usage. Our results show that the proposed HAL, with its IR component, significantly reduces deployment time and improves inference efficiency on iPhone devices without compromising model accuracy.
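The abstract describes a HAL that takes a hardware-agnostic IR and deploys it to one of several backends according to a user-chosen priority (latency, energy, or memory). The record does not describe the HAL's actual interface, so the following is only a minimal sketch under stated assumptions: every name here (`Priority`, `BackendProfile`, `IRModel`, `HardwareAbstractionLayer`, the backend names `cpu`/`gpu`/`npu`) is hypothetical, the cost numbers are placeholders rather than measurements, and the selection rule (drop backends over a memory budget, then minimize the prioritized metric) is an assumed simple policy, not the authors' method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Priority(Enum):
    """Hypothetical user-specified objective; each value names a BackendProfile field."""
    LATENCY = "latency_ms"
    ENERGY = "energy_mj"
    MEMORY = "memory_mb"


@dataclass(frozen=True)
class BackendProfile:
    """Hypothetical cost estimates for running one IR graph on one backend."""
    name: str
    latency_ms: float
    energy_mj: float
    memory_mb: float


@dataclass
class IRModel:
    """Stand-in for a hardware-agnostic intermediate representation."""
    name: str
    ops: List[str]


class HardwareAbstractionLayer:
    """Sketch of a HAL: register backend profiles, then pick one per priority."""

    def __init__(self) -> None:
        self._backends: Dict[str, BackendProfile] = {}

    def register(self, profile: BackendProfile) -> None:
        # Modular design (assumed): backends can be added without touching the model.
        self._backends[profile.name] = profile

    def select_backend(self, priority: Priority,
                       memory_budget_mb: float = float("inf")) -> BackendProfile:
        # Assumed policy: discard backends over the memory budget,
        # then choose the one that minimizes the prioritized metric.
        candidates = [b for b in self._backends.values()
                      if b.memory_mb <= memory_budget_mb]
        if not candidates:
            raise RuntimeError("no registered backend fits the memory budget")
        return min(candidates, key=lambda b: getattr(b, priority.value))

    def deploy(self, model: IRModel, priority: Priority,
               memory_budget_mb: float = float("inf")) -> str:
        backend = self.select_backend(priority, memory_budget_mb)
        # A real HAL would lower the IR to backend-specific kernels here;
        # this sketch only reports the routing decision.
        return f"{model.name} -> {backend.name}"


if __name__ == "__main__":
    hal = HardwareAbstractionLayer()
    # Placeholder profiles for illustration only, not measured values.
    hal.register(BackendProfile("cpu", latency_ms=40.0, energy_mj=90.0, memory_mb=300.0))
    hal.register(BackendProfile("gpu", latency_ms=12.0, energy_mj=200.0, memory_mb=500.0))
    hal.register(BackendProfile("npu", latency_ms=20.0, energy_mj=60.0, memory_mb=450.0))
    encoder = IRModel("encoder", ["matmul", "softmax", "layernorm"])
    for p in Priority:
        print(p.name, hal.deploy(encoder, p))
```

Because the IR is kept separate from backend profiles, changing the performance priority re-routes the same model without re-exporting it, which is one plausible reading of the "unified layer" the abstract describes.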

Original language: English
Title of host publication: Service-oriented computing
Subtitle of host publication: ICSOC 2023 Workshops
Editors: Flavia Monti, Pierluigi Plebani, Naouel Moha, Hye-young Paik, Johanna Barzen, Gowri Ramachandran, Devis Bianchini, Damian A. Tamburri, Massimo Mecella
Place of Publication: Singapore
Publisher: Springer, Springer Nature
Pages: 62-71
Number of pages: 10
ISBN (Electronic): 9789819709892
ISBN (Print): 9789819709885
DOIs
Publication status: Published - 2024
Event: Scientific satellite events held in conjunction with the International Conference on Service-Oriented Computing (21st : 2023) - Rome, Italy
Duration: 28 Nov 2023 to 1 Dec 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14518
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Scientific satellite events held in conjunction with the International Conference on Service-Oriented Computing (21st : 2023)
Abbreviated title: ICSOC 2023
Country/Territory: Italy
City: Rome
Period: 28/11/23 to 1/12/23

Keywords

  • Deep Learning Model Optimization
  • Hardware Abstraction Layer (HAL)
  • Performance Optimization
  • Real-Time Performance Monitoring
  • Ultra-low edge-device inference
