Scholar's Hub

Award-Winning Papers: Systems & Databases

These papers have received best paper awards or distinguished paper awards from renowned computer science conferences in the Systems and Databases fields.

This collection is sourced from each conference. If you notice any errors, please contact us.

Systems

ESEC/FSE

The evolution of type annotations in Python: an empirical study

  • L. Di Grazia, Michael Pradel

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Type annotations and gradual type checkers attempt to reveal errors and facilitate maintenance in dynamically typed programming languages. Despite the availability of these features and tools, it is currently unclear how quickly developers are adopting them, what strategies they follow when doing so, and whether adding type annotations reveals more type errors. This paper presents the first large-scale empirical study of the evolution of type annotations and type errors in Python. The study is based on an analysis of 1,414,936 type annotation changes, which we extract from 1,123,393 commits among 9,655 projects. Our results show that (i) type annotations are getting more popular, and once added, often remain unchanged in the projects for a long time, (ii) projects follow three evolution patterns for type annotation usage -- regular annotation, type sprints, and occasional uses -- and that the used pattern correlates with the number of contributors, (iii) more type annotations help find more type errors (0.704 correlation), but nevertheless, many commits (78.3%) are committed despite having such errors. Our findings show that better developer training and automated techniques for adding type annotations are needed, as most code still remains unannotated, and they call for a better integration of gradual type checking into the development process.

TLDR

The findings show that better developer training and automated techniques for adding type annotations are needed, as most code still remains unannotated, and they call for a better integration of gradual type checking into the development process.
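
As a rough, self-contained illustration of the kind of data the study mines (our own sketch, not the authors' tooling), the snippet below counts type annotations in a piece of Python source using the standard ast module; the paper applies this idea across millions of commits.

```python
import ast

def count_annotations(source: str) -> int:
    """Count parameter, return, and variable type annotations in Python source."""
    tree = ast.parse(source)
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.AnnAssign):                      # x: int = 0
            count += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.returns is not None:                          # -> int
                count += 1
            for arg in list(node.args.args) + list(node.args.kwonlyargs):
                if arg.annotation is not None:                    # (x: int)
                    count += 1
    return count

print(count_annotations("def add(a: int, b: int) -> int:\n    return a + b"))  # 3
```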

Asynchronous technical interviews: reducing the effect of supervised think-aloud on communication ability

  • Mahnaz Behroozi, Chris Parnin, Chris Brown

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Software engineers often face a critical test before landing a job—passing a technical interview. During these sessions, candidates must write code while thinking aloud as they work toward a solution to a problem under the watchful eye of an interviewer. While thinking aloud during technical interviews gives interviewers a picture of candidates’ problem-solving ability, surprisingly, these types of interviews often prevent candidates from communicating their thought process effectively. To understand if poor performance related to interviewer presence can be reduced while preserving communication and technical skills, we introduce asynchronous technical interviews—where candidates submit recordings of think-aloud and coding. We compare this approach to traditional whiteboard interviews and find that, by eliminating interviewer supervision, asynchronicity significantly improved the clarity of think-aloud via increased informativeness and reduced stress. Moreover, we discovered asynchronous technical interviews preserved, and in some cases even enhanced, technical problem-solving strategies and code quality. This work offers insight into asynchronous technical interviews as a design for supporting communication during interviews, and discusses trade-offs and guidelines for implementing this approach in software engineering hiring practices.

TLDR

This work compares this approach to traditional whiteboard interviews and finds that, by eliminating interviewer supervision, asynchronicity significantly improved the clarity of think-aloud via increased informativeness and reduced stress.

SPINE: a scalable log parser with feedback guidance

  • Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, Hongyu Zhang, Yudong Liu, Ling Zheng, Yu Kang, Qingwei Lin, Yingnong Dang, S. Rajmohan, Dongmei Zhang

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome the challenges, we propose SPINE, which is a highly scalable log parser with user feedback guidance. Based on our log parser equipped with initial grouping and progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data. Besides, we introduce user feedback to make the parser adapt quickly to the evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, which outperforms the state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, in which SPINE can parse 30 million logs in less than 8 minutes under 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate amount of user feedback.

TLDR

This work proposes SPINE, which is a highly scalable log parser with user feedback guidance, and proposes a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data.
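
To make the "templates and parameters" terminology concrete, here is a deliberately naive sketch of log template extraction (our illustration only; SPINE's initial grouping, progressive clustering, and feedback loop are far more sophisticated).

```python
import re
from collections import defaultdict

def to_template(log_line: str) -> str:
    """Very rough template extraction: mask obvious parameters with <*>."""
    masked = re.sub(r"0x[0-9a-fA-F]+", "<*>", log_line)   # hex values
    masked = re.sub(r"\b\d+(\.\d+)*\b", "<*>", masked)     # numbers, IPs, versions
    return masked

groups = defaultdict(list)
for line in ["Connected to 10.0.0.1 in 35 ms",
             "Connected to 10.0.0.7 in 12 ms",
             "Worker 3 crashed at 0x7f3a"]:
    groups[to_template(line)].append(line)

for template, lines in groups.items():
    print(template, len(lines))
```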

Using nudges to accelerate code reviews at scale

  • Qianhua Shan, D. Sukhdeo, Qianying Huang, Seth Rogers, Lawrence Chen, Elise Paradis, Peter C. Rigby, Nachiappan Nagappan

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

We describe a large-scale study to reduce the amount of time code review takes. Each quarter at Meta we survey developers. Combining sentiment data from a developer experience survey and telemetry data from our diff review tool, we address, “When does a diff review feel too slow?” From the sentiment data alone, we learn that 84.7% of developers are satisfied with the time their diffs spend in review. By enriching the survey results with telemetry for each respondent, we determined that sentiment is closely associated with the 75th percentile time in review for that respondent’s diffs, i.e., those that take more than 24 hours. To encourage developers to act on stale diffs that have had no action for 24 or more hours, we designed a NudgeBot to notify, i.e., nudge, reviewers. To determine whom to nudge when a diff is stale, we created a model to rank the reviewers based on the probability that they will make a comment or perform some other action on a diff. This model outperformed models that looked at files the reviewer had modified in the past. Combining this information with prior author-reviewer relationships, we achieved an MRR and AUC of 0.81 and 0.88, respectively. To evaluate NudgeBot in production, we conducted an A/B cluster-randomized experiment on over 30k engineers. We observed a substantial, statistically significant decrease in both time in review (-6.8%, p=0.049) and time to first reviewer action (-9.9%, p=0.010). We also used guard metrics to ensure that most reviews were still done in fewer than 24 hours and that reviewers still spend the same amount of time looking at diffs, and saw no statistically significant change in these metrics. NudgeBot is now rolled out company-wide and is used daily by thousands of engineers at Meta.

TLDR

A large-scale study to reduce the amount of time code review takes is described and a model to rank the reviewers based on the probability that they will make a comment or perform some other action on a diff is created.
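
The paper evaluates its reviewer-ranking model with MRR, among other metrics. Below is a minimal sketch of that metric (ours, with hypothetical reviewer names, not Meta's implementation).

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR over diffs: ranked_lists[i] lists candidate reviewers ordered by
    predicted probability of acting on diff i; relevant[i] is the set of
    reviewers who actually acted on it."""
    total = 0.0
    for ranking, acted in zip(ranked_lists, relevant):
        rr = 0.0
        for pos, reviewer in enumerate(ranking, start=1):
            if reviewer in acted:
                rr = 1.0 / pos
                break
        total += rr
    return total / len(ranked_lists)

# Acting reviewer ranked 1st on one diff and 2nd on another -> MRR = 0.75.
print(mean_reciprocal_rank([["alice", "bob"], ["bob", "carol"]],
                           [{"alice"}, {"carol"}]))
```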

Online testing of RESTful APIs: promises and challenges

  • Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Online testing of web APIs—testing APIs in production—is gaining traction in industry. Platforms such as RapidAPI and Sauce Labs provide online testing and monitoring services of web APIs 24/7, typically by re-executing manually designed test cases on the target APIs on a regular basis. In parallel, research on the automated generation of test cases for RESTful APIs has seen significant advances in recent years. However, despite their promising results in the lab, it is unclear whether research tools would scale to industrial-size settings and, more importantly, how they would perform in an online testing setup, increasingly common in practice. In this paper, we report the results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs. Specifically, we used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases. To scale at this level, we had to transition from a monolithic tool approach to a multi-bot architecture with over 200 bots working cooperatively in tasks like test generation and reporting. As a result, we uncovered about 390K failures, which we conservatively triaged into 254 bugs, 65 of which have been acknowledged or fixed by developers to date. Among others, we identified confirmed faults in the APIs of Amadeus, Foursquare, Yelp, and YouTube, accessed by millions of applications worldwide. More importantly, our reports have guided developers on improving their APIs, including bug fixes and documentation updates in the APIs of Amadeus and YouTube. Our results show the potential of online testing of RESTful APIs as the next must-have feature in industry, but also some of the key challenges to overcome for its full adoption in practice.

TLDR

The results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases.

Minerva: browser API fuzzing with dynamic mod-ref analysis

  • Chijin Zhou, Quan Zhang, Mingzhe Wang, Lihua Guo, Jie Liang, Zhe Liu, Mathias Payer, Yuting Jiang

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Browser APIs are essential to the modern web experience. Due to their large number and complexity, they vastly expand the attack surface of browsers. To detect vulnerabilities in these APIs, fuzzers generate test cases with a large amount of random API invocations. However, the massive search space formed by arbitrary API combinations hinders their effectiveness: since randomly-picked API invocations are unlikely to interfere with each other (i.e., compute on partially shared data), few interesting API interactions are explored. Consequently, reducing the search space by revealing inter-API relations is a major challenge in browser fuzzing. We propose Minerva, an efficient browser fuzzer for browser API bug detection. The key idea is to leverage API interference relations to reduce redundancy and improve coverage. Minerva consists of two modules: dynamic mod-ref analysis and guided code generation. Before fuzzing starts, the dynamic mod-ref analysis module builds an API interference graph. It first automatically identifies individual browser APIs from the browser’s code base. Next, it instruments the browser to dynamically collect mod-ref relations between APIs. During fuzzing, the guided code generation module synthesizes highly-relevant API invocations guided by the mod-ref relations. We evaluate Minerva on three mainstream browsers, i.e., Safari, Firefox, and Chromium. Compared to state-of-the-art fuzzers, Minerva improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs. Besides, Minerva has discovered 35 previously-unknown bugs, out of which 20 have been fixed, with 5 CVEs assigned and acknowledged by browser vendors.

TLDR

Minerva is proposed, an efficient browser fuzzer for browser API bug detection that improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs.
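
A toy sketch of the API interference idea follows (our illustration; the API names and mod/ref sets below are hypothetical, whereas Minerva collects the real ones by instrumenting the browser).

```python
from itertools import combinations

# Hypothetical mod/ref sets per browser API, for illustration only.
api_effects = {
    "Element.setAttribute": {"mod": {"dom_attr"}, "ref": {"dom_node"}},
    "MutationObserver.observe": {"mod": {"observer_list"}, "ref": {"dom_attr", "dom_node"}},
    "Canvas.getContext": {"mod": {"canvas_ctx"}, "ref": {"canvas"}},
}

def interferes(a: str, b: str) -> bool:
    """Two APIs interfere if one modifies state the other reads or modifies."""
    return bool(api_effects[a]["mod"] & (api_effects[b]["ref"] | api_effects[b]["mod"]) or
                api_effects[b]["mod"] & (api_effects[a]["ref"] | api_effects[a]["mod"]))

graph = {(a, b) for a, b in combinations(api_effects, 2) if interferes(a, b)}
print(graph)  # only setAttribute <-> MutationObserver.observe interfere here
```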

First come first served: the impact of file position on code review

  • Enrico Fregnan, Larissa Braz, Marco D'Ambros, Gül Çalıklı, Alberto Bacchelli

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • August 8, 2022

The most popular code review tools (e.g., Gerrit and GitHub) present the files to review sorted in alphabetical order. Could this choice or, more generally, the relative position in which a file is presented, bias the outcome of code reviews? We investigate this hypothesis by triangulating complementary evidence in a two-step study. First, we observe developers’ code review activity. We analyze the review comments pertaining to 219,476 Pull Requests (PRs) from 138 popular Java projects on GitHub. We found files shown earlier in a PR to receive more comments than files shown later, also when controlling for possible confounding factors: e.g., the presence of discussion threads or the lines added in a file. Second, we measure the impact of file position on defect finding in code review. Recruiting 106 participants, we conduct an online controlled experiment in which we measure participants’ performance in detecting two unrelated defects seeded into two different files. Participants are assigned to one of two treatments in which the position of the defective files is switched. For one type of defect, participants are not affected by its file’s position; for the other, they have 64% lower odds of identifying it when its file is last as opposed to first. Overall, our findings provide evidence that the relative position in which files are presented has an impact on code reviews’ outcome; we discuss these results and implications for tool design and code review.

TLDR

Evidence is provided that the relative position in which files are presented has an impact on code reviews’ outcome; the results and implications for tool design and code review are discussed.
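
The "64% lower odds" result corresponds to an odds ratio of roughly 0.36. The small sketch below, using hypothetical counts (not the paper's raw data), shows how such a ratio is computed.

```python
def odds_ratio(found_first: int, missed_first: int,
               found_last: int, missed_last: int) -> float:
    """Odds of finding a defect when its file is shown last vs. shown first."""
    return (found_last / missed_last) / (found_first / missed_first)

# Hypothetical counts: with the defective file last, the odds of spotting the
# defect are ~64% lower (odds ratio ~0.36).
print(odds_ratio(found_first=40, missed_first=13, found_last=28, missed_last=25))
```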

HPCA

DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing

  • Zhe Zhou, Cong Li, Fan Yang, Guangyu Sun

  • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • February 1, 2023

DIMM-based near-memory processing architectures (DIMM-NMP) have received growing interest from both academia and industry. They have the advantages of large memory capacity, low manufacturing cost, high flexibility, compatible form factor, etc. However, inter-DIMM communication (IDC) has become a critical obstacle for generic DIMM-NMP architectures because it involves costly forwarding transactions through the host CPU. Recent research has demonstrated that, for many applications, the overhead induced by IDC may even offset the performance and energy benefits of near-memory processing. To tackle this problem, we propose DIMM-Link, which enables high-performance IDC in DIMM-NMP architectures and supports seamless integration with existing host memory systems. It adopts bidirectional external data links to connect DIMMs, via which point-to-point communication and inter-DIMM broadcast are efficiently supported in a packet-routing way. We present the full-stack design of DIMM-Link, including the hardware architecture, interconnect protocol, system organization, routing mechanisms, optimization strategies, etc. Comprehensive experiments on typical data-intensive tasks demonstrate that the DIMM-Link-equipped NMP system can achieve a 5.93× average speedup over the 16-core CPU baseline. Compared to other IDC methods, DIMM-Link outperforms MCN, AIM, and ABC-DIMM by 2.42×, 1.87×, and 1.77×, respectively. More importantly, DIMM-Link fully considers the implementation feasibility and system integration constraints, which are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.

TLDR

DIMM-Link is proposed, which enables high-performance IDC in DIMM-NMP architectures and supports seamless integration with existing host memory systems and fully considers the implementation feasibility and system integration constraints, which are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.

Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems

  • Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair

  • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • December 23, 2022

As Dynamic Random Access Memories (DRAM) scale, they are becoming increasingly susceptible to Row Hammer. By rapidly activating rows of DRAM cells (aggressor rows), attackers can exploit inter-cell interference through Row Hammer to flip bits in neighboring rows (victim rows). A recent work, called Randomized Row-Swap (RRS), proposed proactively swapping aggressor rows with randomly selected rows before an aggressor row can cause Row Hammer. Our paper observes that RRS is neither secure nor scalable. We first propose the ‘Juggernaut attack pattern’ that breaks RRS in under 1 day. Juggernaut exploits the fact that the mitigative action of RRS, a swap operation, can itself induce additional target row activations, defeating such a defense. Second, this paper proposes a new defense Secure Row-Swap mechanism that avoids the additional activations from swap (and unswap) operations and protects against Juggernaut. Furthermore, this paper extends Secure Row-Swap with attack detection to defend against even future attacks. While this provides better security, it also allows for securely reducing the frequency of swaps, thereby enabling Scalable and Secure Row-Swap. The Scalable and Secure Row-Swap mechanism provides years of Row Hammer protection with 3.3× lower storage overheads as compared to the RRS design. It incurs only a 0.7% slowdown as compared to a not-secure baseline for a Row Hammer threshold of 1200.

TLDR

The ‘Juggernaut attack pattern’ that breaks RRS in under 1 day is proposed, and a new defense Secure Row-Swap mechanism is proposed that avoids the additional activations from swap (and unswap) operations and protects against Juggernaut.

SupermarQ: A Scalable Quantum Benchmark Suite

  • T. Tomesh, P. Gokhale, V. Omole, Gokul Subramanian Ravi, Kaitlin N. Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, M. Martonosi, F. Chong

  • 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • February 22, 2022

The emergence of quantum computers as a new computational paradigm has been accompanied by speculation concerning the scope and timeline of their anticipated revolutionary changes. While quantum computing is still in its infancy, the variety of different architectures used to implement quantum computations makes it difficult to reliably measure and compare performance. This problem motivates our introduction of SupermarQ, a scalable, hardware-agnostic quantum benchmark suite which uses application-level metrics to measure performance. SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain. We define a set of feature vectors to quantify coverage, select applications from a variety of domains to ensure the suite is representative of real workloads, and collect benchmark results from the IBM, IonQ, and AQT@LBNL platforms. Looking forward, we envision that quantum benchmarking will encompass a large cross-community effort built on open source, constantly evolving benchmark suites. We introduce SupermarQ as an important step in this direction.

TLDR

SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain, and envision that quantum benchmarking will encompass a large cross-community effort built on open source, constantly evolving benchmark suites.

ICSE

"STILL AROUND": Experiences and Survival Strategies of Veteran Women Software Developers

  • S. V. Breukelen, A. Barcomb, Sebastian Baltes, A. Serebrenik

  • ArXiv

  • February 7, 2023

The intersection of ageism and sexism can create a hostile environment for veteran software developers belonging to marginalized genders. In this study, we conducted 14 interviews to examine the experiences of people at this intersection, primarily women, in order to discover the strategies they employed to successfully remain in the field. We identified 283 codes, which fell into three main categories: Strategies, Experiences, and Perception. Several strategies we identified, such as (Deliberately) Not Trying to Look Younger, were not previously described in the software engineering literature. We found that, in some companies, older women developers are recognized as having particular value, further strengthening the known benefits of diversity in the workforce. Based on the experiences and strategies, we suggest that organizations employing software developers consider the benefits of hiring veteran women software developers. For example, companies can draw upon the life experiences of older women developers in order to better understand the needs of customers from a similar demographic. While we recognize that many of the strategies employed by our study participants are a response to systemic issues, we still consider that, in the short term, there is benefit in describing these strategies for developers who are experiencing such issues today.

TLDR

Fourteen interviews examine the experiences of veteran women software developers at the intersection of ageism and sexism, identifying the strategies, such as (Deliberately) Not Trying to Look Younger, that they employ to remain in the field.

A Qualitative Study on the Implementation Design Decisions of Developers

  • Jenny Liang, Maryam Arab, Minhyuk Ko, Amy J. Ko, Thomas D. LaToza

  • ArXiv

  • January 24, 2023

Decision-making is a key software engineering skill. Developers constantly make choices throughout the software development process, from requirements to implementation. While prior work has studied developer decision-making, the choices made while choosing what solution to write in code remain understudied. In this mixed-methods study, we examine the phenomenon where developers select one specific way to implement a behavior in code, given many potential alternatives. We call these decisions implementation design decisions. Our mixed-methods study includes 46 survey responses and 14 semi-structured interviews with professional developers about their decision types, considerations, processes, and expertise for implementation design decisions. We find that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture. We also show that developers have a consistent general structure to their implementation decision-making process, but no single process is exactly the same. We discuss the implications of our findings on research, education, and practice, including insights on teaching developers how to make implementation design decisions.

TLDR

It is found that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture.

Compatible Remediation on Vulnerabilities from Third-Party Libraries for Java Projects

  • Lyuye Zhang, Chengwei Liu, Zhengzi Xu, Sen Chen, Lingling Fan, Lida Zhao, Jiahui Wu, Yang Liu

  • ArXiv

  • January 20, 2023

With the increasing disclosure of vulnerabilities in open-source software, software composition analysis (SCA) has been widely applied to reveal third-party libraries and the associated vulnerabilities in software projects. Beyond the revelation, SCA tools adopt various remediation strategies to fix vulnerabilities, the quality of which varies substantially. However, ineffective remediation could induce side effects, such as compilation failures, which impede acceptance by users. According to our studies, existing SCA tools could not correctly handle the concerns of users regarding the compatibility of remediated projects. To this end, we propose Compatible Remediation of Third-party libraries (CORAL) for Maven projects to fix vulnerabilities without breaking the projects. The evaluation proved that CORAL not only fixed 87.56% of vulnerabilities, outperforming other tools (best 75.32%), but also achieved a 98.67% successful compilation rate and a 92.96% successful unit test rate. Furthermore, we found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, and the rest of the vulnerabilities (21.55%) could either be fixed by upgrades that break the compilation or even be impossible to fix by upgrading.

TLDR

Compatible Remediation of Third-party libraries (CORAL) for Maven projects is proposed to fix vulnerabilities without breaking the projects, and it is found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, while the rest could either be fixed by upgrades that break the compilation or be impossible to fix by upgrading.

Do I Belong? Modeling Sense of Virtual Community Among Linux Kernel Contributors

  • Bianca Trinkenreich, Klaas-Jan Stol, A. Sarma, D. Germán, M. Gerosa, Igor Steinmacher

  • ArXiv

  • January 16, 2023

The sense of belonging to a community is a basic human need that impacts an individual's behavior, long-term engagement, and job satisfaction, as revealed by research in disciplines such as psychology, healthcare, and education. Despite much research on how to retain developers in Open Source Software projects and other virtual, peer-production communities, there is a paucity of research investigating what might contribute to a sense of belonging in these communities. To that end, we develop a theoretical model that seeks to understand the link between OSS developer motives and a Sense of Virtual Community. We test the model with a dataset collected in the Linux Kernel developer community, using structural equation modeling techniques. Our results for this case study show that intrinsic motivations - social or hedonic motives - are positively associated with a sense of virtual community, but living in an authoritative country and being paid to contribute can reduce the sense of virtual community. Based on these results, we offer suggestions for open source projects to foster a sense of virtual community, with a view to retaining contributors and improving project sustainability.

TLDR

A theoretical model is developed that seeks to understand the link between OSS developer motives and a Sense of Virtual Community and shows that intrinsic motivations - social or hedonic motives - are positively associated with a sense of virtual community, but living in an authoritative country and being paid to contribute can reduce the sense of virtual community.

Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference

  • Sidong Feng, Mulong Xie, Chunyang Chen

  • December 10, 2022

Due to the importance of Android app quality assurance, many automated GUI testing tools have been developed. Although the test algorithms have been improved, the impact of GUI rendering has been overlooked. On the one hand, setting a long waiting time to execute events on fully rendered GUIs slows down the testing process. On the other hand, setting a short waiting time will cause the events to execute on partially rendered GUIs, which negatively affects the testing effectiveness. An optimal waiting time should strike a balance between effectiveness and efficiency. We propose AdaT, a lightweight image-based approach to dynamically adjust the inter-event time based on GUI rendering state. Given the real-time streaming on the GUI, AdaT presents a deep learning model to infer the rendering state, and synchronizes with the testing tool to schedule the next event when the GUI is fully rendered. The evaluations demonstrate the accuracy, efficiency, and effectiveness of our approach. We also integrate our approach with the existing automated testing tool to demonstrate the usefulness of AdaT in covering more activities and executing more events on fully rendered GUIs.

TLDR

AdaT, a lightweight image-based approach to dynamically adjust the inter-event time based on GUI rendering state is proposed, given the real-time streaming on the GUI, which presents a deep learning model to infer the rendering state, and synchronizes with the testing tool to schedule the next event when the GUI is fully rendered.

An Empirical Investigation on the Challenges Faced by Women in the Software Industry: A Case Study

  • Bianca Trinkenreich, Ricardo Britto, M. Gerosa, Igor Steinmacher

  • 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)

  • March 20, 2022

Context: Addressing women's under-representation in the software industry, a widely recognized concern, requires attracting as well as retaining more women. Hearing from women practitioners, particularly those positioned in multi-cultural settings, about their challenges, and adopting solutions based on their lived experiences, can support the design of programs to resolve the under-representation issue. Goal: We investigated the challenges women face in global software development teams, particularly what motivates women to leave their company; how those challenges might break down according to demographics; and strategies to mitigate the identified challenges. Method: To achieve this goal, we conducted an exploratory case study in Ericsson, a global technology company. We surveyed 94 women and employed mixed-methods to analyze the data. Results: Our findings reveal that women face socio-cultural challenges, including work-life balance issues, benevolent and hostile sexism, lack of recognition and peer parity, impostor syndrome, glass ceiling bias effects, the prove-it-again phenomenon, and the maternal wall. The participants of our research provided different suggestions to address/mitigate the reported challenges, including sabbatical policies, flexibility of location and time, parenthood support, soft skills training for managers, equality of payment and opportunities between genders, mentoring and role models to support career growth, directives to hire more women, inclusive groups and events, women's empowerment, and recognition for women's success. The framework of challenges and suggestions can inspire further initiatives both in academia and industry to onboard and retain women.

TLDR

The challenges women face in global software development teams, particularly what motivates women to leave their company; how those challenges might break down according to demographics; and strategies to mitigate the identified challenges are investigated.

INFOCOM

More than Enough is Too Much: Adaptive Defenses against Gradient Leakage in Production Federated Learning

  • Fei Wang, Ethan Hugh, Baochun Li

  • December 31, 2022

With increasing concerns on privacy leakage from gradients, a variety of attack mechanisms emerged to recover private data from gradients at an honest-but-curious server, which challenged the primary advantage of privacy protection in federated learning. However, we cast doubt upon the real impact of these gradient attacks on production federated learning systems. By taking away several impractical assumptions that the literature has made, we find that gradient attacks pose a limited degree of threat to the privacy of raw data. Through a comprehensive evaluation on existing gradient attacks in a federated learning system with practical assumptions, we have systematically analyzed their effectiveness under a wide range of configurations. We present key priors required to make the attack possible or stronger, such as a narrow distribution of initial model weights, as well as inversion at early stages of training. We then propose a new lightweight defense mechanism that provides sufficient and self-adaptive protection against time-varying levels of the privacy leakage risk throughout the federated learning process. As a variation of the gradient perturbation method, our proposed defense, called OUTPOST, selectively adds Gaussian noise to gradients at each update iteration according to the Fisher information matrix, where the level of noise is determined by the privacy leakage risk quantified by the spread of model weights at each layer. To limit the computation overhead and training performance degradation, OUTPOST only performs perturbation with iteration-based decay. Our experimental results demonstrate that OUTPOST can achieve a much better tradeoff than the state-of-the-art with respect to convergence performance, computational overhead, and protection against gradient attacks.

TLDR

It is found that gradient attacks pose a limited degree of threat to the privacy of raw data and a new lightweight defense mechanism is proposed, called OUTPOST, that provides sufficient and self-adaptive protection against time-varying levels of the privacy leakage risk throughout the federated learning process.
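
A hedged sketch of the gradient-perturbation idea follows (our own approximation; OUTPOST's exact use of the Fisher information matrix and its decay schedule differ): per-layer Gaussian noise scaled by the spread of that layer's weights and decayed over iterations.

```python
import numpy as np

def outpost_like_perturb(grads, weights, iteration, base_noise=0.1, decay=0.95):
    """Illustrative gradient perturbation in the spirit of OUTPOST (not the
    authors' exact formula): noise grows with the spread of each layer's
    weights and shrinks as training progresses."""
    noisy = []
    scale = base_noise * (decay ** iteration)
    for g, w in zip(grads, weights):
        layer_risk = np.std(w)              # proxy for privacy-leakage risk
        noise = np.random.normal(0.0, scale * layer_risk, size=g.shape)
        noisy.append(g + noise)
    return noisy

grads = [np.ones((4, 4)), np.ones(4)]
weights = [np.random.randn(4, 4), np.random.randn(4)]
perturbed = outpost_like_perturb(grads, weights, iteration=10)
print([p.shape for p in perturbed])
```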

ChARM: NextG Spectrum Sharing Through Data-Driven Real-Time O-RAN Dynamic Control

  • L. Baldesi, Francesco Restuccia, T. Melodia

  • IEEE INFOCOM 2022 - IEEE Conference on Computer Communications

  • January 17, 2022

Today’s radio access networks (RANs) are monolithic entities which often operate statically on a given set of parameters for the entirety of their operations. To implement realistic and effective spectrum sharing policies, RANs will need to seamlessly and intelligently change their operational parameters. In stark contrast with existing paradigms, the new O-RAN architectures for 5G-and-beyond networks (NextG) separate the logic that controls the RAN from its hardware substrate, allowing unprecedented real-time fine-grained control of RAN components. In this context, we propose the Channel-Aware Reactive Mechanism (ChARM), a data-driven O-RAN-compliant framework that allows (i) sensing the spectrum to infer the presence of interference and (ii) reacting in real time by switching the distributed unit (DU) and radio unit (RU) operational parameters according to a specified spectrum access policy. ChARM is based on neural networks operating directly on unprocessed I/Q waveforms to determine the current spectrum context. ChARM does not require any modification to the existing 3GPP standards. It is designed to operate within the O-RAN specifications, and can be used in conjunction with other spectrum sharing mechanisms (e.g., LTE-U, LTE-LAA or MulteFire). We demonstrate the performance of ChARM in the context of spectrum sharing among LTE and Wi-Fi in unlicensed bands, where a controller operating over a RAN Intelligent Controller (RIC) senses the spectrum and switches cell frequency to avoid Wi-Fi. We develop a prototype of ChARM using srsRAN, and leverage the Colosseum channel emulator to collect a large-scale waveform dataset to train our neural networks with. To collect standard-compliant Wi-Fi data, we extended the Colosseum testbed using system-on-chip (SoC) boards running a modified version of the OpenWiFi architecture. Experimental results show that ChARM achieves accuracy of up to 96% on Colosseum and 85% on an over-the-air testbed, demonstrating the capacity of ChARM to exploit the considered spectrum channels.

TLDR

The Channel-Aware Reactive Mechanism (ChARM), a data-driven O-RAN-compliant framework that allows sensing the spectrum to infer the presence of interference and reacting in real time by switching the distributed unit (DU) and radio unit (RU) operational parameters according to a specified spectrum access policy, is proposed.

Scalable Real-Time Bandwidth Fairness in Switches

  • Robert MacDavid, Xiaoqi Chen, J. Rexford

  • December 31, 2021

Network operators want to enforce fair bandwidth sharing between users without solely relying on congestion control running on end-user devices. However, in edge networks (e.g., 5G), the number of user devices sharing a bottleneck link far exceeds the number of queues supported by today’s switch hardware; even accurately tracking per-user sending rates may become too resource-intensive. Meanwhile, traditional software-based queuing on CPUs struggles to meet the high throughput and low latency demanded by 5G users. We propose Approximate Hierarchical Allocation of Bandwidth (AHAB), a per-user bandwidth limit enforcer that runs fully in the data plane of commodity switches. AHAB tracks each user’s approximate traffic rate and compares it against a bandwidth limit, which is iteratively updated via a real-time feedback loop to achieve max-min fairness across users. Using a novel sketch data structure, AHAB avoids storing per-user state, and therefore scales to thousands of slices and millions of users. Furthermore, AHAB supports network slicing, where each slice has a guaranteed share of the bandwidth that can be scavenged by other slices when under-utilized. Evaluation shows AHAB can achieve fair bandwidth allocation within 3.1ms, 13x faster than prior data-plane hierarchical schedulers.

TLDR

Approximate Hierarchical Allocation of Bandwidth (AHAB) is proposed, a per-user bandwidth limit enforcer that runs fully in the data plane of commodity switches and can achieve fair bandwidth allocation within 3.1ms, 13x faster than prior data-plane hierarchical schedulers.
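
AHAB's key trick is tracking approximate per-user rates without per-user state. The sketch below uses a generic count-min sketch in Python to illustrate the idea; AHAB's actual data structure is a novel variant implemented in the switch data plane.

```python
import hashlib

class CountMinSketch:
    """Generic count-min sketch for approximate per-user byte counts
    (illustration only, not AHAB's data-plane structure)."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, user_id: str, row: int) -> int:
        h = hashlib.sha256(f"{row}:{user_id}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, user_id: str, nbytes: int) -> None:
        for row in range(self.depth):
            self.table[row][self._index(user_id, row)] += nbytes

    def estimate(self, user_id: str) -> int:
        return min(self.table[row][self._index(user_id, row)]
                   for row in range(self.depth))

sketch = CountMinSketch()
sketch.add("ue-42", 1500)
sketch.add("ue-42", 1500)
print(sketch.estimate("ue-42"))  # >= 3000 (count-min only overestimates)
```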

PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing

  • S. Tuli, G. Casale, N. Jennings

  • IEEE INFOCOM 2022 - IEEE Conference on Computer Communications

  • December 4, 2021

Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly volatile workloads or accurately detect and diagnose faults for optimal remediation. There is thus a need for a robust and proactive fault-tolerance mechanism to meet service level objectives. In this work, we propose PreGAN, a composite AI model using a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments. PreGAN uses co-simulations in tandem with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry-Pi based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service. PreGAN accomplishes this by 5.1% more accurate fault detection, higher diagnosis scores and 23.8% lower overheads compared to the best method among the considered baselines.

TLDR

PreGAN, a composite AI model using a Generative Adversarial Network to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments and can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service.

ISCA

Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters

  • Kaiyang Zhao, Kaiwen Xue, Ziqi Wang, Dan Schatzberg, Leon Yang, Antonis Manousis, Johannes Weiner, Rik Van Riel, Bikash Sharma, Chunqiang Tang, Dimitrios Skarlatos

  • International Symposium on Computer Architecture

  • June 17, 2023

The unabating growth of the memory needs of emerging datacenter applications has exacerbated the scalability bottleneck of virtual memory. However, reducing the excessive overhead of address translation will remain onerous until the physical memory contiguity predicament gets resolved. To address this problem, this paper presents Contiguitas, a novel redesign of memory management in the operating system and hardware that provides ample physical memory contiguity. We identify that the primary cause of memory fragmentation in Meta's datacenters is unmovable allocations scattered across the address space that impede large contiguity from being formed. To provide ample physical memory contiguity by design, Contiguitas first separates regular movable allocations from unmovable ones by placing them into two different continuous regions in physical memory and dynamically adjusts the boundary of the two regions based on memory demand. Drastically reducing unmovable allocations is challenging because the majority of unmovable pages cannot be moved with software alone given that access to the page cannot be blocked for a migration to take place. Furthermore, page migration is expensive as it requires a long downtime to (a) perform TLB shootdowns that scale poorly with the number of victim TLBs, and (b) copy the page. To this end, Contiguitas eliminates the primary source of unmovable allocations by introducing hardware extensions in the last-level cache to enable the transparent and efficient migration of unmovable pages even while the pages remain in use. We build the operating system component of Contiguitas into the Linux kernel and run our experiments in a production environment at Meta's datacenters. Our results show that Contiguitas's OS component successfully confines unmovable allocations, drastically reducing unmovable 2MB blocks from an average of 31% scattered across the address space down to 7% confined in the unmovable region, leading to significant performance gains. Specifically, we show that for three major production services, Contiguitas achieves end-to-end performance improvements of 2--9% for partially fragmented servers, and 7--18% for highly fragmented servers, which account for nearly a quarter of Meta's fleet. We further use full-system simulations to demonstrate the effectiveness of the hardware extensions of Contiguitas. Our evaluation shows that Contiguitas-HW enables the efficient migration of unmovable allocations, scales well with the number of victim TLBs, and does not affect application performance. We are currently in the process of upstreaming Contiguitas into Linux.

TLDR

Contiguitas, a novel redesign of memory management in the operating system and hardware that provides ample physical memory contiguity, is presented and hardware extensions in the last-level cache enable the transparent and efficient migration of unmovable pages even while the pages remain in use.

NvMR: non-volatile memory renaming for intermittent computing

  • A. Bhattacharyya, Abhijith Somashekhar, Joshua San Miguel

  • Proceedings of the 49th Annual International Symposium on Computer Architecture

  • June 18, 2022

Intermittent systems on energy-harvesting devices have to frequently back up data because of an unreliable energy supply to make forward progress. These devices come with non-volatile memories like Flash/FRAM on board that are used to back up the system state. However, quite paradoxically, writing to a non-volatile memory consumes a lot of energy that makes backups expensive. Idempotency violations inherent to intermittent programs are major contributors to the problem, as they render system state inconsistent and force backups to occur even when plenty of energy is available. In this work, we first characterize the complex persist dependencies that are unique to intermittent computing. Based on these insights, we propose NvMR, an intermittent architecture that eliminates idempotency violations in the program by renaming non-volatile memory addresses. This can reduce the number of backups to their theoretical minimum and decouple the decision of when to perform backups from the memory access constraints imposed by the program. Our evaluations show that compared to a state-of-the-art intermittent architecture, NvMR can save about 20% energy on average when running common embedded applications.

TLDR

NvMR, an intermittent architecture that eliminates idempotency violations in the program by renaming non-volatile memory addresses is proposed, which can reduce the number of backups to their theoretical minimum and decouple the decision of when to perform backups from the memory access constraints imposed by the program.

ISSTA

Combining solution reuse and bound tightening for efficient analysis of evolving systems

  • Clay Stevens, H. Bagheri

  • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 18, 2022

Software engineers have long employed formal verification to ensure the safety and validity of their system designs. As the system changes---often via predictable, domain-specific operations---their models must also change, requiring system designers to repeatedly execute the same formal verification on similar system models. State-of-the-art formal verification techniques can be expensive at scale, the cost of which is multiplied by repeated analysis. This paper presents a novel analysis technique---implemented in a tool called SoRBoT---which can automatically determine domain-specific optimizations that can dramatically reduce the cost of repeatedly analyzing evolving systems. Different from all prior approaches, which focus on either tightening the bounds for analysis or reusing all or part of prior solutions, SoRBoT's automated derivation of domain-specific optimizations combines the benefits of both solution reuse and bound tightening while avoiding the main pitfalls of each. We experimentally evaluate SoRBoT against state-of-the-art techniques for verifying evolving specifications, demonstrating that SoRBoT substantially exceeds the run-time performance of those state-of-the-art techniques while introducing only a negligible overhead, in contrast to the expensive additional computations required by the state-of-the-art verification techniques.

TLDR

A novel analysis technique is presented which can automatically determine domain-specific optimizations that can dramatically reduce the cost of repeatedly analyzing evolving systems and which substantially exceeds the run-time performance of those state-of-the-art techniques while introducing only a negligible overhead.

NCScope: hardware-assisted analyzer for native code in Android apps

  • Hao Zhou, Shuohan Wu, Xiapu Luo, Ting Wang, Yajin Zhou, Chao Zhang, Haipeng Cai

  • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 18, 2022

More and more Android apps implement their functionalities in native code, so does malware. Although various approaches have been designed to analyze the native code used by apps, they usually generate incomplete and biased results due to their limitations in obtaining and analyzing high-fidelity execution traces and memory data with low overheads. To fill the gap, in this paper, we propose and develop a novel hardware-assisted analyzer for native code in apps. We leverage ETM, a hardware feature of ARM platform, and eBPF, a kernel component of Android system, to collect real execution traces and relevant memory data of target apps, and design new methods to scrutinize native code according to the collected data. To show the unique capability of NCScope, we apply it to four applications that cannot be accomplished by existing tools, including systematic studies on self-protection and anti-analysis mechanisms implemented in native code of apps, analysis of memory corruption in native code, and identification of performance differences between functions in native code. The results uncover that only 26.8% of the analyzed financial apps implement self-protection methods in native code, implying that the security of financial apps is far from expected. Meanwhile, 78.3% of the malicious apps under analysis have anti-analysis behaviors, suggesting that NCScope is very useful to malware analysis. Moreover, NCScope can effectively detect bugs in native code and identify performance differences.

TLDR

A novel hardware-assisted analyzer for native code in apps that can effectively detect bugs in native code and identify performance differences, and is very useful to malware analysis.

Finding Permission Bugs in Smart Contracts with Role Mining

  • Ye Liu

  • December 31, 2021

Smart contracts deployed on permissionless blockchains, such as Ethereum, are accessible to any user in a trustless environment. Therefore, most smart contract applications implement access control policies to protect their valuable assets from unauthorized accesses. A difficulty in validating the conformance to such policies, i.e., whether the contract implementation adheres to the expected behaviors, is the lack of policy specifications. In this paper, we mine past transactions of a contract to recover a likely access control model, which can then be checked against various information flow policies and identify potential bugs related to user permissions. We implement our role mining and security policy validation in tool SPCon. The experimental evaluation on labeled smart contract role mining benchmark demonstrates that SPCon effectively mines more accurate user roles compared to the state-of-the-art role mining tools. Moreover, the experimental evaluation on real-world smart contract benchmark and access control CVEs indicates SPCon effectively detects potential permission bugs while having better scalability and lower false-positive rate compared to the state-of-the-art security tools, finding 11 previously unknown bugs and detecting six CVEs that no other tool can find.

TLDR

This paper mines past transactions of a contract to recover a likely access control model, which can then be checked against various information flow policies to identify potential bugs related to user permissions, using SPCon, a role mining and security policy validation tool for smart contracts.
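
A toy sketch of transaction-based role mining follows (our illustration with hypothetical addresses and function names; SPCon's mining and policy checking are considerably richer).

```python
from collections import defaultdict

# Hypothetical transaction history: (caller address, function called).
transactions = [
    ("0xA1", "mint"), ("0xA1", "pause"),
    ("0xB2", "transfer"), ("0xC3", "transfer"),
    ("0xD4", "mint"), ("0xD4", "pause"),
]

# Group callers by the set of functions they used; each distinct set is a
# candidate role. A permission-bug candidate is a privileged function that
# turns out to be reachable by callers outside its expected role.
calls = defaultdict(set)
for caller, fn in transactions:
    calls[caller].add(fn)

roles = defaultdict(list)
for caller, fns in calls.items():
    roles[frozenset(fns)].append(caller)

for fns, members in roles.items():
    print(sorted(fns), members)
```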

Cross-lingual transfer learning for statistical type inference

  • Zhiming Li, Xiaofei Xie, Hao Li, Zhengzi Xu, Yi Li, Yang Liu

  • Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 1, 2021

Hitherto statistical type inference systems rely thoroughly on supervised learning approaches, which require laborious manual effort to collect and label large amounts of data. Most Turing-complete imperative languages share similar control- and data-flow structures, which make it possible to transfer knowledge learned from one language to another. In this paper, we propose a cross-lingual transfer learning framework, PLATO, for statistical type inference, which allows us to leverage prior knowledge learned from the labeled dataset of one language and transfer it to the others, e.g., Python to JavaScript, Java to JavaScript, etc. PLATO is powered by a novel kernelized attention mechanism to constrain the attention scope of the backbone Transformer model such that the model is forced to base its prediction on commonly shared features among languages. In addition, we propose a syntax enhancement that augments learning on the feature overlap among language domains. Furthermore, PLATO can also be used to improve the performance of the conventional supervised-based type inference by introducing cross-language augmentation, which enables the model to learn more general features across multiple languages. We evaluated PLATO under two settings: 1) under the cross-domain scenario where the target language data is not labeled or is labeled partially, the results show that PLATO outperforms the state-of-the-art domain transfer techniques by a large margin, e.g., it improves the Python to TypeScript baseline by +14.6%@EM, +18.6%@weighted-F1; and 2) under the conventional monolingual supervised scenario, PLATO improves the Python baseline by +4.10%@EM, +1.90%@weighted-F1 with the introduction of the cross-lingual augmentation.

TLDR

A cross-lingual transfer learning framework, PLATO, for statistical type inference, which allows us to leverage prior knowledge learned from the labeled dataset of one language and transfer it to the others, e.g., Python to JavaScript, Java to JavaScript, etc.

MOBICOM

Magnetoelectric backscatter communication for millimeter-sized wireless biomedical implants

  • Zhanghao Yu, Fatima T. Alrashdan, Wei Wang, M. Parker, Xinyu Chen, Frank Y. Chen, Joshua Woods, Zhiyu Chen, Jacob T. Robinson, Kaiyuan Yang

  • Proceedings of the 28th Annual International Conference on Mobile Computing And Networking

  • October 14, 2022

This paper presents the design, implementation, and experimental evaluation of a wireless biomedical implant platform exploiting the magnetoelectric effect for wireless power and bi-directional communication. As an emerging wireless power transfer method, magnetoelectric is promising for mm-scaled bio-implants because of its superior misalignment sensitivity, high efficiency, and low tissue absorption compared to other modalities [46, 59, 60]. Utilizing the same physical mechanism for power and communication is critical for implant miniaturization, but low-power magnetoelectric uplink communication has not been achieved yet. For the first time, we design and demonstrate near-zero power magnetoelectric backscatter from the mm-sized implants by exploiting the converse magnetostriction effects. The system for demonstration consists of an 8.2-mm3 wireless implantable device and a custom portable transceiver. The implant's ASIC interfacing with the magnetoelectric transducer encodes uplink data by changing the transducer's load, resulting in resonance frequency changes for frequency-shift-keying modulation. The magnetoelectrically backscattered signal is sensed and demodulated through frequency-to-digital conversion by the external transceiver. With design optimizations in data modulation and recovery, the proposed system achieves a > 1-kbps data rate at the 335-kHz carrier frequency, with a communication distance greater than 2 cm and a bit error rate less than 1E-3. Further, we validate the proposed system for wireless stimulation and sensing, and conducted ex-vivo tests through a 1.5-cm porcine tissue. The proposed magnetoelectric backscatter approach provides a path towards miniaturized wireless bio-implants for advanced biomedical applications like closed-loop neuromodulation.

TLDR

The proposed magnetoelectric backscatter approach provides a path towards miniaturized wireless bio-implants for advanced biomedical applications like closed-loop neuromodulation.

NSDI

LeakyScatter: A Frequency-Agile Directional Backscatter Network Above 100 GHz

  • Atsu Kludze, Yasaman Ghasempour

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2022

Wireless backscattering has been deemed suitable for various emerging energy-constrained applications given its low-power architectures. Although existing backscatter nodes often operate at sub-6 GHz frequency bands, moving to the sub-THz bands offers significant advantages in scaling low-power connectivity to dense user populations; as concurrent transmissions can be separated in both spectral and spatial domains given the large swath of available bandwidth and laser-shaped beam directionality in this frequency regime. However, the power consumption and complexity of wireless devices increase significantly with frequency. In this paper, we present LeakyScatter, the first backscatter system that enables directional, low-power, and frequency-agile wireless links above 100 GHz. LeakyScatter departs from conventional backscatter designs and introduces a novel architecture that relies on aperture reciprocity in leaky-wave devices. We have fabricated LeakyScatter and evaluated its performance through extensive simulations and over-the-air experiments. Our results demonstrate a scalable wireless link above 100 GHz that is retrodirective and operates at a large bandwidth (tens of GHz) and ultra-low-power (zero power consumed for directional steering and ≤ 1 mW for data modulation).

TLDR

LeakyScatter departs from conventional backscatter designs and introduces a novel architecture that relies on aperture reciprocity in leaky-wave devices and demonstrates a scalable wireless link above 100 GHz that is retrodirective and operates at a large bandwidth and ultra-low-power.

CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation

  • Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, A. Agarwal, Mohammad Alizadeh, D. Shah

  • Symposium on Networked Systems Design and Implementation

  • January 5, 2022

We present CausalSim, a causal framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) would not affect the validity of the traces. However, real-world traces are often biased by the choices algorithms make during trace collection, and hence replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics and latent factors capturing the underlying system conditions during trace collection. It learns these models using an initial randomized controlled trial (RCT) under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines. Moreover, CausalSim provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which we validate with a real deployment.

TLDR

CausalSim improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines and provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which is validated with a real deployment.

Learning to Communicate Effectively Between Battery-free Devices

  • Kai Geissdoerfer, Marco Zimmerling

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2021

Successful wireless communication requires that sender and receiver are operational at the same time. This requirement is difficult to satisfy in battery-free networks, where the energy harvested from ambient sources varies across time and space and is often too weak to continuously power the devices. We present Bonito, the first connection protocol for battery-free systems that enables reliable and efficient bi-directional communication between intermittently powered nodes. We collect and analyze real-world energy-harvesting traces from five diverse scenarios involving solar panels and piezoelectric harvesters, and find that the nodes’ charging times approximately follow well-known distributions. Bonito learns a model of these distributions online and adapts the nodes’ wake-up times so that sender and receiver are operational at the same time, enabling successful communication. Experiments with battery-free prototype nodes built from off-the-shelf hardware components demonstrate that our design improves the average throughput by 10–80× compared with the state of the art.

TLDR

Bonito is presented, the first connection protocol for battery-free systems that enables reliable and efficient bi-directional communication between intermittently powered nodes, and it is found that the nodes’ charging times approximately follow well-known distributions.
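
The core of the protocol described above is learning each node's charging-time distribution online and scheduling wake-ups far enough out that both sides are very likely to be charged at the same moment. A minimal sketch of that idea (not the authors' implementation), assuming approximately normal charging times and a hypothetical target connection probability:

    import math

    class ChargingModel:
        """Online mean/variance estimate of a node's charging time (Welford's algorithm)."""
        def __init__(self):
            self.n, self.mean, self.m2 = 0, 0.0, 0.0

        def update(self, charging_time: float) -> None:
            self.n += 1
            delta = charging_time - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (charging_time - self.mean)

        def quantile(self, p: float) -> float:
            """Approximate p-quantile of a normal fit; z-scores hard-coded for illustration."""
            std = math.sqrt(self.m2 / max(self.n - 1, 1))
            z = {0.90: 1.2816, 0.95: 1.6449, 0.99: 2.3263}[p]
            return self.mean + z * std

    def next_wakeup_delay(local: ChargingModel, peer_quantile: float, p: float = 0.95) -> float:
        # Both nodes wait for the larger of their own and the peer's p-quantile,
        # so that with high probability both have harvested enough energy to be on.
        return max(local.quantile(p), peer_quantile)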

Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets

  • Hamid Ghasemirahni, Tom Barbette, Georgios P. Katsikas, Alireza Farshin, Amir Roozbeh, Massimo Girondi, Marco Chiesa, Gerald Maguire, Dejan Kostic

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2021

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system’s caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits. In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing µs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.

TLDR

Reframer is a software solution that deliberately delays packets and reorders them to increase traffic locality, and it is shown that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
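
Reframer's central trick, as the abstract explains, is to hold incoming packets for a short, bounded time and release them grouped by flow, so downstream processing sees runs of similar packets and keeps its caches warm. A simplified sketch of such a coalescing buffer (assumed semantics for illustration, not the paper's implementation):

    import time
    from collections import defaultdict

    class Coalescer:
        """Buffer packets for up to delay_us microseconds, then flush them grouped by flow."""
        def __init__(self, delay_us: float = 50.0):
            self.delay_s = delay_us / 1e6
            self.per_flow = defaultdict(list)     # flow key -> buffered packets
            self.oldest = None

        def enqueue(self, pkt: bytes, flow_key: tuple) -> list:
            if self.oldest is None:
                self.oldest = time.monotonic()
            self.per_flow[flow_key].append(pkt)
            if time.monotonic() - self.oldest >= self.delay_s:
                return self.flush()
            return []

        def flush(self) -> list:
            # Emitting each flow's packets back to back restores the temporal and
            # spatial locality that the network fabric destroyed.
            batch = [p for pkts in self.per_flow.values() for p in pkts]
            self.per_flow.clear()
            self.oldest = None
            return batch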

OSDI

Ensō: A Streaming Interface for NIC-Application Communication

  • Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, J. Hoe, Aurojit Panda, Justine Sherry, Ren Wang

  • USENIX Symposium on Operating Systems Design and Implementation

  • December 31, 2022

Today, most communication between the NIC and software involves exchanging fixed-size packet buffers. This packetized interface was designed for an era when NICs implemented few offloads and software implemented the logic for translating between application data and packets. However, both NICs and networked software have evolved: modern NICs implement hardware offloads, e.g., TSO, LRO, and serialization offloads that can more efficiently translate between application data and packets. Furthermore, modern software increasingly batches network I/O to reduce overheads. These changes have led to a mismatch between the packetized interface, which assumes that the NIC and software exchange fixed-size buffers, and the features provided by modern NICs and used by modern software. This incongruence between interface and data adds software complexity and I/O overheads, which in turn limits communication performance. This paper proposes Ensō, a new streaming NIC-to-software interface designed to better support how NICs and software interact today. At its core, Ensō eschews fixed-size buffers, and instead structures communication as a stream that can be used to send arbitrary data sizes. We show that this change reduces software overheads, reduces PCIe bandwidth requirements, and leads to fewer cache misses. These improvements allow an Ensō-based NIC to saturate a 100Gbps link with minimum-sized packets (forwarding at 148.8Mpps) using a single core, improve throughput for high-performance network applications by 1.5–6×, and reduce latency by up to 43%.

TLDR

Ensō, a new streaming NIC-to-software interface designed to better support how NICs and software interact today, is proposed, which eschews fixed-size buffers, and instead structures communication as a stream that can be used to send arbitrary data sizes.

MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime

  • Chenxi Wang, Haoran Ma, Shicheng Liu, Yifan Qiao, Jon Eyolfson, Christian Navasca, Shan Lu, G. Xu

  • USENIX Symposium on Operating Systems Design and Implementation

  • December 31, 2021

Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprint and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to managed runtimes and garbage collections (GC) underneath applications written in high-level languages. With different object-access patterns from applications, GC can severely interfere with existing far-memory techniques, breaking remote memory prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely-used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely-deployed cloud systems shows MemLiner improves applications’ end-to-end performance by up to 2.5×.

TLDR

MemLiner is developed, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby reducing the local-memory working set and improving remote-memory prefetching through simplified memory access patterns.

PLDI

Mosaic: An Interoperable Compiler for Tensor Algebra

  • Manya Bansal, Olivia Hsu, K. Olukotun, Fredrik Kjolstad

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic by adding new functions and bind a sub-expression to a function using a scheduling API. Mosaic substitutes the bound sub-expressions with calls to the external functions and automatically generates the remaining code using a default code generator. As the generated code is fused by default, users can productively leverage both fusion and calls to specialized functions within the same compiler. We demonstrate the benefits of our dual approach by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206× against fused code in some cases, while generating fused code can provide speedups of up to 3.57× against code that calls external functions in other cases. Mosaic also offers a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. Additionally, the interface for adding new external functions is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of code.

TLDR

The benefits of the dual approach are demonstrated by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206× against fused code in some cases, while generating fused code can provide speedups of up to 3.57× against code that calls external functions in other cases.

CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives

  • Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, A. Chlipala, C. Chuengsatiansup, Daniel Genkin, Markus Wagner, Y. Yarom

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best-known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g., producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel 12th and 13th generations.

TLDR

CryptOpt is presented, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly.
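
The optimization loop sketched in the abstract is, at heart, randomized search over candidate assembly programs scored by measured runtime, with functional equivalence established by a verified checker. The schematic below captures that shape only; mutate, benchmark, and verify_equivalent are placeholders standing in for CryptOpt's actual components.

    import copy

    def optimize(program, mutate, benchmark, verify_equivalent, iterations=10_000):
        """Randomized program search: keep the fastest candidate found so far."""
        best, best_time = program, benchmark(program)
        for _ in range(iterations):
            candidate = mutate(copy.deepcopy(best))   # e.g., reorder instructions, reassign registers
            t = benchmark(candidate)                  # repeated timing on the target CPU
            if t < best_time:
                best, best_time = candidate, t
        # The winning candidate must still compute the same function as the input
        # program; CryptOpt discharges this with a formally verified equivalence checker.
        assert verify_equivalent(program, best)
        return best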

Synthesizing MILP Constraints for Efficient and Robust Optimization

  • Jingbo Wang, Aarti Gupta, Chao Wang

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

While mixed integer linear programming (MILP) solvers are routinely used to solve a wide range of important science and engineering problems, it remains a challenging task for end users to write correct and efficient MILP constraints, especially for problems specified using the inherently non-linear Boolean logic operations. To overcome this challenge, we propose a syntax guided synthesis (SyGuS) method capable of generating high-quality MILP constraints from the specifications expressed using arbitrary combinations of Boolean logic operations. At the center of our method is an extensible domain specification language (DSL) whose expressiveness may be improved by adding new integer variables as decision variables, together with an iterative procedure for synthesizing linear constraints from non-linear Boolean logic operations using these integer variables. To make the synthesis method efficient, we also propose an over-approximation technique for soundly proving the correctness of the synthesized linear constraints, and an under-approximation technique for safely pruning away the incorrect constraints. We have implemented and evaluated the method on a wide range of benchmark specifications from statistics, machine learning, and data science applications. The experimental results show that the method is efficient in handling these benchmarks, and the quality of the synthesized MILP constraints is close to, or higher than, that of manually-written constraints in terms of both compactness and solving time.

TLDR

The experimental results show that the SyGuS method is efficient in handling benchmarks, and the quality of the synthesized MILP constraints is close to, or higher than, that of manually-written constraints in terms of both compactness and solving time.
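
The heart of the difficulty above is re-expressing Boolean logic as linear constraints over integer decision variables. As a small worked example of the target representation (standard textbook encodings, not output synthesized by the paper's tool), z = x AND y over 0/1 variables becomes z <= x, z <= y, z >= x + y - 1, and z = x OR y becomes z >= x, z >= y, z <= x + y. The snippet below exhaustively checks that these linear constraints admit exactly the Boolean truth tables:

    from itertools import product

    def and_constraints(x, y, z):
        # MILP encoding of z = x AND y over binary variables
        return z <= x and z <= y and z >= x + y - 1

    def or_constraints(x, y, z):
        # MILP encoding of z = x OR y over binary variables
        return z >= x and z >= y and z <= x + y

    for x, y, z in product((0, 1), repeat=3):
        assert and_constraints(x, y, z) == (z == (x and y))
        assert or_constraints(x, y, z) == (z == (x or y))
    print("linear encodings match the Boolean truth tables")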

An Automata-Based Framework for Verification and Bug Hunting in Quantum Circuits

  • Yu-Fang Chen, Kai-Min Chung, Ondřej Lengál, Jyun-Ao Lin, W. Tsai, Di-De Yen

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

We introduce a new paradigm for analysing and finding bugs in quantum circuits. In our approach, the problem is given by a triple {P} C {Q} and the question is whether, given a set P of quantum states on the input of a circuit C, the set of quantum states on the output is equal to (or included in) a set Q. While this is not suitable to specify, e.g., functional correctness of a quantum circuit, it is sufficient to detect many bugs in quantum circuits. We propose a technique based on tree automata to compactly represent sets of quantum states and develop transformers to implement the semantics of quantum gates over this representation. Our technique computes with an algebraic representation of quantum states, avoiding the inaccuracy of working with floating-point numbers. We implemented the proposed approach in a prototype tool and evaluated its performance against various benchmarks from the literature. The evaluation shows that our approach is quite scalable, e.g., we managed to verify a large circuit with 40 qubits and 141,527 gates, or catch bugs injected into a circuit with 320 qubits and 1,758 gates, where all tools we compared with failed. In addition, our work establishes a connection between quantum program verification and automata, opening new possibilities to exploit the richness of automata theory and automata-based verification in the world of quantum computing.

TLDR

A new paradigm for analysing and finding bugs in quantum circuits is introduced, a technique based on tree automata to compactly represent sets of quantum states is proposed and transformers to implement the semantics of quantum gates over this representation are developed.

Covering All the Bases: Type-Based Verification of Test Input Generators

  • Zhe-Wei Zhou, Ashish Mishra, Benjamin Delaware, S. Jagannathan

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

Test input generators are an important part of property-based testing (PBT) frameworks. Because PBT is intended to test deep semantic and structural properties of a program, the outputs produced by these generators can be complex data structures, constrained to satisfy properties the developer believes are most relevant to testing the function of interest. An important feature expected of these generators is that they be capable of producing all acceptable elements that satisfy the function’s input type and generator-provided constraints. However, it is not readily apparent how we might validate whether a particular generator’s output satisfies this coverage requirement. Typically, developers must rely on manual inspection and post-mortem analysis of test runs to determine if the generator is providing sufficient coverage; these approaches are error-prone and difficult to scale as generators become more complex. To address this important concern, we present a new refinement type-based verification procedure for validating the coverage provided by input test generators, based on a novel interpretation of types that embeds “must-style” underapproximate reasoning principles as a fundamental part of the type system. The types associated with expressions now capture the set of values guaranteed to be produced by the expression, rather than the typical formulation that uses types to represent the set of values an expression may produce. Beyond formalizing the notion of coverage types in the context of a rich core language with higher-order procedures and inductive datatypes, we also present a detailed evaluation study to justify the utility of our ideas.

TLDR

A new refinement type-based verification procedure for validating the coverage provided by input test generators is presented, based on a novel interpretation of types that embeds “must-style” underapproximate reasoning principles as a fundamental part of the type system.

Mostly Automated Proof Repair for Verified Libraries

  • K. Gopinathan, Mayank Keoliya, Ilya Sergey

  • Proceedings of the ACM on Programming Languages

  • June 6, 2023

The cost of maintaining formally specified and verified software is widely considered prohibitively high due to the need to constantly keep code and the proofs of its correctness in sync—the problem known as proof repair. One of the main challenges in automated proof repair for evolving code is to infer invariants for a new version of a once verified program that are strong enough to establish its full functional correctness. In this work, we present the first proof repair methodology for higher-order imperative functions, whose initial versions were verified in the Coq proof assistant and whose specifications remained unchanged. Our proof repair procedure is based on the combination of dynamic program alignment, enumerative invariant synthesis, and a novel technique for efficiently pruning the space of invariant candidates, dubbed proof-driven testing, enabled by the constructive nature of Coq’s proof certificates. We have implemented our approach in a mostly-automated proof repair tool called Sisyphus. Given an OCaml function verified in Coq and its unverified new version, Sisyphus produces a Coq proof for the new version, discharging most of the new proof goals automatically and suggesting high-confidence obligations for the programmer to prove for the cases when automation fails. We have evaluated Sisyphus on 10 OCaml programs taken from popular libraries, that manipulate arrays and mutable data structures, considering their verified original and unverified evolved versions. Sisyphus has managed to repair proofs for all those functions, suggesting correct invariants and generating a small number of easy-to-prove residual obligations.

TLDR

This work presents the first proof repair methodology for higher-order imperative functions whose initial versions were verified in the Coq proof assistant and whose specifications remained unchanged, implemented in a mostly-automated proof repair tool called Sisyphus.

Finding typing compiler bugs

  • Stefanos Chaliasos, Thodoris Sotiropoulos, D. Spinellis, Arthur Gervais, B. Livshits, Dimitris Mitropoulos

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • June 9, 2022

We propose a testing framework for validating static typing procedures in compilers. Our core component is a program generator suitably crafted for producing programs that are likely to trigger typing compiler bugs. One of our main contributions is that our program generator gives rise to transformation-based compiler testing for finding typing bugs. We present two novel approaches (type erasure mutation and type overwriting mutation) that apply targeted transformations to an input program to reveal type inference and soundness compiler bugs respectively. Both approaches are guided by an intra-procedural type inference analysis used to capture type information flow. We implement our techniques as a tool, which we call Hephaestus. The extensibility of Hephaestus enables us to test the compilers of three popular JVM languages: Java, Kotlin, and Groovy. Within nine months of testing, we have found 156 bugs (137 confirmed and 85 fixed) with diverse manifestations and root causes in all the examined compilers. Most of the discovered bugs lie in the heart of many critical components related to static typing, such as type inference.

TLDR

A testing framework for validating static typing procedures in compilers is proposed, along with two novel approaches (type erasure mutation and type overwriting mutation) that apply targeted transformations to an input program to reveal type inference and soundness compiler bugs, respectively.
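
Of the two mutations named above, type erasure mutation is the easier one to picture: start from a program that compiles, remove declared types that the compiler should be able to infer, and recompile; a rejection or crash flags a candidate type-inference bug. The toy Python sketch below illustrates that workflow for Kotlin-style sources; the regular expression and the kotlinc invocation are illustrative assumptions, not Hephaestus itself.

    import re
    import subprocess
    import tempfile
    from pathlib import Path

    def erase_declared_types(source: str) -> str:
        # Drop ": Type" annotations on val/var declarations that have an initializer,
        # e.g. "val xs: List<Int> = listOf(1)" becomes "val xs = listOf(1)".
        return re.sub(r"\b(va[lr]\s+\w+)\s*:\s*[\w<>,?\s]+(?==)", r"\1 ", source)

    def compiles(source: str) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "Main.kt"
            path.write_text(source)
            result = subprocess.run(["kotlinc", str(path), "-d", tmp], capture_output=True)
            return result.returncode == 0

    def check_type_inference(original: str) -> None:
        mutated = erase_declared_types(original)
        # The original compiles, so if the mutant does not, the compiler failed to
        # infer a type the programmer could write down: a candidate bug to triage.
        if compiles(original) and not compiles(mutated):
            print("possible type-inference issue; inspect the mutated program")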

RustHornBelt: a semantic foundation for functional verification of Rust programs with unsafe code

  • Yusuke Matsushita, Xavier Denis, Jacques-Henri Jourdan, Derek Dreyer

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • June 9, 2022

Rust is a systems programming language that offers both low-level memory operations and high-level safety guarantees, via a strong ownership type system that prohibits mutation of aliased state. In prior work, Matsushita et al. developed RustHorn, a promising technique for functional verification of Rust code: it leverages the strong invariants of Rust types to express the behavior of stateful Rust code with first-order logic (FOL) formulas, whose verification is amenable to off-the-shelf automated techniques. RustHorn’s key idea is to use prophecies to describe the behavior of mutable borrows. However, the soundness of RustHorn was only established for a safe subset of Rust, and it has remained unclear how to extend it to support various safe APIs that encapsulate unsafe code (i.e., code where Rust’s aliasing discipline is relaxed). In this paper, we present RustHornBelt, the first machine-checked proof of soundness for RustHorn-style verification which supports giving FOL specs to safe APIs implemented with unsafe code. RustHornBelt employs the approach of semantic typing used in Jung et al.’s RustBelt framework, but it extends RustBelt’s model to reason not only about safety but also functional correctness. The key challenge in RustHornBelt is to develop a semantic model of RustHorn-style prophecies, which we achieve via a new separation-logic mechanism we call parametric prophecies.

TLDR

RustHornBelt is presented, the first machine-checked proof of soundness for RustHorn-style verification which supports giving FOL specs to safe APIs implemented with unsafe code.

PODS
S&P

WaVe: a verifiably secure WebAssembly sandboxing runtime

  • Evan Johnson, Evan Laufer, Zijie Zhao, S. Savage, D. Stefan, Fraser Brown

  • December 31, 2021

The promise of software sandboxing is flexible, fast and portable isolation; capturing the benefits of hardware-based memory protection without requiring operating system involvement. This promise is reified in WebAssembly (Wasm), a popular portable bytecode whose compilers automatically insert runtime checks to ensure that data and control flow are constrained to a single memory segment. Indeed, modern compiled Wasm implementations have advanced to the point where these checks can themselves be verified, removing the compiler from the trusted computing base. However, the resulting integrity properties are only valid for code executing strictly inside the Wasm sandbox. Any interactions with the runtime system, which manages sandboxes and exposes the WebAssembly System Interface (WASI) used to access operating system resources, operate outside this contract. The resulting conundrum is how to maintain Wasm’s strong isolation properties while still allowing such programs to interact with the outside world (i.e., with the file system, the network, etc.). Our paper presents a solution to this problem, via WaVe, a verified secure runtime system that implements WASI. We mechanically verify that interactions with WaVe (including OS side effects) not only maintain Wasm’s memory safety guarantees, but also maintain access isolation for the host OS’s storage and network resources. Finally, in spite of completely removing the runtime from the trusted computing base, we show that WaVe offers performance competitive with existing industrial (yet unsafe) Wasm runtimes.

TLDR

This paper mechanically verifies that interactions with WaVe not only maintain Wasm’s memory safety guarantees, but also maintain access isolation for the host OS’s storage and network resources.
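
The access-isolation property WaVe proves can be illustrated by the kind of invariant a WASI runtime must enforce on every call: any host path a sandboxed module opens has to resolve inside a directory the module was explicitly granted. The Python sketch below shows that check in miniature; it is only an illustration of the invariant, whereas WaVe's guarantee is established by mechanized proof rather than by runtime testing like this.

    from pathlib import Path

    class SandboxFS:
        """Toy model of a WASI-style preopened directory with path confinement."""
        def __init__(self, preopened_dir: str):
            self.root = Path(preopened_dir).resolve()

        def path_open(self, guest_path: str, mode: str = "r"):
            # Resolve symlinks and ".." components *before* the confinement check,
            # otherwise "../../etc/passwd"-style escapes slip through.
            host_path = (self.root / guest_path.lstrip("/")).resolve()
            if host_path != self.root and self.root not in host_path.parents:
                raise PermissionError(f"{guest_path} escapes the sandbox")
            return open(host_path, mode)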

Characterizing Everyday Misuse of Smart Home Devices

  • Phoebe Moh, Pubali Datta, N. Warford, Adam Bates, Nathan Malkin, Michelle L. Mazurek

  • December 31, 2021

Exploration of Internet of Things (IoT) security often focuses on threats posed by external and technically-skilled attackers. While it is important to understand these most extreme cases, it is equally important to understand the most likely risks of harm posed by smart device ownership. In this paper, we explore how smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners. In a preliminary characterization survey (n = 100), we broadly capture the kinds of unauthorized use and misuse incidents participants have experienced or engaged in. Then, in a prevalence survey (n = 483), we assess the prevalence of these incidents in a demographically-representative population. Our findings show that unauthorized use of smart devices is widespread (experienced by 43% of participants), and that misuse is also common (experienced by at least 19% of participants). However, highly individual factors determine whether these unauthorized use events constitute misuse. Through a focus on everyday abuses, this work sheds light on the most prevalent security and privacy threats faced by smart-home owners today.

TLDR

How smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners is explored.

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions

  • H. Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, R. Karri

  • 2022 IEEE Symposium on Security and Privacy (SP)

  • August 20, 2021

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described ‘AI pair programmer’, GitHub Copilot, which is a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk cybersecurity weaknesses, e.g. those from MITRE’s “Top 25” Common Weakness Enumeration (CWE) list. We explore Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.

TLDR

This work systematically investigates the prevalence and conditions that can cause GitHub Copilot to recommend insecure code, and explores Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.

SIGCOMM

Software-defined network assimilation: bridging the last mile towards centralized network configuration management with Nassim

  • Huangxun Chen, Yukai Miao, Li Chen, Haifeng Sun, Hong Chao Xu, Libin Liu, Gong Zhang, Wei Wang

  • Proceedings of the ACM SIGCOMM 2022 Conference

  • August 22, 2022

On-boarding new devices into an existing SDN network is a pain for network operations (NetOps) teams, because much expert effort is required to bridge the gap between the configuration models of the new devices and the unified data model in the SDN controller. In this work, we present an assistant framework, NAssim, to help NetOps accelerate the process of assimilating a new device into an SDN network. Our solution features a unified parser framework to parse diverse device user manuals into preliminary configuration models, a rigorous validator that confirms the correctness of the models via formal syntax analysis, model hierarchy validation and empirical data validation, and a deep-learning-based mapping algorithm that uses state-of-the-art neural language processing techniques to produce human-comprehensible recommended mappings between the validated configuration model and the one in the SDN controller. In all, NAssim liberates NetOps from the most tedious tasks by learning directly from devices' manuals to produce data models which are comprehensible by both the SDN controller and human experts. Our evaluation shows that NAssim can accelerate the assimilation process by 9.1x. In this process, we also identify and correct 243 errors in four mainstream vendors' device manuals, and release a validated and expert-curated dataset of parsed manual corpus for future research.

TLDR

This work presents an assistant framework, NAssim, to help NetOps accelerate the process of assimilating a new device into an SDN network, and identifies and corrects 243 errors in four mainstream vendors' device manuals.

SIGMETRICS

Mean-field Analysis for Load Balancing on Spatial Graphs

  • Daan Rutten, Debankur Mukherjee

  • Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

  • January 9, 2023

A pivotal methodological tool behind the analysis of large-scale load balancing systems is mean-field analysis. The high-level idea is to represent the system state by aggregate quantities and characterize their rate of change as the system size grows large. An assumption for the above scheme to work is that the aggregate quantity is Markovian such that its rate of change can be expressed as a function of its current state. If the aggregate quantity is not Markovian, not only does this technique break down, but the mean-field approximation may even turn out to be highly inaccurate. In load balancing systems, if servers are exchangeable, then the aggregate quantity is indeed Markovian. However, the growing heterogeneity in the types of tasks processed by modern data centers has recently motivated the research community to consider systems beyond the exchangeability assumption. The main reason stems from data locality, i.e., the fact that servers need to store resources to process tasks of a particular type locally and have only limited storage space. An emerging line of work thus considers a bipartite graph between task types and servers [2, 3, 5-7]. In this compatibility graph, an edge between a server and a task type represents the server's ability to process these tasks. In practice, storage capacity or geographical constraints force a server to process only a small subset of all task types, leading to sparse network topologies. This motivates the study of load balancing in systems with suitably sparse bipartite compatibility graphs.

TLDR

The growing heterogeneity in the types of tasks processed by modern data centers has recently motivated the research community to consider systems beyond the exchangeability assumption, and an emerging line of work considers a bipartite graph between task types and servers.
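
For contrast with the spatial, non-exchangeable setting studied above, the exchangeable case has a classical mean-field limit. For example, under the join-the-shortest-of-d-queues policy with arrival rate λ per server and unit-rate exponential service, the fraction s_k(t) of servers holding at least k tasks evolves, in the large-system limit, according to the ODE system below. This is standard background included only to illustrate what a mean-field characterization looks like; it is not a result of the paper above.

    \frac{d s_k(t)}{dt} = \lambda \left( s_{k-1}(t)^d - s_k(t)^d \right) - \left( s_k(t) - s_{k+1}(t) \right), \qquad k \ge 1, \qquad s_0(t) \equiv 1.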

WISEFUSE: Workload Characterization and DAG Transformation for Serverless Workflows

  • Ashraf Y. Mahgoub, Edgardo Barsallo Yi, Karthick Shankar, Eshaan Minocha, S. Elnikety, S. Bagchi, S. Chaterji

  • Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

  • June 6, 2022

We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, and (b) stragglers when a DAG stage invokes a set of parallel functions that must complete before starting the next DAG stage. To address these limitations, we propose WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or dollar budget. We introduce three optimizations: (1) Fusion combines in-series functions together in a single VM to reduce the communication overhead between cascaded functions. (2) Bundling executes a group of parallel invocations of a function in one VM to improve resource sharing among the parallel workers to reduce skew. (3) Resource Allocation assigns the right VM size to each function or function bundle in the DAG to reduce the E2E latency and cost. We implement WISEFUSE to evaluate it experimentally using three popular serverless applications with different DAG structures, memory footprints, and intermediate data sizes. Compared to competing approaches and other alternatives, WISEFUSE shows significant improvements in E2E latency and cost. Specifically, for a machine learning pipeline, WISEFUSE achieves P95 latency that is 67% lower than Photons, 39% lower than Faastlane, and 90% lower than SONIC without increasing the dollar cost.

TLDR

This work proposes WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or budget, and evaluates it experimentally, showing significant improvements in E2E latency and cost.
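
The Fusion optimization described above (placing in-series functions in the same VM so intermediate data never crosses the network) amounts to grouping maximal linear chains of the DAG into single execution units. A sketch of that grouping pass, using a plain adjacency-list DAG rather than WISEFUSE's actual plan representation:

    from collections import defaultdict

    def fuse_linear_chains(dag):
        """Group maximal chains of single-predecessor/single-successor functions.
        Each returned group could be fused into one VM; branches stay separate."""
        indeg, pred = defaultdict(int), {}
        for src, dsts in dag.items():
            for d in dsts:
                indeg[d] += 1
                pred[d] = src

        def chain_interior(node):
            # exactly one predecessor, and that predecessor has exactly one successor
            return indeg[node] == 1 and len(dag.get(pred[node], [])) == 1

        groups, seen = [], set()
        for node in dag:
            if node in seen or chain_interior(node):
                continue
            chain, cur = [node], node
            seen.add(node)
            while len(dag.get(cur, [])) == 1 and indeg[dag[cur][0]] == 1:
                cur = dag[cur][0]
                chain.append(cur)
                seen.add(cur)
            groups.append(chain)
        return groups

    # Example: a preprocessing chain A -> B -> C fans out to two parallel functions.
    workflow = {"A": ["B"], "B": ["C"], "C": ["D", "E"], "D": [], "E": []}
    print(fuse_linear_chains(workflow))   # [['A', 'B', 'C'], ['D'], ['E']]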

WWW

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

  • C. Hays, Zachary Schutzman, Manish Raghavan, Erin Walk, Philipp Zimmer

  • Proceedings of the ACM Web Conference 2023

  • January 17, 2023

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset’s collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.

TLDR

It is shown that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets.
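
The "simple decision rules" finding above is easy to reproduce in spirit: a depth-limited decision tree over a handful of account-level features often matches far more elaborate detectors on these benchmarks. A scikit-learn sketch, where the file path and feature column names are hypothetical placeholders rather than any specific dataset's schema:

    import pandas as pd
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical account-level features; public benchmarks expose similar fields.
    FEATURES = ["followers_count", "friends_count", "statuses_count",
                "account_age_days", "default_profile_image"]

    df = pd.read_csv("bot_benchmark.csv")   # placeholder path to a labeled dataset
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["is_bot"], test_size=0.3, random_state=0, stratify=df["is_bot"])

    # A deliberately shallow tree: a few axis-aligned thresholds and nothing more.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
    print("held-out AUC:", roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))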

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

  • Francesco Fabbri, Yanhao Wang, F. Bonchi, C. Castillo, M. Mathioudakis

  • Proceedings of the ACM Web Conference 2022

  • February 1, 2022

Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a “radicalization pathway”. In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the “segregation” score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated to larger chances to get users trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to “rewire”, so to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the absorbing random walk theory. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal.

TLDR

This paper models the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions, and proposes an efficient yet effective greedy algorithm based on the absorbing random walk theory.
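
The segregation score defined above is the expected number of recommendation hops a random walk started at a radicalized item takes before it first reaches a non-radicalized item. Treating non-radicalized items as absorbing states, these hitting times solve the linear system (I - Q) t = 1, where Q is the walk's transition matrix restricted to the radicalized items. A numpy sketch under those assumptions (this computes the score the paper minimizes, not the rewiring algorithm itself):

    import numpy as np

    def segregation_scores(P, radicalized):
        """Expected steps to first leave the radicalized set, from each radicalized node.

        P is the row-stochastic transition matrix of the recommendation graph,
        e.g. uniform over each item's d recommended successors."""
        Q = P[np.ix_(radicalized, radicalized)]   # transitions that stay inside the set
        n = len(radicalized)
        return np.linalg.solve(np.eye(n) - Q, np.ones(n))

    # Toy example: items 0 and 1 are radicalized, item 2 is not.
    P = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.0, 1.0]])
    print(segregation_scores(P, [0, 1]))   # both items escape after 2 expected steps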

Databases

SIGMOD

Predicate Pushdown for Data Science Pipelines

  • Cong Yan, Yin Lin, Yeye He

  • Proceedings of the ACM on Management of Data

  • June 13, 2023

Predicate pushdown is a widely adopted query optimization. Existing systems and prior work mostly use pattern-matching rules to decide when a predicate can be pushed through certain operators like join or groupby. However, challenges arise in optimizing for data science pipelines due to the widely used non-relational operators and user-defined functions (UDF) that existing rules would fail to cover. In this paper, we present MagicPush, which decides predicate pushdown using a search-verification approach. MagicPush searches for candidate predicates on pipeline input, which is often not the same as the predicate to be pushed down, and verifies that the pushdown does not change pipeline output with full correctness guarantees. Our evaluation on TPC-H queries and 200 real-world pipelines sampled from GitHub Notebooks shows that MagicPush substantially outperforms a strong baseline that uses a union of rules from prior work - it is able to discover new pushdown opportunities and better optimize 42 real-world pipelines with up to 99% reduction in running time, while discovering all pushdown opportunities found by the existing baseline on the remaining cases.

TLDR

This paper presents MagicPush, which decides predicate pushdown using a search-verification approach, and is able to discover new pushdown opportunities and better optimize 42 real-world pipelines with up to 99% reduction in running time.
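
The search-verification loop described above can be pictured on a tiny pandas pipeline: search proposes a predicate over the pipeline input, and verification confirms that applying it early leaves the filtered pipeline output unchanged. The sketch below checks that property empirically on one example input, whereas MagicPush verifies it with full correctness guarantees; the pipeline and predicates are made up for illustration.

    import pandas as pd

    def pipeline(df):
        # A toy pipeline: a derived column (UDF-like step) followed by an aggregation.
        df = df.assign(revenue=df["price"] * df["qty"])
        return df.groupby("region", as_index=False)["revenue"].sum()

    def output_predicate(out):
        return out[out["region"] == "EU"]             # what the user filters on the output

    def candidate_input_predicate(df):
        return df[df["region"] == "EU"]               # pushdown candidate found by search

    def verify_pushdown(df):
        original = output_predicate(pipeline(df)).reset_index(drop=True)
        pushed = output_predicate(pipeline(candidate_input_predicate(df))).reset_index(drop=True)
        return original.equals(pushed)                # pushdown must not change the answer

    orders = pd.DataFrame({"region": ["EU", "US", "EU"],
                           "price": [10.0, 5.0, 2.0],
                           "qty": [1, 2, 3]})
    print(verify_pushdown(orders))                    # True: filtering early is safe here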

Detecting Logic Bugs of Join Optimizations in DBMS

  • Xiu Tang, Sai Wu, Dongxiang Zhang, F. Li, Gang Chen

  • Proceedings of the ACM on Management of Data

  • May 30, 2023

Generation-based testing techniques have shown their effectiveness in detecting logic bugs of DBMS, which are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based debug tools are limited to single-table queries and there is a substantial research gap regarding multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework targeted at detecting logic bugs derived by queries involving multi-table joins. Given a target DBMS, TQS achieves the goal with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It adopts the database normalization technique to generate a testing schema and maintains a bitmap index for result tracking. To improve debug efficiency, DSG also artificially inserts some noises into the generated data. To avoid repetitive query space search, KQE forms the problem as isomorphic graph set discovery and combines the graph embedding and weighted random walk for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB and PolarDB. Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems. It successfully detected 115 bugs within 24 hours, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in PolarDB respectively.

TLDR

Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems, and successfully detected 115 bugs within 24 hours.

PG-Schema: Schemas for Property Graphs

  • A. Bonifati, Stefania Dumbrava, G. Fletcher, J. Hidders, Bei Li, L. Libkin, W. Martens, Filip Murlak, Stefan Plantikow, Ognjen Savković, Juan Sequeda, S. Staworko, Dominik Tomaszuk, H. Voigt, Domagoj Vrgoč, Mingxi Wu

  • Proceedings of the ACM on Management of Data

  • November 20, 2022

Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. PG-Schema features flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.

TLDR

PG-Schema is proposed, a simple yet powerful formalism for specifying property graph schemas that meets principled design requirements grounded in contemporary property graph management scenarios, and a detailed comparison of its features with those of existing schema languages and graph database systems is offered.

R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys

  • Wei Dong, Juanru Fang, K. Yi, Yuchao Tao, Ashwin Machanavajjhala

  • Proceedings of the 2022 International Conference on Management of Data

  • June 10, 2022

Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is particularly tricky to deal with, and no existing DP mechanisms can correctly handle both. For the special case of graph pattern counting under node-DP, the existing mechanisms are correct (i.e., satisfy DP), but they do not offer nontrivial utility guarantees or are very complicated and costly. In this paper, we propose the first DP mechanism for answering arbitrary SPJA queries in a database with foreign-key constraints. Meanwhile, it achieves a fairly strong notion of optimality, which can be considered as a small and natural relaxation of instance optimality. Finally, our mechanism is simple enough that it can be easily implemented on top of any RDBMS and an LP solver. Experimental results show that it offers order-of-magnitude improvements in terms of utility over existing techniques, even those specifically designed for graph pattern counting.

TLDR

This paper proposes the first DP mechanism for answering arbitrary SPJA queries in a database with foreign-key constraints, and shows that it offers order-of-magnitude improvements in terms of utility over existing techniques, even those specifically designed for graph pattern counting.

VLDB

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

  • Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

  • Proceedings of the VLDB Endowment

  • July 1, 2023

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 244 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.

TLDR

An Auto-Tables system is developed that can automatically synthesize pipelines with multi-step transformations (in Python or other languages) to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations.
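
A typical instance of the non-relational tables surveyed above is a "wide" spreadsheet whose column headers are really attribute values, for example one column per year. For that case, the multi-step transformation Auto-Tables would synthesize collapses to a single unpivot; a pandas sketch with made-up column names:

    import pandas as pd

    # A "wide" table: each year is its own column, which blocks straightforward SQL queries.
    wide = pd.DataFrame({
        "country": ["France", "Japan"],
        "2021": [100, 80],
        "2022": [110, 85],
        "2023": [120, 90],
    })

    # The relationalizing step: unpivot the year columns into (year, value) rows.
    tidy = wide.melt(id_vars="country", var_name="year", value_name="export_volume")
    print(tidy.head(3))
    #   country  year  export_volume
    # 0  France  2021            100
    # 1   Japan  2021             80
    # 2  France  2022            110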
