Scholar's Hub

Award-Winning Papers: Systems & Databases

These papers have received best paper awards or distinguished paper awards from renowned computer science conferences in the Systems and Databases fields.

This collection is sourced from each conference. If you notice any errors, please contact us.

Systems

ESEC/FSE

The evolution of type annotations in Python: an empirical study

  • L. Di Grazia, Michael Pradel

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Type annotations and gradual type checkers attempt to reveal errors and facilitate maintenance in dynamically typed programming languages. Despite the availability of these features and tools, it is currently unclear how quickly developers are adopting them, what strategies they follow when doing so, and whether adding type annotations reveals more type errors. This paper presents the first large-scale empirical study of the evolution of type annotations and type errors in Python. The study is based on an analysis of 1,414,936 type annotation changes, which we extract from 1,123,393 commits among 9,655 projects. Our results show that (i) type annotations are getting more popular, and once added, often remain unchanged in the projects for a long time, (ii) projects follow three evolution patterns for type annotation usage -- regular annotation, type sprints, and occasional uses -- and that the used pattern correlates with the number of contributors, (iii) more type annotations help find more type errors (0.704 correlation), but nevertheless, many commits (78.3%) are committed despite having such errors. Our findings show that better developer training and automated techniques for adding type annotations are needed, as most code still remains unannotated, and they call for a better integration of gradual type checking into the development process.

TLDR

The findings show that better developer training and automated techniques for adding type annotations are needed, as most code still remains unannotated, and they call for a better integration of gradual type checking into the development process.
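
As a rough, self-contained illustration of the kind of measurement behind this study, the sketch below counts type annotations in a single Python source snapshot using the standard ast module. The function name and example are ours, not the authors'; the paper's actual pipeline extracts annotation changes across 1.1M commits and correlates them with type-checker errors.

```python
# Illustrative only: count type annotations in one Python source snapshot.
# The study itself diffs annotations across commits; this just counts them.
import ast

def count_annotations(source: str) -> int:
    """Count positional-parameter, return, and variable annotations."""
    tree = ast.parse(source)
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            count += sum(1 for arg in node.args.args if arg.annotation)
            count += 1 if node.returns else 0
        elif isinstance(node, ast.AnnAssign):
            count += 1
    return count

print(count_annotations("def f(x: int, y) -> str:\n    z: float = 1.0\n    return str(x)"))
# -> 3 (x: int, -> str, z: float)
```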

Asynchronous technical interviews: reducing the effect of supervised think-aloud on communication ability

  • Mahnaz Behroozi, Chris Parnin, Chris Brown

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Software engineers often face a critical test before landing a job—passing a technical interview. During these sessions, candidates must write code while thinking aloud as they work toward a solution to a problem under the watchful eye of an interviewer. While thinking aloud during technical interviews gives interviewers a picture of candidates’ problem-solving ability, surprisingly, these types of interviews often prevent candidates from communicating their thought process effectively. To understand if poor performance related to interviewer presence can be reduced while preserving communication and technical skills, we introduce asynchronous technical interviews—where candidates submit recordings of think-aloud and coding. We compare this approach to traditional whiteboard interviews and find that, by eliminating interviewer supervision, asynchronicity significantly improved the clarity of think-aloud via increased informativeness and reduced stress. Moreover, we discovered asynchronous technical interviews preserved, and in some cases even enhanced, technical problem-solving strategies and code quality. This work offers insight into asynchronous technical interviews as a design for supporting communication during interviews, and discusses trade-offs and guidelines for implementing this approach in software engineering hiring practices.

TLDR

This work compares this approach to traditional whiteboard interviews and finds that, by eliminating interviewer supervision, asynchronicity significantly improved the clarity of think-aloud via increased informativeness and reduced stress.

SPINE: a scalable log parser with feedback guidance

  • Xuheng Wang, Xu Zhang, Liqun Li, Shilin He, Hongyu Zhang, Yudong Liu, Ling Zheng, Yu Kang, Qingwei Lin, Yingnong Dang, S. Rajmohan, Dongmei Zhang

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Log parsing, which extracts log templates and parameters, is a critical prerequisite step for automated log analysis techniques. Though existing log parsers have achieved promising accuracy on public log datasets, they still face many challenges when applied in the industry. Through studying the characteristics of real-world log data and analyzing the limitations of existing log parsers, we identify two problems. Firstly, it is non-trivial to scale a log parser to a vast number of logs, especially in real-world scenarios where the log data is extremely imbalanced. Secondly, existing log parsers overlook the importance of user feedback, which is imperative for parser fine-tuning under the continuous evolution of log data. To overcome the challenges, we propose SPINE, which is a highly scalable log parser with user feedback guidance. Based on our log parser equipped with initial grouping and progressive clustering, we propose a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data. Besides, we introduce user feedback to make the parser adapt quickly to evolving logs. We evaluated SPINE on 16 public log datasets. SPINE achieves more than 0.90 parsing accuracy on average with the highest parsing efficiency, which outperforms the state-of-the-art log parsers. We also evaluated SPINE in the production environment of Microsoft, in which SPINE can parse 30 million logs in less than 8 minutes under 16 executors, achieving near real-time performance. In addition, our evaluations show that SPINE can consistently achieve good accuracy under log evolution with a moderate amount of user feedback.

TLDR

This work proposes SPINE, which is a highly scalable log parser with user feedback guidance, and proposes a novel log data scheduling algorithm to improve the efficiency of parallelization under the large-scale imbalanced log data.
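
To make the log-parsing task concrete, here is a deliberately naive template extractor: it groups logs by token count and masks positions whose tokens vary. This is only the baseline idea SPINE builds on; the paper's contributions (initial grouping plus progressive clustering, the scheduling algorithm for imbalanced data, and feedback-driven fine-tuning) are not reproduced here.

```python
# Toy template extraction: group logs by token count, mask varying positions.
from collections import defaultdict

def extract_templates(logs):
    groups = defaultdict(list)
    for line in logs:
        tokens = line.split()
        groups[len(tokens)].append(tokens)
    templates = []
    for token_lists in groups.values():
        template = list(token_lists[0])
        for tokens in token_lists[1:]:
            for i, tok in enumerate(tokens):
                if template[i] != tok:
                    template[i] = "<*>"   # position holds a parameter
        templates.append(" ".join(template))
    return templates

print(extract_templates([
    "Connected to 10.0.0.1 port 443",
    "Connected to 10.0.0.2 port 8080",
]))
# -> ['Connected to <*> port <*>']
```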

Using nudges to accelerate code reviews at scale

  • Qianhua Shan, D. Sukhdeo, Qianying Huang, Seth Rogers, Lawrence Chen, Elise Paradis, Peter C. Rigby, Nachiappan Nagappan

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

We describe a large-scale study to reduce the amount of time code review takes. Each quarter at Meta we survey developers. Combining sentiment data from a developer experience survey and telemetry data from our diff review tool, we address, “When does a diff review feel too slow?” From the sentiment data alone, we learn that 84.7% of developers are satisfied with the time their diffs spend in review. By enriching the survey results with telemetry for each respondent, we determined that sentiment is closely associated with the 75th percentile time in review for that respondent’s diffs, i.e., those that take more than 24 hours. To encourage developers to act on stale diffs that have had no action for 24 or more hours, we designed a NudgeBot to notify, i.e., nudge, reviewers. To determine who to nudge when a diff is stale, we created a model to rank the reviewers based on the probability that they will make a comment or perform some other action on a diff. This model outperformed models that looked at files the reviewer had modified in the past. Combining this information with prior author-review relationships, we achieved an MRR and AUC of .81 and .88, respectively. To evaluate NudgeBot in production, we conducted an A/B cluster-randomized experiment on over 30k engineers. We observed a substantial, statistically significant decrease in both time in review (-6.8%, p=0.049) and time to first reviewer action (-9.9%, p=0.010). We also used guard metrics to ensure that most reviews were still done in fewer than 24 hours and that reviewers still spend the same amount of time looking at diffs, and saw no statistically significant change in these metrics. NudgeBot is now rolled out company-wide and is used daily by thousands of engineers at Meta.

TLDR

A large-scale study to reduce the amount of time code review takes is described and a model to rank the reviewers based on the probability that they will make a comment or perform some other action on a diff is created.
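
The abstract describes two mechanisms: detecting diffs that have been idle for 24 or more hours and ranking reviewers by their predicted probability of acting. The sketch below mirrors that structure with placeholder names (Diff, action_probability); the real ranking model combines review activity with prior author-reviewer relationships.

```python
# Hypothetical sketch of NudgeBot's two decisions: which diffs are stale,
# and which reviewer to nudge. The probability model is a stand-in.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Diff:
    id: str
    last_action: datetime
    candidate_reviewers: list

def is_stale(diff: Diff, now: datetime, threshold_hours: int = 24) -> bool:
    return now - diff.last_action >= timedelta(hours=threshold_hours)

def rank_reviewers(diff: Diff, action_probability) -> list:
    """Order reviewers by the (model-predicted) chance they act on the diff."""
    return sorted(diff.candidate_reviewers,
                  key=lambda r: action_probability(r, diff), reverse=True)

# Usage with a placeholder probability model:
now = datetime(2022, 11, 7, 12, 0)
diff = Diff("D123", now - timedelta(hours=30), ["alice", "bob"])
if is_stale(diff, now):
    to_nudge = rank_reviewers(diff, lambda r, d: {"alice": 0.7, "bob": 0.4}[r])[0]
    print(f"nudge {to_nudge} about {diff.id}")
```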

Online testing of RESTful APIs: promises and challenges

  • Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Online testing of web APIs—testing APIs in production—is gaining traction in industry. Platforms such as RapidAPI and Sauce Labs provide online testing and monitoring services of web APIs 24/7, typically by re-executing manually designed test cases on the target APIs on a regular basis. In parallel, research on the automated generation of test cases for RESTful APIs has seen significant advances in recent years. However, despite their promising results in the lab, it is unclear whether research tools would scale to industrial-size settings and, more importantly, how they would perform in an online testing setup, increasingly common in practice. In this paper, we report the results of an empirical study on the use of automated test case generation methods for online testing of RESTful APIs. Specifically, we used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases. To scale at this level, we had to transition from a monolithic tool approach to a multi-bot architecture with over 200 bots working cooperatively in tasks like test generation and reporting. As a result, we uncovered about 390K failures, which we conservatively triaged into 254 bugs, 65 of which have been acknowledged or fixed by developers to date. Among others, we identified confirmed faults in the APIs of Amadeus, Foursquare, Yelp, and YouTube, accessed by millions of applications worldwide. More importantly, our reports have guided developers on improving their APIs, including bug fixes and documentation updates in the APIs of Amadeus and YouTube. Our results show the potential of online testing of RESTful APIs as the next must-have feature in industry, but also some of the key challenges to overcome for its full adoption in practice.

TLDR

An empirical study on the use of automated test case generation methods for online testing of RESTful APIs used the RESTest framework to automatically generate and execute test cases in 13 industrial APIs for 15 days non-stop, resulting in over one million test cases.
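
For readers unfamiliar with online API testing, the fragment below sketches the basic loop the paper scales up with a multi-bot architecture: keep generating inputs for a live endpoint, execute them, and flag server errors for triage. The endpoint, parameters, and 5xx oracle are illustrative assumptions, not part of RESTest.

```python
# Minimal online-testing loop body (illustrative placeholders throughout).
import random
import urllib.error
import urllib.request

def generate_case():
    return {"limit": random.choice([-1, 0, 1, 50, 10_000]),
            "term": random.choice(["pizza", "", "a" * 512])}

def run_once(base_url: str) -> dict:
    params = generate_case()
    url = f"{base_url}?limit={params['limit']}&term={params['term']}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code
    # Simple oracle: 5xx responses are treated as failures worth reporting.
    return {"params": params, "status": status, "failure": status >= 500}
```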

Minerva: browser API fuzzing with dynamic mod-ref analysis

  • Chijin Zhou, Quan Zhang, Mingzhe Wang, Lihua Guo, Jie Liang, Zhe Liu, Mathias Payer, Yuting Jiang

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • November 7, 2022

Browser APIs are essential to the modern web experience. Due to their large number and complexity, they vastly expand the attack surface of browsers. To detect vulnerabilities in these APIs, fuzzers generate test cases with a large number of random API invocations. However, the massive search space formed by arbitrary API combinations hinders their effectiveness: since randomly-picked API invocations are unlikely to interfere with each other (i.e., compute on partially shared data), few interesting API interactions are explored. Consequently, reducing the search space by revealing inter-API relations is a major challenge in browser fuzzing. We propose Minerva, an efficient browser fuzzer for browser API bug detection. The key idea is to leverage API interference relations to reduce redundancy and improve coverage. Minerva consists of two modules: dynamic mod-ref analysis and guided code generation. Before fuzzing starts, the dynamic mod-ref analysis module builds an API interference graph. It first automatically identifies individual browser APIs from the browser’s code base. Next, it instruments the browser to dynamically collect mod-ref relations between APIs. During fuzzing, the guided code generation module synthesizes highly-relevant API invocations guided by the mod-ref relations. We evaluate Minerva on three mainstream browsers, i.e., Safari, Firefox, and Chromium. Compared to state-of-the-art fuzzers, Minerva improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs. Besides, Minerva has discovered 35 previously-unknown bugs out of which 20 have been fixed with 5 CVEs assigned and acknowledged by browser vendors.

TLDR

Minerva is proposed, an efficient browser fuzzer for browser API bug detection that improves edge coverage by 19.63% to 229.62% and finds 2x to 3x more unique bugs.
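
The core idea, per the abstract, is that two APIs are worth combining only if they interfere, i.e., one modifies state that the other reads or writes. A toy version of the interference graph and the guided choice of the next API is sketched below; the mod/ref sets would come from Minerva's dynamic analysis of an instrumented browser, and the API names here are made up.

```python
# Toy API interference graph and guided selection (illustrative data only).
import random

def build_interference_graph(mod, ref):
    """mod/ref: dicts mapping each API name to the state it modifies/reads."""
    apis = set(mod) | set(ref)
    edges = {api: set() for api in apis}
    for a in apis:
        for b in apis:
            if a != b and (mod[a] & (mod[b] | ref[b]) or mod[b] & ref[a]):
                edges[a].add(b)
    return edges

def next_api(current_apis, graph):
    """Prefer APIs that interfere with those already in the test case."""
    related = set().union(*(graph[a] for a in current_apis)) - set(current_apis)
    return random.choice(sorted(related) or sorted(graph))

mod = {"setAttribute": {"dom_attr"}, "getAttribute": set(), "fetch": {"net"}}
ref = {"setAttribute": set(), "getAttribute": {"dom_attr"}, "fetch": set()}
g = build_interference_graph(mod, ref)
print(next_api(["setAttribute"], g))  # -> 'getAttribute'
```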

First come first served: the impact of file position on code review

  • Enrico Fregnan, Larissa Braz, Marco D'Ambros, Gul cCalikli, Alberto Bacchelli

  • Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

  • August 8, 2022

The most popular code review tools (e.g., Gerrit and GitHub) present the files to review sorted in alphabetical order. Could this choice or, more generally, the relative position in which a file is presented bias the outcome of code reviews? We investigate this hypothesis by triangulating complementary evidence in a two-step study. First, we observe developers’ code review activity. We analyze the review comments pertaining to 219,476 Pull Requests (PRs) from 138 popular Java projects on GitHub. We found files shown earlier in a PR to receive more comments than files shown later, also when controlling for possible confounding factors: e.g., the presence of discussion threads or the lines added in a file. Second, we measure the impact of file position on defect finding in code review. Recruiting 106 participants, we conduct an online controlled experiment in which we measure participants’ performance in detecting two unrelated defects seeded into two different files. Participants are assigned to one of two treatments in which the position of the defective files is switched. For one type of defect, participants are not affected by its file’s position; for the other, they have 64% lower odds to identify it when its file is last as opposed to first. Overall, our findings provide evidence that the relative position in which files are presented has an impact on code reviews’ outcome; we discuss these results and implications for tool design and code review.

TLDR

Evidence is provided that the relative position in which files are presented has an impact on code reviews’ outcome; the results and implications for tool design and code review are discussed.

HPCA

DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing

  • Zhe Zhou, Cong Li, Fan Yang, Guangyu Sun

  • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • February 1, 2023

DIMM-based near-memory processing architectures (DIMM-NMP) have received growing interest from both academia and industry. They have the advantages of large memory capacity, low manufacturing cost, high flexibility, compatible form factor, etc. However, inter-DIMM communication (IDC) has become a critical obstacle for generic DIMM-NMP architectures because it involves costly forwarding transactions through the host CPU. Recent research has demonstrated that, for many applications, the overhead induced by IDC may even offset the performance and energy benefits of near-memory processing. To tackle this problem, we propose DIMM-Link, which enables high-performance IDC in DIMM-NMP architectures and supports seamless integration with existing host memory systems. It adopts bidirectional external data links to connect DIMMs, via which point-to-point communication and inter-DIMM broadcast are efficiently supported in a packet-routing way. We present the full-stack design of DIMM-Link, including the hardware architecture, interconnect protocol, system organization, routing mechanisms, optimization strategies, etc. Comprehensive experiments on typical data-intensive tasks demonstrate that the DIMM-Link-equipped NMP system can achieve a 5.93× average speedup over the 16-core CPU baseline. Compared to other IDC methods, DIMM-Link outperforms MCN, AIM, and ABC-DIMM by 2.42×, 1.87×, and 1.77×, respectively. More importantly, DIMM-Link fully considers the implementation feasibility and system integration constraints, which are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.

TLDR

DIMM-Link is proposed, which enables high-performance IDC in DIMM-NMP architectures, supports seamless integration with existing host memory systems, and fully considers the implementation feasibility and system integration constraints that are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.

Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems

  • Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair

  • 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • December 23, 2022

As Dynamic Random Access Memories (DRAM) scale, they are becoming increasingly susceptible to Row Hammer. By rapidly activating rows of DRAM cells (aggressor rows), attackers can exploit inter-cell interference through Row Hammer to flip bits in neighboring rows (victim rows). A recent work, called Randomized Row-Swap (RRS), proposed proactively swapping aggressor rows with randomly selected rows before an aggressor row can cause Row Hammer. Our paper observes that RRS is neither secure nor scalable. We first propose the ‘Juggernaut attack pattern’ that breaks RRS in under 1 day. Juggernaut exploits the fact that the mitigative action of RRS, a swap operation, can itself induce additional target row activations, defeating such a defense. Second, this paper proposes a new defense Secure Row-Swap mechanism that avoids the additional activations from swap (and unswap) operations and protects against Juggernaut. Furthermore, this paper extends Secure Row-Swap with attack detection to defend against even future attacks. While this provides better security, it also allows for securely reducing the frequency of swaps, thereby enabling Scalable and Secure Row-Swap. The Scalable and Secure Row-Swap mechanism provides years of Row Hammer protection with 3.3× lower storage overheads as compared to the RRS design. It incurs only a 0.7% slowdown as compared to a not-secure baseline for a Row Hammer threshold of 1200.

TLDR

The ‘Juggernaut attack pattern’ that breaks RRS in under 1 day is proposed, and a new defense Secure Row-Swap mechanism is proposed that avoids the additional activations from swap (and unswap) operations and protects against Juggernaut.
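
As a rough mental model of swap-based Row Hammer mitigation (the RRS baseline this paper hardens), the toy simulation below counts activations per row and remaps a row once it crosses a threshold. It deliberately omits what Secure and Scalable Row-Swap actually add: avoiding the extra activations caused by the swaps themselves and reducing swap frequency via attack detection.

```python
# Toy model of swap-based Row Hammer mitigation (not the paper's mechanism).
import random
from collections import Counter

class RowSwapper:
    def __init__(self, num_rows: int, threshold: int):
        self.mapping = list(range(num_rows))   # logical -> physical row
        self.activations = Counter()
        self.threshold = threshold

    def activate(self, logical_row: int) -> int:
        physical = self.mapping[logical_row]
        self.activations[physical] += 1
        if self.activations[physical] >= self.threshold:
            # Swap this row's mapping with a randomly chosen logical row.
            victim = random.randrange(len(self.mapping))
            self.mapping[logical_row], self.mapping[victim] = \
                self.mapping[victim], self.mapping[logical_row]
            self.activations[physical] = 0
        return physical

rs = RowSwapper(num_rows=1024, threshold=1200)
for _ in range(5000):
    rs.activate(42)   # repeated "hammering" of one logical row gets dispersed
```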

SupermarQ: A Scalable Quantum Benchmark Suite

  • T. Tomesh, P. Gokhale, V. Omole, Gokul Subramanian Ravi, Kaitlin N. Smith, Joshua Viszlai, Xin-Chuan Wu, Nikos Hardavellas, M. Martonosi, F. Chong

  • 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  • February 22, 2022

The emergence of quantum computers as a new computational paradigm has been accompanied by speculation concerning the scope and timeline of their anticipated revolutionary changes. While quantum computing is still in its infancy, the variety of different architectures used to implement quantum computations make it difficult to reliably measure and compare performance. This problem motivates our introduction of SupermarQ, a scalable, hardware-agnostic quantum benchmark suite which uses application-level metrics to measure performance. SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain. We define a set of feature vectors to quantify coverage, select applications from a variety of domains to ensure the suite is representative of real workloads, and collect benchmark results from the IBM, IonQ, and AQT@LBNL platforms. Looking forward, we envision that quantum benchmarking will encompass a large cross-community effort built on open source, constantly evolving benchmark suites. We introduce SupermarQ as an important step in this direction.

TLDR

SupermarQ is the first attempt to systematically apply techniques from classical benchmarking methodology to the quantum domain, and envision that quantum benchmarking will encompass a large cross-community effort built on open source, constantly evolving benchmark suites.

ICSE

"STILL AROUND": Experiences and Survival Strategies of Veteran Women Software Developers

  • S. V. Breukelen, A. Barcomb, Sebastian Baltes, A. Serebrenik

  • ArXiv

  • February 7, 2023

The intersection of ageism and sexism can create a hostile environment for veteran software developers belonging to marginalized genders. In this study, we conducted 14 interviews to examine the experiences of people at this intersection, primarily women, in order to discover the strategies they employed in order to successfully remain in the field. We identified 283 codes, which fell into three main categories: Strategies, Experiences, and Perception. Several strategies we identified, such as (Deliberately) Not Trying to Look Younger, were not previously described in the software engineering literature. We found that, in some companies, older women developers are recognized as having particular value, further strengthening the known benefits of diversity in the workforce. Based on the experiences and strategies, we suggest that organizations employing software developers consider the benefits of hiring veteran women software developers. For example, companies can draw upon the life experiences of older women developers in order to better understand the needs of customers from a similar demographic. While we recognize that many of the strategies employed by our study participants are a response to systemic issues, we still consider that, in the short-term, there is benefit in describing these strategies for developers who are experiencing such issues today.

TLDR

Fourteen interviews with veteran women software developers reveal the strategies they employed to remain in the field, several of which were not previously described in the software engineering literature, and suggest that organizations consider the benefits of hiring veteran women developers.

A Qualitative Study on the Implementation Design Decisions of Developers

  • Jenny Liang, Maryam Arab, Minhyuk Ko, Amy J. Ko, Thomas D. LaToza

  • ArXiv

  • January 24, 2023

Decision-making is a key software engineering skill. Developers constantly make choices throughout the software development process, from requirements to implementation. While prior work has studied developer decision-making, the choices made while choosing what solution to write in code remain understudied. In this mixed-methods study, we examine the phenomenon where developers select one specific way to implement a behavior in code, given many potential alternatives. We call these decisions implementation design decisions. Our mixed-methods study includes 46 survey responses and 14 semi-structured interviews with professional developers about their decision types, considerations, processes, and expertise for implementation design decisions. We find that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture. We also show that developers have a consistent general structure to their implementation decision-making process, but no single process is exactly the same. We discuss the implications of our findings on research, education, and practice, including insights on teaching developers how to make implementation design decisions.

TLDR

It is found that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture.

Compatible Remediation on Vulnerabilities from Third-Party Libraries for Java Projects

  • Lyuye Zhang, Chengwei Liu, Zhengzi Xu, Sen Chen, Lingling Fan, Lida Zhao, Jiahui Wu, Yang Liu

  • ArXiv

  • January 20, 2023

With the increasing disclosure of vulnerabilities in open-source software, software composition analysis (SCA) has been widely applied to reveal third-party libraries and the associated vulnerabilities in software projects. Beyond the revelation, SCA tools adopt various remediation strategies to fix vulnerabilities, the quality of which varies substantially. However, ineffective remediation could induce side effects, such as compilation failures, which impede acceptance by users. According to our studies, existing SCA tools could not correctly handle the concerns of users regarding the compatibility of remediated projects. To this end, we propose Compatible Remediation of Third-party libraries (CORAL) for Maven projects to fix vulnerabilities without breaking the projects. The evaluation proved that CORAL not only fixed 87.56% of vulnerabilities, outperforming other tools (best: 75.32%), but also achieved a 98.67% successful compilation rate and a 92.96% successful unit test rate. Furthermore, we found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, and the rest of the vulnerabilities (21.55%) could either be fixed by upgrades that break the compilation or even be impossible to fix by upgrading.

TLDR

Compatible Remediation of Third-party libraries (CORAL) for Maven projects is proposed to fix vulnerabilities without breaking the projects; it is found that 78.45% of vulnerabilities in popular Maven projects could be fixed without breaking the compilation, and the rest could either be fixed by upgrades that break the compilation or be impossible to fix by upgrading.

Do I Belong? Modeling Sense of Virtual Community Among Linux Kernel Contributors

  • Bianca Trinkenreich, Klaas-Jan Stol, A. Sarma, D. Germán, M. Gerosa, Igor Steinmacher

  • ArXiv

  • January 16, 2023

The sense of belonging to a community is a basic human need that impacts an individual's behavior, long-term engagement, and job satisfaction, as revealed by research in disciplines such as psychology, healthcare, and education. Despite much research on how to retain developers in Open Source Software projects and other virtual, peer-production communities, there is a paucity of research investigating what might contribute to a sense of belonging in these communities. To that end, we develop a theoretical model that seeks to understand the link between OSS developer motives and a Sense of Virtual Community. We test the model with a dataset collected in the Linux Kernel developer community, using structural equation modeling techniques. Our results for this case study show that intrinsic motivations - social or hedonic motives - are positively associated with a sense of virtual community, but living in an authoritative country and being paid to contribute can reduce the sense of virtual community. Based on these results, we offer suggestions for open source projects to foster a sense of virtual community, with a view to retaining contributors and improving projects' sustainability.

TLDR

A theoretical model is developed that seeks to understand the link between OSS developer motives and a Sense of Virtual Community and shows that intrinsic motivations - social or hedonic motives - are positively associated with a sense of virtual community, but living in an authoritative country and being paid to contribute can reduce the sense of virtual community.

Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference

  • Sidong Feng, Mulong Xie, Chunyang Chen

  • December 10, 2022

Due to the importance of Android app quality assurance, many automated GUI testing tools have been developed. Although the test algorithms have been improved, the impact of GUI rendering has been overlooked. On the one hand, setting a long waiting time to execute events on fully rendered GUIs slows down the testing process. On the other hand, setting a short waiting time will cause the events to execute on partially rendered GUIs, which negatively affects the testing effectiveness. An optimal waiting time should strike a balance between effectiveness and efficiency. We propose AdaT, a lightweight image-based approach to dynamically adjust the inter-event time based on GUI rendering state. Given the real-time streaming on the GUI, AdaT presents a deep learning model to infer the rendering state, and synchronizes with the testing tool to schedule the next event when the GUI is fully rendered. The evaluations demonstrate the accuracy, efficiency, and effectiveness of our approach. We also integrate our approach with the existing automated testing tool to demonstrate the usefulness of AdaT in covering more activities and executing more events on fully rendered GUIs.

TLDR

AdaT, a lightweight image-based approach that dynamically adjusts the inter-event time based on GUI rendering state, is proposed; given the real-time streaming of the GUI, it uses a deep learning model to infer the rendering state and synchronizes with the testing tool to schedule the next event when the GUI is fully rendered.
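
The scheduling idea is simple to state in code: instead of a fixed inter-event delay, poll a model that judges from the current screen whether rendering has finished. In the sketch below, capture_screenshot and is_fully_rendered are placeholders for the device capture and AdaT's deep learning classifier.

```python
# Hedged sketch of rendering-aware event scheduling (placeholder hooks).
import time

def wait_until_rendered(capture_screenshot, is_fully_rendered,
                        poll_interval=0.05, timeout=5.0) -> bool:
    """Return True once the GUI looks fully rendered, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_fully_rendered(capture_screenshot()):
            return True
        time.sleep(poll_interval)
    return False

# A testing tool would then do something like:
#   if wait_until_rendered(device.screenshot, model.predict):
#       device.send(next_event)
```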

An Empirical Investigation on the Challenges Faced by Women in the Software Industry: A Case Study

  • Bianca Trinkenreich, Ricardo Britto, M. Gerosa, Igor Steinmacher

  • 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)

  • March 20, 2022

Context: Addressing women's under-representation in the software industry, a widely recognized concern, requires attracting as well as retaining more women. Hearing from women practitioners, particularly those positioned in multi-cultural settings, about their challenges and adopting their lived-experience solutions can support the design of programs to resolve the under-representation issue. Goal: We investigated the challenges women face in global software development teams, particularly what motivates women to leave their company; how those challenges might break down according to demographics; and strategies to mitigate the identified challenges. Method: To achieve this goal, we conducted an exploratory case study in Ericsson, a global technology company. We surveyed 94 women and employed mixed-methods to analyze the data. Results: Our findings reveal that women face socio-cultural challenges, including work-life balance issues, benevolent and hostile sexism, lack of recognition and peer parity, impostor syndrome, glass ceiling bias effects, the prove-it-again phenomenon, and the maternal wall. The participants of our research provided different suggestions to address/mitigate the reported challenges, including sabbatical policies, flexibility of location and time, parenthood support, soft skills training for managers, equality of payment and opportunities between genders, mentoring and role models to support career growth, directives to hire more women, inclusive groups and events, women's empowerment, and recognition for women's success. The framework of challenges and suggestions can inspire further initiatives both in academia and industry to onboard and retain women.

TLDR

The challenges women face in global software development teams, particularly what motivates women to leave their company; how those challenges might break down according to demographics; and strategies to mitigate the identified challenges are investigated.

INFOCOM

More than Enough is Too Much: Adaptive Defenses against Gradient Leakage in Production Federated Learning

  • Fei Wang, Ethan Hugh, Baochun Li

  • December 31, 2022

With increasing concerns on privacy leakage from gradients, a variety of attack mechanisms emerged to recover private data from gradients at an honest-but-curious server, which challenged the primary advantage of privacy protection in federated learning. However, we cast doubt upon the real impact of these gradient attacks on production federated learning systems. By taking away several impractical assumptions that the literature has made, we find that gradient attacks pose a limited degree of threat to the privacy of raw data. Through a comprehensive evaluation on existing gradient attacks in a federated learning system with practical assumptions, we have systematically analyzed their effectiveness under a wide range of configurations. We present key priors required to make the attack possible or stronger, such as a narrow distribution of initial model weights, as well as inversion at early stages of training. We then propose a new lightweight defense mechanism that provides sufficient and self-adaptive protection against time-varying levels of the privacy leakage risk throughout the federated learning process. As a variation of the gradient perturbation method, our proposed defense, called OUTPOST, selectively adds Gaussian noise to gradients at each update iteration according to the Fisher information matrix, where the level of noise is determined by the privacy leakage risk quantified by the spread of model weights at each layer. To limit the computation overhead and training performance degradation, OUTPOST only performs perturbation with iteration-based decay. Our experimental results demonstrate that OUTPOST can achieve a much better tradeoff than the state-of-the-art with respect to convergence performance, computational overhead, and protection against gradient attacks.

TLDR

It is found that gradient attacks pose a limited degree of threat to the privacy of raw data and a new lightweight defense mechanism is proposed, called OUTPOST, that provides sufficient and self-adaptive protection against time-varying levels of the privacy leakage risk throughout the federated learning process.
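
A minimal sketch of the defence's shape, under stated simplifications: per-layer Gaussian noise whose scale tracks a leakage-risk proxy and decays over iterations. OUTPOST quantifies risk via the Fisher information matrix and the spread of model weights; the sketch substitutes the per-layer weight variance and invents the parameter names.

```python
# Simplified per-layer gradient perturbation with iteration-based decay.
import numpy as np

def perturb_gradients(grads, weights, iteration, base_scale=0.1, decay=0.99):
    """grads, weights: lists of per-layer numpy arrays."""
    noisy = []
    for g, w in zip(grads, weights):
        risk = float(np.var(w))                 # crude proxy for leakage risk
        sigma = base_scale * risk * (decay ** iteration)
        noisy.append(g + np.random.normal(0.0, sigma, size=g.shape))
    return noisy

grads = [np.ones((4, 4)), np.ones(4)]
weights = [np.random.randn(4, 4), np.random.randn(4)]
noisy = perturb_gradients(grads, weights, iteration=10)
```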

ChARM: NextG Spectrum Sharing Through Data-Driven Real-Time O-RAN Dynamic Control

  • L. Baldesi, Francesco Restuccia, T. Melodia

  • IEEE INFOCOM 2022 - IEEE Conference on Computer Communications

  • January 17, 2022

Today’s radio access networks (RANs) are monolithic entities which often operate statically on a given set of parameters for the entirety of their operations. To implement realistic and effective spectrum sharing policies, RANs will need to seamlessly and intelligently change their operational parameters. In stark contrast with existing paradigms, the new O-RAN architectures for 5G-and-beyond networks (NextG) separate the logic that controls the RAN from its hardware substrate, allowing unprecedented real-time fine-grained control of RAN components. In this context, we propose the Channel-Aware Reactive Mechanism (ChARM), a data-driven O-RAN-compliant framework that allows (i) sensing the spectrum to infer the presence of interference and (ii) reacting in real time by switching the distributed unit (DU) and radio unit (RU) operational parameters according to a specified spectrum access policy. ChARM is based on neural networks operating directly on unprocessed I/Q waveforms to determine the current spectrum context. ChARM does not require any modification to the existing 3GPP standards. It is designed to operate within the O-RAN specifications, and can be used in conjunction with other spectrum sharing mechanisms (e.g., LTE-U, LTE-LAA or MulteFire). We demonstrate the performance of ChARM in the context of spectrum sharing among LTE and Wi-Fi in unlicensed bands, where a controller operating over a RAN Intelligent Controller (RIC) senses the spectrum and switches cell frequency to avoid Wi-Fi. We develop a prototype of ChARM using srsRAN, and leverage the Colosseum channel emulator to collect a large-scale waveform dataset to train our neural networks with. To collect standard-compliant Wi-Fi data, we extended the Colosseum testbed using system-on-chip (SoC) boards running a modified version of the OpenWiFi architecture. Experimental results show that ChARM achieves accuracy of up to 96% on Colosseum and 85% on an over-the-air testbed, demonstrating the capacity of ChARM to exploit the considered spectrum channels.

TLDR

The Channel-Aware Reactive Mechanism (ChARM), a data-driven O-RAN-compliant framework that allows sensing the spectrum to infer the presence of interference and reacting in real time by switching the distributed unit (DU) and radio unit (RU) operational parameters according to a specified spectrum access policy, is proposed.

Scalable Real-Time Bandwidth Fairness in Switches

  • Robert MacDavid, Xiaoqi Chen, J. Rexford

  • December 31, 2021

Network operators want to enforce fair bandwidth sharing between users without solely relying on congestion control running on end-user devices. However, in edge networks (e.g., 5G), the number of user devices sharing a bottleneck link far exceeds the number of queues supported by today’s switch hardware; even accurately tracking per-user sending rates may become too resource-intensive. Meanwhile, traditional software-based queuing on CPUs struggles to meet the high throughput and low latency demanded by 5G users. We propose Approximate Hierarchical Allocation of Bandwidth (AHAB), a per-user bandwidth limit enforcer that runs fully in the data plane of commodity switches. AHAB tracks each user’s approximate traffic rate and compares it against a bandwidth limit, which is iteratively updated via a real-time feedback loop to achieve max-min fairness across users. Using a novel sketch data structure, AHAB avoids storing per-user state, and therefore scales to thousands of slices and millions of users. Furthermore, AHAB supports network slicing, where each slice has a guaranteed share of the bandwidth that can be scavenged by other slices when under-utilized. Evaluation shows AHAB can achieve fair bandwidth allocation within 3.1ms, 13x faster than prior data-plane hierarchical schedulers.

TLDR

Approximate Hierarchical Allocation of Bandwidth (AHAB) is proposed, a per-user bandwidth limit enforcer that runs fully in the data plane of commodity switches and can achieve fair bandwidth allocation within 3.1ms, 13x faster than prior data-plane hierarchical schedulers.
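
Two ingredients from the abstract can be sketched in software: approximate per-user rate tracking without per-user state (here a tiny count-min sketch) and a feedback step that nudges the shared bandwidth limit toward full link utilisation. The real AHAB runs entirely in the switch data plane with a purpose-built sketch, so this is only an analogy.

```python
# Software analogy of AHAB's building blocks: approximate rate tracking
# plus a feedback-updated bandwidth limit. Parameter names are ours.
import hashlib

class CountMinSketch:
    def __init__(self, rows=4, cols=1024):
        self.counts = [[0] * cols for _ in range(rows)]
        self.cols = cols

    def _slots(self, key):
        for row in range(len(self.counts)):
            h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=4)
            yield row, int.from_bytes(h.digest(), "big") % self.cols

    def add(self, key, amount):
        for row, col in self._slots(key):
            self.counts[row][col] += amount

    def estimate(self, key):
        return min(self.counts[row][col] for row, col in self._slots(key))

def update_limit(limit, admitted_bytes, capacity_bytes, gain=0.25):
    """Feedback step: raise the per-user limit if the link is underused,
    lower it if admitted traffic exceeds capacity."""
    error = (capacity_bytes - admitted_bytes) / capacity_bytes
    return max(1.0, limit * (1 + gain * error))
```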

PreGAN: Preemptive Migration Prediction Network for Proactive Fault-Tolerant Edge Computing

  • S. Tuli, G. Casale, N. Jennings

  • IEEE INFOCOM 2022 - IEEE Conference on Computer Communications

  • December 4, 2021

Building a fault-tolerant edge system that can quickly react to node overloads or failures is challenging due to the unreliability of edge devices and the strict service deadlines of modern applications. Moreover, unnecessary task migrations can stress the system network, giving rise to the need for a smart and parsimonious failure recovery scheme. Prior approaches often fail to adapt to highly volatile workloads or accurately detect and diagnose faults for optimal remediation. There is thus a need for a robust and proactive fault-tolerance mechanism to meet service level objectives. In this work, we propose PreGAN, a composite AI model using a Generative Adversarial Network (GAN) to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments. PreGAN uses co-simulations in tandem with a GAN to learn a few-shot anomaly classifier and proactively predict migration decisions for reliable computing. Extensive experiments on a Raspberry-Pi based edge environment show that PreGAN can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service. PreGAN accomplishes this by 5.1% more accurate fault detection, higher diagnosis scores and 23.8% lower overheads compared to the best method among the considered baselines.

TLDR

PreGAN is a composite AI model that uses a Generative Adversarial Network to predict preemptive migration decisions for proactive fault-tolerance in containerized edge deployments; it can outperform state-of-the-art baseline methods in fault-detection, diagnosis and classification, thus achieving high quality of service.

ISCA

Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters

  • Kaiyang Zhao, Kaiwen Xue, Ziqi Wang, Dan Schatzberg, Leon Yang, Antonis Manousis, Johannes Weiner, Rik Van Riel, Bikash Sharma, Chunqiang Tang, Dimitrios Skarlatos

  • International Symposium on Computer Architecture

  • June 17, 2023

The unabating growth of the memory needs of emerging datacenter applications has exacerbated the scalability bottleneck of virtual memory. However, reducing the excessive overhead of address translation will remain onerous until the physical memory contiguity predicament gets resolved. To address this problem, this paper presents Contiguitas, a novel redesign of memory management in the operating system and hardware that provides ample physical memory contiguity. We identify that the primary cause of memory fragmentation in Meta's datacenters is unmovable allocations scattered across the address space that impede large contiguity from being formed. To provide ample physical memory contiguity by design, Contiguitas first separates regular movable allocations from unmovable ones by placing them into two different continuous regions in physical memory and dynamically adjusts the boundary of the two regions based on memory demand. Drastically reducing unmovable allocations is challenging because the majority of unmovable pages cannot be moved with software alone given that access to the page cannot be blocked for a migration to take place. Furthermore, page migration is expensive as it requires a long downtime to (a) perform TLB shootdowns that scale poorly with the number of victim TLBs, and (b) copy the page. To this end, Contiguitas eliminates the primary source of unmovable allocations by introducing hardware extensions in the last-level cache to enable the transparent and efficient migration of unmovable pages even while the pages remain in use. We build the operating system component of Contiguitas into the Linux kernel and run our experiments in a production environment at Meta's datacenters. Our results show that Contiguitas's OS component successfully confines unmovable allocations, drastically reducing unmovable 2MB blocks from an average of 31% scattered across the address space down to 7% confined in the unmovable region, leading to significant performance gains. Specifically, we show that for three major production services, Contiguitas achieves end-to-end performance improvements of 2--9% for partially fragmented servers, and 7--18% for highly fragmented servers, which account for nearly a quarter of Meta's fleet. We further use full-system simulations to demonstrate the effectiveness of the hardware extensions of Contiguitas. Our evaluation shows that Contiguitas-HW enables the efficient migration of unmovable allocations, scales well with the number of victim TLBs, and does not affect application performance. We are currently in the process of upstreaming Contiguitas into Linux.

TLDR

Contiguitas, a novel redesign of memory management in the operating system and hardware that provides ample physical memory contiguity, is presented and hardware extensions in the last-level cache enable the transparent and efficient migration of unmovable pages even while the pages remain in use.

NvMR: non-volatile memory renaming for intermittent computing

  • A. Bhattacharyya, Abhijith Somashekhar, Joshua San Miguel

  • Proceedings of the 49th Annual International Symposium on Computer Architecture

  • June 18, 2022

Intermittent systems on energy-harvesting devices have to frequently back up data because of an unreliable energy supply to make forward progress. These devices come with non-volatile memories like Flash/FRAM on board that are used to back up the system state. However, quite paradoxically, writing to a non-volatile memory consumes a lot of energy that makes backups expensive. Idempotency violations inherent to intermittent programs are major contributors to the problem, as they render system state inconsistent and force backups to occur even when plenty of energy is available. In this work, we first characterize the complex persist dependencies that are unique to intermittent computing. Based on these insights, we propose NvMR, an intermittent architecture that eliminates idempotency violations in the program by renaming non-volatile memory addresses. This can reduce the number of backups to their theoretical minimum and decouple the decision of when to perform backups from the memory access constraints imposed by the program. Our evaluations show that compared to a state-of-the-art intermittent architecture, NvMR can save about 20% energy on average when running common embedded applications.

TLDR

NvMR, an intermittent architecture that eliminates idempotency violations in the program by renaming non-volatile memory addresses is proposed, which can reduce the number of backups to their theoretical minimum and decouple the decision of when to perform backups from the memory access constraints imposed by the program.

ISSTA

An Empirical Study of Functional Bugs in Android Apps

  • Yiheng Xiong, Mengqian Xu, Ting Su, Jingling Sun, Jue Wang, He Wen, G. Pu, Jifeng He, Z. Su

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Android apps are ubiquitous and serve many aspects of our daily lives. Ensuring their functional correctness is crucial for their success. To date, we still lack a general and in-depth understanding of functional bugs, which hinders the development of practices and techniques to tackle functional bugs. To fill this gap, we conduct the first systematic study on 399 functional bugs from 8 popular open-source and representative Android apps to investigate the root causes, bug symptoms, test oracles, and the capabilities and limitations of existing testing techniques. This study took us substantial effort. It reveals several new interesting findings and implications which help shed light on future research on tackling functional bugs. Furthermore, findings from our study guided the design of a proof-of-concept differential testing tool, RegDroid, to automatically find functional bugs in Android apps. We applied RegDroid on 5 real-world popular apps, and successfully discovered 14 functional bugs, 10 of which were previously unknown and affected the latest released versions—all these 10 bugs have been confirmed and fixed by the app developers. Specifically, 10 out of these 14 found bugs cannot be found by existing testing techniques. We have made all the artifacts (including the dataset of 399 functional bugs and RegDroid) in our work publicly available at https://github.com/Android-Functional-bugs-study/home.

TLDR

This study conducts the first systematic study on 399 functional bugs from 8 popular open-source and representative Android apps to investigate the root causes, bug symptoms, test oracles, and the capabilities and limitations of existing testing techniques.

Improving Bit-Blasting for Nonlinear Integer Constraints

  • Fuqi Jia, Rui Han, Pei Huang, Minghao Liu, Feifei Ma, Jian Zhang

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Nonlinear integer constraints are common and difficult in the verification and analysis of software/hardware. SMT(QF_NIA) generalizes such constraints, which is a boolean combination of nonlinear integer arithmetic constraints. A classical method to solve SMT(QF_NIA) is bit-blasting, which reduces them to boolean satisfiability problems. Currently, the existing pure bit-blasting based solvers are noncompetitive with other state-of-the-art SMT solvers. The bit-blasting based methods have some problems: First, the bit-blasting method is hampered by nonlinear multiplication operations; second, it sometimes does not search in a proper search space; and third, it contains some redundancy. In this paper, we focus on improving the efficiency of bit-blasting based method. To decide on a proper search space, we proposed an adaptive function for hard nonlinear multiplications, and heuristic strategies to analyze specific constraints. We also found that different orders in successive additions will result in bit vectors with different bit-widths. We proposed an optimal order decision algorithm to save redundancy in successive additions. We implement a solver with the proposed methods named BLAN. Experiments demonstrate that BLAN outperforms other state-of-the-art SMT solvers (APROVE, CVC5, MATHSAT, YICES2, Z3) on the satisfiable SMT(QF_NIA) instances in SMT-LIB. We provide an outlook of BLAN on solving unsatisfiable instances via combining with other solvers. Sensitivity analysis also demonstrates the effectiveness of the proposed methods.

TLDR

This paper proposes an adaptive function for hard nonlinear multiplications and heuristic strategies to analyze specific constraints, and implements a solver with the proposed methods named BLAN, which outperforms other state-of-the-art SMT solvers (APROVE, CVC5, MATHSAT, YICES2, Z3) on the satisfiable SMT(QF_NIA) instances in SMT-LIB.

Guiding Greybox Fuzzing with Mutation Testing

  • Vasudev Vikram, Isabella Laybourn, Ao Li, Nicole Nair, Kelton OBrien, Rafaello Sanna, Rohan Padhye

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Greybox fuzzing and mutation testing are two popular but mostly independent fields of software testing research that have so far had limited overlap. Greybox fuzzing, generally geared towards searching for new bugs, predominantly uses code coverage for selecting inputs to save. Mutation testing is primarily used as a stronger alternative to code coverage in assessing the quality of regression tests; the idea is to evaluate tests for their ability to identify artificially injected faults in the target program. But what if we wanted to use greybox fuzzing to synthesize high-quality regression tests? In this paper, we develop and evaluate Mu2, a Java-based framework for incorporating mutation analysis in the greybox fuzzing loop, with the goal of producing a test-input corpus with a high mutation score. Mu2 makes use of a differential oracle for identifying inputs that exercise interesting program behavior without causing crashes. This paper describes several dynamic optimizations implemented in Mu2 to overcome the high cost of performing mutation analysis with every fuzzer-generated input. These optimizations introduce trade-offs in fuzzing throughput and mutation killing ability, which we evaluate empirically on five real-world Java benchmarks. Overall, variants of Mu2 are able to synthesize test-input corpora with a higher mutation score than state-of-the-art Java fuzzer Zest.

TLDR

This paper develops and evaluates Mu2, a Java-based framework for incorporating mutation analysis in the greybox fuzzing loop, with the goal of producing a test-input corpus with a high mutation score, and describes several dynamic optimizations implemented in Mu2 to overcome the high cost of performing mutation analysis with every fuzzer-generated input.
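
The loop Mu2 adds to greybox fuzzing can be stated compactly: save a fuzzer-generated input only if it kills mutants that previously survived, so the corpus's mutation score keeps rising. The sketch below uses placeholder hooks (mutate_input, run_on_mutant) where Mu2 plugs in Zest's input mutation and its optimized mutation-analysis backend.

```python
# Mutation-score-guided corpus building (placeholder hooks, not Mu2 itself).
import random

def fuzz_with_mutation_guidance(seed_corpus, mutants, mutate_input,
                                run_on_mutant, iterations=1000):
    corpus = list(seed_corpus)
    killed = set()
    for _ in range(iterations):
        candidate = mutate_input(random.choice(corpus))
        newly_killed = {
            m for m in mutants
            if m not in killed and run_on_mutant(m, candidate) == "killed"
        }
        if newly_killed:            # input is interesting: it raises the score
            corpus.append(candidate)
            killed |= newly_killed
    return corpus, len(killed) / len(mutants)
```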

GrayC: Greybox Fuzzing of Compilers and Analysers for C

  • Karine Even-Mendoza, Arindam Sharma, A. Donaldson, Cristian Cadar

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Fuzzing of compilers and code analysers has led to a large number of bugs being found and fixed in widely-used frameworks such as LLVM, GCC and Frama-C. Most such fuzzing techniques have taken a blackbox approach, with compilers and code analysers starting to become relatively immune to such fuzzers. We propose a coverage-directed, mutation-based approach for fuzzing C compilers and code analysers, inspired by the success of this type of greybox fuzzing in other application domains. The main challenge of applying mutation-based fuzzing in this context is that naive mutations are likely to generate programs that do not compile. Such programs are not useful for finding deep bugs that affect optimisation, analysis, and code generation routines. We have designed a novel greybox fuzzer for C compilers and analysers by developing a new set of mutations to target common C constructs, and transforming fuzzed programs so that they produce meaningful output, allowing differential testing to be used as a test oracle, and paving the way for fuzzer-generated programs to be integrated into compiler and code analyser regression test suites. We have implemented our approach in GrayC, a new open-source LibFuzzer-based tool, and present experiments showing that it provides more coverage on the middle- and back-end stages of compilers and analysers compared to other mutation-based approaches, including Clang-Fuzzer, PolyGlot, and a technique similar to LangFuzz. We have used GrayC to identify 30 confirmed compiler and code analyser bugs: 25 previously unknown bugs (with 22 of them already fixed in response to our reports) and 5 confirmed bugs reported independently shortly before we found them. A further 3 bug reports are under investigation. Apart from the results above, we have contributed 24 simplified versions of coverage-enhancing test cases produced by GrayC to the Clang/LLVM test suite, targeting 78 previously uncovered functions in the LLVM codebase.

TLDR

A novel greybox fuzzer for C compilers and analysers is designed by developing a new set of mutations to target common C constructs, and transforming fuzzed programs so that they produce meaningful output, allowing differential testing to be used as a test oracle, and paving the way for fuzzer-generated programs to be integrated into compiler and code analyser regression test suites.

Automated Generation of Security-Centric Descriptions for Smart Contract Bytecode

  • Yufei Pan, Zhichao Xu, Li Li, Yunhe Yang, Mu Zhang

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Smart contract and DApp users are taking great risks, as they do not obtain necessary knowledge that can help them avoid using vulnerable and malicious contract code. In this paper, we develop a novel system Tx2TXT that can automatically create security-centric textual descriptions directly from smart contract bytecode. To capture the security aspect of financial applications, we formally define a funds transfer graph to model critical funds flows in smart contracts. To ensure the expressiveness and conciseness of the descriptions derived from these graphs, we employ a GCN-based model to identify security-related condition statements and selectively add them to our graph models. To convert low-level bytecode instructions to human-readable textual scripts, we leverage robust API signatures to recover bytecode semantics. We have evaluated Tx2TXT on 890 well-labeled vulnerable, malicious and safe contracts where developer-crafted descriptions are available. Our results have shown that Tx2TXT outperforms state-of-the-art solutions and can effectively help end users avoid risky contracts.

TLDR

A novel system Tx2TXT is developed that can automatically create security-centric textual descriptions directly from smart contract bytecode, and leverages robust API signatures to recover bytecode semantics.

More Precise Regression Test Selection via Reasoning about Semantics-Modifying Changes

  • Yu Liu, Jiyang Zhang, Pengyu Nie, Miloš Gligorić, Owolabi Legunsen

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • July 12, 2023

Regression test selection (RTS) speeds up regression testing by only re-running tests that might be affected by code changes. Ideal RTS safely selects all affected tests and precisely selects only affected tests. But, aiming for this ideal is often slower than re-running all tests. So, recent RTS techniques use program analysis to trade precision for speed, i.e., lower regression testing time, or even use machine learning to trade safety for speed. We seek to make recent analysis-based RTS techniques more precise, to further speed up regression testing. Independent studies suggest that these techniques reached a “performance wall” in the speed-ups that they provide. We manually inspect code changes to discover those that do not require re-running tests that are only affected by such changes. We categorize 29 kinds of changes that we find from five projects into 13 findings, 11 of which are semantics-modifying. We enhance two RTS techniques---Ekstazi and STARTS---to reason about our findings. Using 1,150 versions of 23 projects, we evaluate the impact on safety and precision of leveraging such changes. We also evaluate if our findings from a few projects can speed up regression testing in other projects. The results show that our enhancements are effective and they can generalize. On average, they result in selecting 41.7% and 31.8% fewer tests, and take 33.7% and 28.7% less time than Ekstazi and STARTS, respectively, with no loss in safety.

TLDR

This work seeks to make recent analysis-based RTS techniques more precise, to further speed up regression testing, and evaluates the impact on safety and precision of leveraging such changes.
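
For illustration, the following minimal Python sketch shows the checksum-based selection idea that tools like Ekstazi and STARTS build on: a test is re-run only if one of its tracked dependencies changed. The file names, dependency map, and on-disk format here are hypothetical; the real tools track dependencies at the class level, and the paper's enhancement goes further by also skipping tests whose dependencies changed only in semantics-preserving ways.

```python
# Hypothetical sketch of checksum-based regression test selection.
import hashlib
import json
import os

CHECKSUM_FILE = ".rts_checksums.json"

def checksum(path):
    """Stable content hash of a dependency file (None if it no longer exists)."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def select_tests(dependencies, old_checksums):
    """dependencies: test name -> list of files it depends on."""
    selected, new_checksums = [], {}
    for test, files in dependencies.items():
        digests = {p: checksum(p) for p in files}
        new_checksums.update(digests)
        if any(old_checksums.get(p) != d for p, d in digests.items()):
            selected.append(test)  # a dependency changed: re-run this test
    return selected, new_checksums

if __name__ == "__main__":
    deps = {"TestParser": ["parser.py", "lexer.py"], "TestPrinter": ["printer.py"]}
    old = {}
    if os.path.exists(CHECKSUM_FILE):
        with open(CHECKSUM_FILE) as f:
            old = json.load(f)
    to_run, new = select_tests(deps, old)
    with open(CHECKSUM_FILE, "w") as f:
        json.dump(new, f)
    print("tests to re-run:", to_run)
```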

Eunomia: Enabling User-Specified Fine-Grained Search in Symbolically Executing WebAssembly Binaries

  • Ningyu He, Zhehao Zhao, Jikai Wang, Yubin Hu, Shengjian Guo, Haoyu Wang, Guangtai Liang, Ding Li, Xiangqun Chen, Yao Guo

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • April 14, 2023

Although existing techniques have proposed automated approaches to alleviate the path explosion problem of symbolic execution, users still need to optimize symbolic execution by applying various searching strategies carefully. As existing approaches mainly support only coarse-grained global searching strategies, they cannot efficiently traverse through complex code structures. In this paper, we propose Eunomia, a symbolic execution technique that supports fine-grained search with local domain knowledge. Eunomia uses Aes, a DSL that lets users specify local searching strategies for different parts of the program. Eunomia also isolates the context of variables for different local searching strategies, avoiding conflicts. We implement Eunomia for WebAssembly, which can analyze applications written in various languages. Eunomia is the first symbolic execution engine that supports the full features of WebAssembly. We evaluate Eunomia with a microbenchmark suite and six real-world applications. Our evaluation shows that Eunomia improves bug detection by up to three orders of magnitude. We also conduct a user study that shows the benefits of using Aes. Moreover, Eunomia verifies six known bugs and detects two new zero-day bugs in Collections-C.

TLDR

Eunomia is the first symbolic execution engine that supports the full features of WebAssembly and uses Aes, a DSL that lets users specify local searching strategies for different parts of the program, avoiding conflicts.

Beware of the Unexpected: Bimodal Taint Analysis

  • Yiu Wai Chow, Max Schäfer, Michael Pradel

  • Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis

  • January 25, 2023

Static analysis is a powerful tool for detecting security vulnerabilities and other programming problems. Global taint tracking, in particular, can spot vulnerabilities arising from complicated data flow across multiple functions. However, precisely identifying which flows are problematic is challenging, and sometimes depends on factors beyond the reach of pure program analysis, such as conventions and informal knowledge. For example, learning that a parameter named locale of an API function ends up in a file path is surprising and potentially problematic. In contrast, it would be completely unsurprising to find that a parameter named command passed to an API function execaCommand is eventually interpreted as part of an operating-system command. This paper presents Fluffy, a bimodal taint analysis that combines static analysis, which reasons about data flow, with machine learning, which probabilistically determines which flows are potentially problematic. The key idea is to let machine learning models predict from natural language information involved in a taint flow, such as API names, whether the flow is expected or unexpected, and to inform developers only about the latter. We present a general framework and instantiate it with four learned models, which offer different trade-offs between the need to annotate training data and the accuracy of predictions. We implement Fluffy on top of the CodeQL analysis framework and apply it to 250K JavaScript projects. Evaluating on five common vulnerability types, we find that Fluffy achieves an F1 score of 0.85 or more on four of them across a variety of datasets.

TLDR

Fluffy is presented, a bimodal taint analysis that combines static analysis, which reasons about data flow, with machine learning, which probabilistically determines which flows are potentially problematic.
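
The bimodal idea can be sketched in a few lines: a static analysis produces taint flows, and a model trained on the natural-language parts of those flows (parameter and API names) predicts which flows are unexpected and therefore worth reporting. The toy data and featurization below are purely illustrative; Fluffy's actual learned models and features differ.

```python
# Illustrative sketch: classify taint flows as expected (0) or unexpected (1)
# from their natural-language description. Not Fluffy's real models or data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (source/parameter name + sink kind, label): 1 = unexpected, 0 = expected
flows = [
    ("command execaCommand command-injection", 0),  # expected by convention
    ("locale readFile path-traversal", 1),          # surprising
    ("path readFile path-traversal", 0),
    ("username query sql-injection", 1),
]
texts = [t for t, _ in flows]
labels = [y for _, y in flows]

model = make_pipeline(CountVectorizer(token_pattern=r"[A-Za-z]+"),
                      LogisticRegression())
model.fit(texts, labels)

# Only flows predicted "unexpected" would be surfaced to developers.
print(model.predict(["locale writeFile path-traversal"]))
```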

MOBICOM

Magnetoelectric backscatter communication for millimeter-sized wireless biomedical implants

  • Zhanghao Yu, Fatima T. Alrashdan, Wei Wang, M. Parker, Xinyu Chen, Frank Y. Chen, Joshua Woods, Zhiyu Chen, Jacob T. Robinson, Kaiyuan Yang

  • Proceedings of the 28th Annual International Conference on Mobile Computing And Networking

  • October 14, 2022

This paper presents the design, implementation, and experimental evaluation of a wireless biomedical implant platform exploiting the magnetoelectric effect for wireless power and bi-directional communication. As an emerging wireless power transfer method, magnetoelectric power transfer is promising for mm-scaled bio-implants because of its superior tolerance to misalignment, high efficiency, and low tissue absorption compared to other modalities [46, 59, 60]. Utilizing the same physical mechanism for power and communication is critical for implant miniaturization, but low-power magnetoelectric uplink communication has not been achieved yet. For the first time, we design and demonstrate near-zero power magnetoelectric backscatter from the mm-sized implants by exploiting the converse magnetostriction effects. The system for demonstration consists of an 8.2-mm3 wireless implantable device and a custom portable transceiver. The implant's ASIC interfacing with the magnetoelectric transducer encodes uplink data by changing the transducer's load, resulting in resonance frequency changes for frequency-shift-keying modulation. The magnetoelectrically backscattered signal is sensed and demodulated through frequency-to-digital conversion by the external transceiver. With design optimizations in data modulation and recovery, the proposed system achieves > 1-kbps data rate at the 335-kHz carrier frequency, with a communication distance greater than 2 cm and a bit error rate less than 1E-3. Further, we validate the proposed system for wireless stimulation and sensing, and conduct ex-vivo tests through a 1.5-cm porcine tissue. The proposed magnetoelectric backscatter approach provides a path towards miniaturized wireless bio-implants for advanced biomedical applications like closed-loop neuromodulation.

TLDR

The proposed magnetoelectric backscatter approach provides a path towards miniaturized wireless bio-implants for advanced biomedical applications like closed-loop neuromodulation.

NSDI

LeakyScatter: A Frequency-Agile Directional Backscatter Network Above 100 GHz

  • Atsu Kludze, Yasaman Ghasempour

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2022

Wireless backscattering has been deemed suitable for various emerging energy-constrained applications given its low-power architectures. Although existing backscatter nodes often operate at sub-6 GHz frequency bands, moving to the sub-THz bands offers significant advantages in scaling low-power connectivity to dense user populations; as concurrent transmissions can be separated in both spectral and spatial domains given the large swath of available bandwidth and laser-shaped beam directionality in this frequency regime. However, the power consumption and complexity of wireless devices increase significantly with frequency. In this paper, we present LeakyScatter, the first backscatter system that enables directional, low-power, and frequency-agile wireless links above 100 GHz. LeakyScatter departs from conventional backscatter designs and introduces a novel architecture that relies on aperture reciprocity in leaky-wave devices. We have fabricated LeakyScatter and evaluated its performance through extensive simulations and over-the-air experiments. Our results demonstrate a scalable wireless link above 100 GHz that is retrodirective and operates at a large bandwidth (tens of GHz) and ultra-low-power (zero power consumed for directional steering and ≤ 1 mW for data modulation).

TLDR

LeakyScatter departs from conventional backscatter designs and introduces a novel architecture that relies on aperture reciprocity in leaky-wave devices and demonstrates a scalable wireless link above 100 GHz that is retrodirective and operates at a large bandwidth and ultra-low-power.

CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation

  • Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, A. Agarwal, M. Alizadeh, D. Shah

  • Symposium on Networked Systems Design and Implementation

  • January 5, 2022

We present CausalSim, a causal framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) would not affect the validity of the traces. However, real-world traces are often biased by the choices algorithms make during trace collection, and hence replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics and latent factors capturing the underlying system conditions during trace collection. It learns these models using an initial randomized control trial (RCT) under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines. Moreover, CausalSim provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which we validate with a real deployment.

TLDR

CausalSim improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines and provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which is validated with a real deployment.

Learning to Communicate Effectively Between Battery-free Devices

  • Kai Geissdoerfer, Marco Zimmerling

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2021

Successful wireless communication requires that sender and receiver are operational at the same time. This requirement is difficult to satisfy in battery-free networks, where the energy harvested from ambient sources varies across time and space and is often too weak to continuously power the devices. We present Bonito, the first connection protocol for battery-free systems that enables reliable and efficient bi-directional communication between intermittently powered nodes. We collect and analyze real-world energy-harvesting traces from five diverse scenarios involving solar panels and piezoelectric harvesters, and find that the nodes’ charging times approximately follow well-known distributions. Bonito learns a model of these distributions online and adapts the nodes’ wake-up times so that sender and receiver are operational at the same time, enabling successful communication. Experiments with battery-free prototype nodes built from off-the-shelf hardware components demonstrate that our design improves the average throughput by 10–80× compared with the state of the art.

TLDR

Bonito is presented, the first connection protocol for battery-free systems that enables reliable and efficient bi-directional communication between intermittently powered nodes, and it is found that the nodes’ charging times approximately follow well-known distributions.
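
To make the core idea concrete, the sketch below shows one way a node could learn its charging-time distribution online and two nodes could agree on a connection interval both can sustain with high probability. The normal model, EWMA update, and 95th-percentile rule are illustrative assumptions, not the Bonito protocol itself.

```python
# Illustrative sketch of learning charging-time distributions online and
# picking a common wake-up interval. Not the actual Bonito protocol.
from statistics import NormalDist

class ChargingTimeModel:
    """Online EWMA estimate of a node's charging-time mean and variance."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.mean = None
        self.var = 1e-9

    def update(self, charging_time):
        if self.mean is None:
            self.mean = float(charging_time)
            return
        diff = charging_time - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)

    def quantile(self, p=0.95):
        return NormalDist(self.mean, max(self.var, 1e-9) ** 0.5).inv_cdf(p)

def agree_on_interval(model_a, model_b, p=0.95):
    """Wake up after an interval each node can sustain with probability p."""
    return max(model_a.quantile(p), model_b.quantile(p))

node_a, node_b = ChargingTimeModel(), ChargingTimeModel()
for t_a, t_b in [(1.2, 3.1), (1.4, 2.8), (1.1, 3.5)]:  # observed charging times (s)
    node_a.update(t_a)
    node_b.update(t_b)
print("connection interval (s):", agree_on_interval(node_a, node_b))
```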

Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets

  • Hamid Ghasemirahni, Tom Barbette, Georgios P. Katsikas, Alireza Farshin, Amir Roozbeh, Massimo Girondi, Marco Chiesa, Gerald Maguire, Dejan Kostic

  • Symposium on Networked Systems Design and Implementation

  • December 31, 2021

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system’s caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits. In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing µs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.

TLDR

Reframer is a software solution that deliberately delays packets and reorders them to increase traffic locality and it is shown that Reframer increases the throughput of a network service chain by up to 84% and reduces the completion time of a web server while improving its throughput by 20%.
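
The essence of the technique can be illustrated with a short sketch: hold a small batch of packets, then release them grouped by flow so that packets touching the same per-flow state arrive back-to-back and stay cache-hot. The packet representation and field names below are illustrative, not Reframer's actual implementation.

```python
# Illustrative sketch of batching and reordering packets by flow to improve
# temporal locality. Field names are hypothetical.
from collections import defaultdict

def reorder_batch(packets, flow_key=lambda p: (p["src"], p["dst"], p["proto"])):
    """Group a buffered batch of packets by flow, preserving order within a flow."""
    flows = defaultdict(list)
    for pkt in packets:                  # packets arrive interleaved across flows
        flows[flow_key(pkt)].append(pkt)
    ordered = []
    for flow_packets in flows.values():  # emit each flow's packets consecutively
        ordered.extend(flow_packets)
    return ordered

batch = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": 6, "seq": 1},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "proto": 6, "seq": 1},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": 6, "seq": 2},
]
print(reorder_batch(batch))  # the two 10.0.0.1 packets are now adjacent
```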

OSDI

Ensō: A Streaming Interface for NIC-Application Communication

  • Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, J. Hoe, Aurojit Panda, Justine Sherry, Ren Wang

  • USENIX Symposium on Operating Systems Design and Implementation

  • December 31, 2022

Today, most communication between the NIC and software involves exchanging fixed-size packet buffers. This packetized interface was designed for an era when NICs implemented few offloads and software implemented the logic for translating between application data and packets. However, both NICs and networked software have evolved: modern NICs implement hardware offloads, e.g., TSO, LRO, and serialization offloads, that can more efficiently translate between application data and packets. Furthermore, modern software increasingly batches network I/O to reduce overheads. These changes have led to a mismatch between the packetized interface, which assumes that the NIC and software exchange fixed-size buffers, and the features provided by modern NICs and used by modern software. This incongruence between interface and data adds software complexity and I/O overheads, which in turn limits communication performance. This paper proposes Ensō, a new streaming NIC-to-software interface designed to better support how NICs and software interact today. At its core, Ensō eschews fixed-size buffers, and instead structures communication as a stream that can be used to send arbitrary data sizes. We show that this change reduces software overheads, reduces PCIe bandwidth requirements, and leads to fewer cache misses. These improvements allow an Ensō-based NIC to saturate a 100Gbps link with minimum-sized packets (forwarding at 148.8Mpps) using a single core, improve throughput for high-performance network applications by 1.5–6×, and reduce latency by up to 43%.

TLDR

Ensō, a new streaming NIC-to-software interface designed to better support how NICs and software interact today, is proposed, which eschews fixed-size buffers, and instead structures communication as a stream that can be used to send arbitrary data sizes.
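
The contrast between a packetized and a streaming interface can be sketched with a simple byte-stream ring buffer: the producer appends arbitrary amounts of data and the consumer reads arbitrary amounts, with no fixed per-packet buffers. This is only an illustration of the interface shape, not Ensō's actual NIC/driver design.

```python
# Illustrative sketch of a streaming (byte-oriented) interface: arbitrary-sized
# writes and reads over a ring buffer, instead of fixed-size packet buffers.
class StreamRing:
    def __init__(self, capacity=1 << 16):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0   # next byte the application will consume
        self.tail = 0   # next byte the producer (e.g., the NIC) will write

    def produce(self, data: bytes) -> int:
        """Append up to len(data) bytes; returns how many were accepted."""
        free = self.capacity - (self.tail - self.head)
        n = min(len(data), free)
        for i in range(n):
            self.buf[(self.tail + i) % self.capacity] = data[i]
        self.tail += n
        return n

    def consume(self, max_bytes: int) -> bytes:
        """Read any amount of available data, not one fixed-size packet."""
        n = min(max_bytes, self.tail - self.head)
        out = bytes(self.buf[(self.head + i) % self.capacity] for i in range(n))
        self.head += n
        return out

ring = StreamRing()
ring.produce(b"GET /index.html HTTP/1.1\r\n")  # arbitrary-sized write
print(ring.consume(4))                         # b'GET ' -- arbitrary-sized read
```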

MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime

  • Chenxi Wang, Haoran Ma, Shicheng Liu, Yifan Qiao, Jon Eyolfson, Christian Navasca, Shan Lu, G. Xu

  • USENIX Symposium on Operating Systems Design and Implementation

  • December 31, 2021

Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprint and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to managed runtimes and garbage collection (GC) underneath applications written in high-level languages. With different object-access patterns from applications, GC can severely interfere with existing far-memory techniques, breaking remote memory prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely-used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely-deployed cloud systems shows MemLiner improves applications’ end-to-end performance by up to 2.5×.

TLDR

MemLiner is developed, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby reducing the local-memory working set and improving remote-memory prefetching through simplified memory access patterns.

PLDI

Visualization question answering using introspective program synthesis

  • Yanju Chen, Xifeng Yan, Yu Feng

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • June 9, 2022

While data visualization plays a crucial role in gaining insights from data, generating answers over complex visualizations from natural language questions is far from an easy task. Mainstream approaches reduce data visualization queries to a semantic parsing problem, which either relies on expensive-to-annotate supervised training data that pairs natural language questions with logical forms, or weakly supervised models that incorporate a larger corpus but fail on long-tailed queries without explanations. This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language. At the core of our technique is an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model and an optimal synthesis algorithm guided by triangle alignment constraints, which represent consistency among natural language, visualization, and the synthesized program. Starting with a few tentative answers obtained from an off-the-shelf statistical model, our approach first involves an abstract synthesizer that generates a set of sketches that are consistent with the answers. Then we design an instance of optimal synthesis to complete one of the candidate sketches by satisfying common type constraints and maximizing the consistency among three parties, i.e., natural language, the visualization, and the candidate program. We implement the proposed idea in a system called Poe that can answer visualization queries from natural language. Our method is fully automated and does not require users to know the underlying schema of the visualizations. We evaluate Poe on 629 visualization queries and our experiment shows that Poe outperforms the state of the art, improving accuracy from 44% to 59%.

TLDR

This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language by using an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model and an optimal synthesis algorithm guided by triangle alignment constraints.

Low-latency, high-throughput garbage collection

  • Wenyu Zhao, S. Blackburn, K. McKinley

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • June 9, 2022

To achieve short pauses, state-of-the-art concurrent copying collectors such as C4, Shenandoah, and ZGC use substantially more CPU cycles and memory than simpler collectors. They suffer from design limitations: i) concurrent copying with inherently expensive read and write barriers, ii) scalability limitations due to tracing, and iii) immediacy limitations for mature objects that impose memory overheads. This paper takes a different approach to optimizing responsiveness and throughput. It uses the insight that regular, brief stop-the-world collections deliver sufficient responsiveness at greater efficiency than concurrent evacuation. It introduces LXR, where stop-the-world collections use reference counting (RC) and judicious copying. RC delivers scalability and immediacy, promptly reclaiming young and mature objects. RC, in a hierarchical Immix heap structure, reclaims most memory without any copying. Occasional concurrent tracing identifies cyclic garbage. LXR introduces: i) RC remembered sets for judicious copying of mature objects; ii) a novel low-overhead write barrier that combines coalescing reference counting, concurrent tracing, and remembered set maintenance; iii) object reclamation while performing a concurrent trace; iv) lazy processing of decrements; and v) novel survival rate triggers that modulate pause durations. LXR combines excellent responsiveness and throughput, improving over production collectors. On the widely-used Lucene search engine in a tight heap, LXR delivers 7.8× better throughput and 10× better 99.99% tail latency than Shenandoah. On 17 diverse modern workloads in a moderate heap, LXR outperforms OpenJDK’s default G1 on throughput by 4% and Shenandoah by 43%.

TLDR

LXR is introduced, where stop-the-world collections use reference counting (RC) and judicious copying; LXR combines excellent responsiveness and throughput, improving over production collectors.

Synthesizing analytical SQL queries from computation demonstration

  • Xiangyu Zhou, R. Bodík, Alvin Cheung, Chenglong Wang

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • April 14, 2022

Analytical SQL is widely used in modern database applications and data analysis. However, its partitioning and grouping operators are challenging for novice users. Unfortunately, programming by example, shown effective on standard SQL, is less attractive because examples for analytical queries are more laborious to solve by hand. To make demonstrations easier to author, we designed a new end-user specification, programming by computation demonstration, that allows the user to demonstrate the task using a (possibly incomplete) cell-level computation trace. This specification is exploited in a new abstraction-based synthesis algorithm to prove that a partially formed query cannot be completed to satisfy the specification, allowing us to prune the search tree. We implemented our approach in a tool named Sickle and tested it on 80 real-world analytical SQL tasks. Results show that even from small demonstrations, Sickle can solve 76 tasks, in 12.8 seconds on average, while the prior approaches can solve only 60 tasks and are on average 22.5 times slower. Furthermore, our user study with 13 participants reveals that our specification increases user efficiency and confidence on challenging tasks.

TLDR

A new end-user specification, programming by computation demonstration, is designed that allows the user to demonstrate the task using a (possibly incomplete) cell-level computation trace and is exploited in a new abstraction-based synthesis algorithm to prove that a partially formed query cannot be completed to satisfy the specification.

Kleene algebra modulo theories: a framework for concrete KATs

  • Ryan Beckett, E. Campbell, M. Greenberg

  • Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

  • July 10, 2017

Kleene algebras with tests (KATs) offer sound, complete, and decidable equational reasoning about regularly structured programs. Interest in KATs has increased greatly since NetKAT demonstrated how well extensions of KATs with domain-specific primitives and extra axioms apply to computer networks. Unfortunately, extending a KAT to a new domain by adding custom primitives, proving its equational theory sound and complete, and coming up with an efficient implementation is still an expert’s task. Abstruse metatheory is holding back KAT’s potential. We offer a fast path to a “minimum viable model” of a KAT, formally or in code through our framework, Kleene algebra modulo theories (KMT). Given primitives and a notion of state, we can automatically derive a corresponding KAT’s semantics, prove its equational theory sound and complete with respect to a tracing semantics (programs are denoted as traces of states), and derive a normalization-based decision procedure for equivalence checking. Our framework is based on pushback, a generalization of weakest preconditions that specifies how predicates and actions interact. We offer several case studies, showing tracing variants of theories from the literature (bitvectors, NetKAT) along with novel compositional theories (products, temporal logic, and sets). We derive new results over unbounded state, reasoning about monotonically increasing, unbounded natural numbers. Our OCaml implementation closely matches the theory: users define and compose KATs with the module system.

TLDR

This work offers a fast path to a “minimum viable model” of a KAT, formally or in code through their framework, Kleene algebra modulo theories (KMT), and offers several case studies, showing tracing variants of theories from the literature along with novel compositional theories.

PODS
S&P

WaVe: a verifiably secure WebAssembly sandboxing runtime

  • Evan Johnson, Evan Laufer, Zijie Zhao, S. Savage, D. Stefan, Fraser Brown

  • December 31, 2021

The promise of software sandboxing is flexible, fast and portable isolation; capturing the benefits of hardware-based memory protection without requiring operating system involvement. This promise is reified in WebAssembly (Wasm), a popular portable bytecode whose compilers automatically insert runtime checks to ensure that data and control flow are constrained to a single memory segment. Indeed, modern compiled Wasm implementations have advanced to the point where these checks can themselves be verified, removing the compiler from the trusted computing base. However, the resulting integrity properties are only valid for code executing strictly inside the Wasm sandbox. Any interactions with the runtime system, which manages sandboxes and exposes the WebAssembly System Interface (WASI) used to access operating system resources, operate outside this contract. The resulting conundrum is how to maintain Wasm’s strong isolation properties while still allowing such programs to interact with the outside world (i.e., with the file system, the network, etc.). Our paper presents a solution to this problem, via WaVe, a verified secure runtime system that implements WASI. We mechanically verify that interactions with WaVe (including OS side effects) not only maintain Wasm’s memory safety guarantees, but also maintain access isolation for the host OS’s storage and network resources. Finally, in spite of completely removing the runtime from the trusted computing base, we show that WaVe offers performance competitive with existing industrial (yet unsafe) Wasm runtimes.

TLDR

This paper mechanically verifies that interactions with WaVe not only maintain Wasm’s memory safety guarantees, but also maintain access isolation for the host OS’s storage and network resources.

Characterizing Everyday Misuse of Smart Home Devices

  • Phoebe Moh, Pubali Datta, N. Warford, Adam Bates, Nathan Malkin, Michelle L. Mazurek

  • December 31, 2021

Exploration of Internet of Things (IoT) security often focuses on threats posed by external and technically-skilled attackers. While it is important to understand these most extreme cases, it is equally important to understand the most likely risks of harm posed by smart device ownership. In this paper, we explore how smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners. In a preliminary characterization survey (n = 100), we broadly capture the kinds of unauthorized use and misuse incidents participants have experienced or engaged in. Then, in a prevalence survey (n = 483), we assess the prevalence of these incidents in a demographically-representative population. Our findings show that unauthorized use of smart devices is widespread (experienced by 43% of participants), and that misuse is also common (experienced by at least 19% of participants). However, highly individual factors determine whether these unauthorized use events constitute misuse. Through a focus on everyday abuses, this work sheds light on the most prevalent security and privacy threats faced by smart-home owners today.

TLDR

How smart devices are misused — used without permission in a manner that causes harm — by device owners’ everyday associates such as friends, family, and romantic partners is explored.

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions

  • H. Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, R. Karri

  • 2022 IEEE Symposium on Security and Privacy (SP)

  • August 20, 2021

There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described ‘AI pair programmer’, GitHub Copilot, which is a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk cybersecurity weaknesses, e.g. those from MITRE’s “Top 25” Common Weakness Enumeration (CWE) list. We explore Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,689 programs. Of these, we found approximately 40% to be vulnerable.

TLDR

This work systematically investigates the prevalence and conditions that can cause GitHub Copilot to recommend insecure code, and explores Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.

SIGCOMM

Software-defined network assimilation: bridging the last mile towards centralized network configuration management with NAssim

  • Huangxun Chen, Yukai Miao, Li Chen, Haifeng Sun, Hong Chao Xu, Libin Liu, Gong Zhang, Wei Wang

  • Proceedings of the ACM SIGCOMM 2022 Conference

  • August 22, 2022

On-boarding new devices into an existing SDN network is a pain for network operations (NetOps) teams, because much expert effort is required to bridge the gap between the configuration models of the new devices and the unified data model in the SDN controller. In this work, we present an assistant framework NAssim, to help NetOps accelerate the process of assimilating a new device into a SDN network. Our solution features a unified parser framework to parse diverse device user manuals into preliminary configuration models, a rigorous validator that confirms the correctness of the models via formal syntax analysis, model hierarchy validation and empirical data validation, and a deep-learning-based mapping algorithm that uses state-of-the-art neural language processing techniques to produce human-comprehensible recommended mapping between the validated configuration model and the one in the SDN controller. In all, NAssim liberates the NetOps from most tedious tasks by learning directly from devices' manuals to produce data models which are comprehensible by both the SDN controller and human experts. Our evaluation shows that NAssim can accelerate the assimilation process by 9.1x. In this process, we also identify and correct 243 errors in four mainstream vendors' device manuals, and release a validated and expert-curated dataset of parsed manual corpus for future research.

TLDR

This work presents an assistant framework NAssim, to help NetOps accelerate the process of assimilating a new device into a SDN network, and identifies and corrects 243 errors in four mainstream vendors' device manuals.

SIGMETRICS

Mean-field Analysis for Load Balancing on Spatial Graphs

  • Daan Rutten, Debankur Mukherjee

  • Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

  • January 9, 2023

A pivotal methodological tool behind the analysis of large-scale load balancing systems is mean-field analysis. The high-level idea is to represent the system state by aggregate quantities and characterize their rate of change as the system size grows large. An assumption for the above scheme to work is that the aggregate quantity is Markovian, so that its rate of change can be expressed as a function of its current state. If the aggregate quantity is not Markovian, not only does this technique break down, but the mean-field approximation may even turn out to be highly inaccurate. In load balancing systems, if servers are exchangeable, then the aggregate quantity is indeed Markovian. However, the growing heterogeneity in the types of tasks processed by modern data centers has recently motivated the research community to consider systems beyond the exchangeability assumption. The main reason stems from data locality, i.e., the fact that servers need to store resources to process tasks of a particular type locally and have only limited storage space. An emerging line of work thus considers a bipartite graph between task types and servers [2, 3, 5-7]. In this compatibility graph, an edge between a server and a task type represents the server's ability to process these tasks. In practice, storage capacity or geographical constraints force a server to process only a small subset of all task types, leading to sparse network topologies. This motivates the study of load balancing in systems with suitably sparse bipartite compatibility graphs.

TLDR

The growing heterogeneity in the types of tasks processed by modern data centers has recently motivated the research community to consider systems beyond the exchangeability assumption, and an emerging line of work considers a bipartite graph between task types and servers.

WISEFUSE: Workload Characterization and DAG Transformation for Serverless Workflows

  • Ashraf Y. Mahgoub, Edgardo Barsallo Yi, Karthick Shankar, Eshaan Minocha, S. Elnikety, S. Bagchi, S. Chaterji

  • Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

  • June 6, 2022

We characterize production workloads of serverless DAGs at a major cloud provider. Our analysis highlights two major factors that limit performance: (a) lack of efficient communication methods between the serverless functions in the DAG, and (b) stragglers when a DAG stage invokes a set of parallel functions that must complete before starting the next DAG stage. To address these limitations, we propose WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or dollar budget. We introduce three optimizations: (1) Fusion combines in-series functions together in a single VM to reduce the communication overhead between cascaded functions. (2) Bundling executes a group of parallel invocations of a function in one VM to improve resource sharing among the parallel workers to reduce skew. (3) Resource Allocation assigns the right VM size to each function or function bundle in the DAG to reduce the E2E latency and cost. We implement WISEFUSE to evaluate it experimentally using three popular serverless applications with different DAG structures, memory footprints, and intermediate data sizes. Compared to competing approaches and other alternatives, WISEFUSE shows significant improvements in E2E latency and cost. Specifically, for a machine learning pipeline, WISEFUSE achieves P95 latency that is 67% lower than Photons, 39% lower than Faastlane, and 90% lower than SONIC without increasing the dollar cost.

TLDR

This work proposes WISEFUSE, an automated approach to generate an optimized execution plan for serverless DAGs for a user-specified latency objective or budget and implements it experimentally, showing significant improvements in E2E latency and cost.
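
The Fusion optimization can be illustrated with a small sketch: consecutive functions that form a simple chain in the DAG (single successor, single predecessor) are merged into one node, so their intermediate data stays inside one VM instead of crossing the network. The DAG encoding and function names below are illustrative, not WISEFUSE's actual planner.

```python
# Illustrative sketch of fusing in-series functions in a serverless DAG.
def fuse_chains(dag):
    """dag: {function: [successor functions]} -> DAG with chains fused."""
    preds = {n: [] for n in dag}
    for n, succs in dag.items():
        for s in succs:
            preds[s].append(n)

    def is_head(n):
        # a node starts a fused segment unless it is the sole successor of a
        # node that has no other successors (then it belongs to that chain)
        return not (len(preds[n]) == 1 and len(dag[preds[n][0]]) == 1)

    fused, rename = {}, {}
    for node in dag:
        if not is_head(node):
            continue
        chain, cur = [node], node
        while len(dag[cur]) == 1 and len(preds[dag[cur][0]]) == 1:
            cur = dag[cur][0]
            chain.append(cur)
        name = "+".join(chain)
        rename[node] = name
        fused[name] = dag[cur]
    # re-point edges at the fused segments that now contain their targets
    return {n: [rename[s] for s in succs] for n, succs in fused.items()}

dag = {"split": ["extract"], "extract": ["train"], "train": ["a", "b"],
       "a": ["merge"], "b": ["merge"], "merge": []}
print(fuse_chains(dag))
# {'split+extract+train': ['a', 'b'], 'a': ['merge'], 'b': ['merge'], 'merge': []}
```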

WWW

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

  • C. Hays, Zachary Schutzman, Manish Raghavan, Erin Walk, Philipp Zimmer

  • Proceedings of the ACM Web Conference 2023

  • January 17, 2023

Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset’s collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications for both transparency in sampling and labeling procedures and potential biases in research using existing bot detection tools for pre-processing.

TLDR

It is shown that simple decision rules — shallow decision trees trained on a small number of features — achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets.
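
A "simple decision rule" in the paper's sense can be reproduced in a few lines: a depth-limited decision tree over a handful of account features. The synthetic data below is illustrative only; the near-perfect score it yields reflects how the data were constructed, which mirrors the paper's point that high benchmark accuracy can come from dataset collection and labeling rather than from genuinely distinguishing bots from humans.

```python
# Illustrative sketch: a shallow decision tree over a few account features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# features: followers, friends, statuses, account age in days (synthetic)
humans = np.column_stack([rng.lognormal(5, 1, n), rng.lognormal(5, 1, n),
                          rng.lognormal(6, 1, n), rng.uniform(365, 4000, n)])
bots = np.column_stack([rng.lognormal(2, 1, n), rng.lognormal(6, 1, n),
                        rng.lognormal(7, 1, n), rng.uniform(1, 400, n)])
X = np.vstack([humans, bots])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)  # a "simple decision rule"
print("held-out accuracy:", tree.score(X_te, y_te))
```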

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

  • Francesco Fabbri, Yanhao Wang, F. Bonchi, C. Castillo, M. Mathioudakis

  • Proceedings of the ACM Web Conference 2022

  • February 1, 2022

Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a “radicalization pathway”. In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the “segregation” score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated with a larger chance of users getting trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to “rewire”, so as to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the absorbing random walk theory. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal.

TLDR

This paper models the set of recommendations of a “what-to-watch-next” recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions, and proposes an efficient yet effective greedy algorithm based on the absorbing random walk theory.
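
The segregation score has a standard absorbing-random-walk computation: treating non-radicalized nodes as absorbing, the expected number of steps before absorption from each radicalized node solves (I - Q)t = 1, where Q is the walk's transition matrix restricted to radicalized nodes. The small graph below is illustrative; the paper's algorithm then rewires edges to reduce the largest such score.

```python
# Illustrative computation of segregation scores on a tiny recommendation graph.
import numpy as np

# adjacency of a 2-regular "what-to-watch-next" graph (rows: out-links)
A = np.array([
    [0, 1, 1, 0],   # node 0 (radicalized) -> 1, 2
    [1, 0, 0, 1],   # node 1 (radicalized) -> 0, 3
    [1, 0, 0, 1],   # node 2 (non-radicalized)
    [0, 1, 1, 0],   # node 3 (non-radicalized)
])
radical = [0, 1]

P = A / A.sum(axis=1, keepdims=True)       # random-walk transition matrix
Q = P[np.ix_(radical, radical)]            # transitions among radicalized nodes
t = np.linalg.solve(np.eye(len(radical)) - Q, np.ones(len(radical)))
print(dict(zip(radical, t)))               # segregation score per radicalized node
```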

Databases

SIGMOD

Predicate Pushdown for Data Science Pipelines

  • Cong Yan, Yin Lin, Yeye He

  • Proceedings of the ACM on Management of Data

  • June 13, 2023

Predicate pushdown is a widely adopted query optimization. Existing systems and prior work mostly use pattern-matching rules to decide when a predicate can be pushed through certain operators like join or groupby. However, challenges arise in optimizing for data science pipelines due to the widely used non-relational operators and user-defined functions (UDF) that existing rules would fail to cover. In this paper, we present MagicPush, which decides predicate pushdown using a search-verification approach. MagicPush searches for candidate predicates on pipeline input, which is often not the same as the predicate to be pushed down, and verifies that the pushdown does not change pipeline output with full correctness guarantees. Our evaluation on TPC-H queries and 200 real-world pipelines sampled from GitHub Notebooks shows that MagicPush substantially outperforms a strong baseline that uses a union of rules from prior work - it is able to discover new pushdown opportunities and better optimize 42 real-world pipelines with up to 99% reduction in running time, while discovering all pushdown opportunities found by the existing baseline on remaining cases.

TLDR

This paper presents MagicPush, which decides predicate pushdown using a search-verification approach, and is able to discover new pushdown opportunities and better optimize 42 real-world pipelines with up to 99% reduction in running time.
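
The search-verification loop can be sketched as follows: enumerate candidate predicates over the pipeline input and keep one only if filtering the input with it, then running the pipeline, yields exactly the rows selected by the output predicate. The pandas pipeline and predicates below are illustrative, and the check here is only an empirical comparison on one input; MagicPush verifies pushdowns with full correctness guarantees rather than by testing.

```python
# Illustrative sketch of search-verification for predicate pushdown.
import pandas as pd

def pipeline(df):
    # a non-relational step (UDF-style) that rule-based pushdown would not cover
    df = df.assign(score=df["clicks"] / df["views"])
    return df.groupby("page", as_index=False).agg(avg_score=("score", "mean"))

def output_pred(df):
    return df[df["page"] == "home"]

def pushdown_preserves_output(df, input_pred):
    pushed = output_pred(pipeline(input_pred(df)))
    baseline = output_pred(pipeline(df))
    return pushed.reset_index(drop=True).equals(baseline.reset_index(drop=True))

data = pd.DataFrame({"page": ["home", "home", "about"],
                     "clicks": [3, 5, 1], "views": [10, 20, 30]})
candidate = lambda df: df[df["page"] == "home"]       # candidate input predicate
print(pushdown_preserves_output(data, candidate))     # True: safe to push down
```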

Detecting Logic Bugs of Join Optimizations in DBMS

  • Xiu Tang, Sai Wu, Dongxiang Zhang, F. Li, Gang Chen

  • Proceedings of the ACM on Management of Data

  • May 30, 2023

Generation-based testing techniques have shown their effectiveness in detecting logic bugs of DBMS, which are often caused by improper implementation of query optimizers. Nonetheless, existing generation-based debug tools are limited to single-table queries and there is a substantial research gap regarding multi-table queries with join operators. In this paper, we propose TQS, a novel testing framework targeted at detecting logic bugs derived by queries involving multi-table joins. Given a target DBMS, TQS achieves the goal with two key components: Data-guided Schema and Query Generation (DSG) and Knowledge-guided Query Space Exploration (KQE). DSG addresses the key challenge of multi-table query debugging: how to generate ground-truth (query, result) pairs for verification. It adopts the database normalization technique to generate a testing schema and maintains a bitmap index for result tracking. To improve debug efficiency, DSG also artificially inserts some noises into the generated data. To avoid repetitive query space search, KQE forms the problem as isomorphic graph set discovery and combines the graph embedding and weighted random walk for query generation. We evaluated TQS on four popular DBMSs: MySQL, MariaDB, TiDB and PolarDB. Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems. It successfully detected 115 bugs within 24 hours, including 31 bugs in MySQL, 30 in MariaDB, 31 in TiDB, and 23 in PolarDB respectively.

TLDR

Experimental results show that TQS is effective in finding logic bugs of join optimization in database management systems, and successfully detected 115 bugs within 24 hours.

PG-Schema: Schemas for Property Graphs

  • A. Bonifati, Stefania Dumbrava, G. Fletcher, J. Hidders, Bei Li, L. Libkin, W. Martens, Filip Murlak, Stefan Plantikow, Ognjen Savković, Juan Sequeda, S. Staworko, Dominik Tomaszuk, H. Voigt, Domagoj Vrgoč, Mingxi Wu

  • Proceedings of the ACM on Management of Data

  • November 20, 2022

Property graphs have reached a high level of maturity, witnessed by multiple robust graph database systems as well as the ongoing ISO standardization effort aiming at creating a new standard Graph Query Language (GQL). Yet, despite documented demand, schema support is limited both in existing systems and in the first version of the GQL Standard. It is anticipated that the second version of the GQL Standard will include a rich DDL. Aiming to inspire the development of GQL and enhance the capabilities of graph database systems, we propose PG-Schema, a simple yet powerful formalism for specifying property graph schemas. PG-Schema features flexible type definitions supporting multi-inheritance, as well as expressive constraints based on the recently proposed PG-Keys formalism. We provide the formal syntax and semantics of PG-Schema, which meet principled design requirements grounded in contemporary property graph management scenarios, and offer a detailed comparison of its features with those of existing schema languages and graph database systems.

TLDR

PG-Schema is proposed, a simple yet powerful formalism for specifying property graph schemas that meets principled design requirements grounded in contemporary property graph management scenarios, and a detailed comparison of its features with those of existing schema languages and graph database systems is offered.

R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys

  • Wei Dong, Juanru Fang, K. Yi, Yuchao Tao, Ashwin Machanavajjhala

  • Proceedings of the 2022 International Conference on Management of Data

  • June 10, 2022

Answering SPJA queries under differential privacy (DP), including graph pattern counting under node-DP as an important special case, has received considerable attention in recent years. The dual challenge of foreign-key constraints and self-joins is particularly tricky to deal with, and no existing DP mechanisms can correctly handle both. For the special case of graph pattern counting under node-DP, the existing mechanisms are correct (i.e., satisfy DP), but they do not offer nontrivial utility guarantees or are very complicated and costly. In this paper, we propose the first DP mechanism for answering arbitrary SPJA queries in a database with foreign-key constraints. Meanwhile, it achieves a fairly strong notion of optimality, which can be considered as a small and natural relaxation of instance optimality. Finally, our mechanism is simple enough that it can be easily implemented on top of any RDBMS and an LP solver. Experimental results show that it offers order-of-magnitude improvements in terms of utility over existing techniques, even those specifically designed for graph pattern counting.

TLDR

This paper proposes the first DP mechanism for answering arbitrary SPJA queries in a database with foreign-key constraints, and shows that it offers order-of-magnitude improvements in terms of utility over existing techniques, even those specifically designed for graph pattern counting.
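
The basic truncation idea that R2T optimizes can be shown in a few lines: cap each primary-key entity's contribution to the query answer at a threshold tau so the query's sensitivity is bounded by tau, then add Laplace noise scaled to tau/epsilon. The data and the fixed threshold below are illustrative; R2T's contribution is choosing the truncation near-optimally per instance rather than fixing it in advance.

```python
# Illustrative sketch of truncation + Laplace noise for a DP count with
# foreign-key constraints. Not the R2T mechanism itself.
import numpy as np

def truncated_dp_count(contributions, tau, epsilon, rng=None):
    """contributions[i] = number of query results joining with entity i."""
    rng = rng or np.random.default_rng()
    capped = np.minimum(contributions, tau)          # truncate per-entity contribution
    return capped.sum() + rng.laplace(scale=tau / epsilon)

users = np.array([1, 3, 2, 40, 1, 2])   # one heavy hitter would blow up sensitivity
print(truncated_dp_count(users, tau=5, epsilon=1.0))
```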

VLDB

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

  • Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri

  • Proceedings of the VLDB Endowment

  • July 1, 2023

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 244 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.

TLDR

An Auto-Tables system is developed that can automatically synthesize pipelines with multi-step transformations (in Python or other languages) to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations.
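
For a sense of the kind of step such a synthesized pipeline contains, the sketch below unpivots a wide "crosstab" spreadsheet table into relational rows so it can be queried with SQL-based tools. The table and column names are illustrative; Auto-Tables discovers such transformation steps automatically rather than requiring the user to write them.

```python
# Illustrative example of one relationalizing transformation (unpivot/melt).
import pandas as pd

wide = pd.DataFrame({
    "country": ["US", "DE"],
    "2021": [21.1, 4.2],
    "2022": [23.0, 4.1],
    "2023": [25.4, 4.5],
})

relational = wide.melt(id_vars="country", var_name="year", value_name="gdp_trillions")
relational["year"] = relational["year"].astype(int)
print(relational)   # one (country, year, value) row per cell of the wide table
```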
