
[Paper Reading] (18) How to Write the Model Design and Overview of English Papers, with Selected Sentences: Examples from Top System and AI Security Conferences



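The previous article introduced the CCS 2019 work on PowerShell deobfuscation, an excellent paper by Prof. Zhenyuan Li of Zhejiang University. This article explains, from a personal perspective, how to write the Model Design and Overview sections of an English paper, using top-conference papers in system and AI security as examples. On the one hand, my English is poor, so I can only improve slowly in the most basic way. On the other hand, these are my personal study notes, shared in the hope of receiving criticism and corrections. I hope this article helps you. These experts are truly worth learning from, and I bow to them. Fighting!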


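Since I previously worked on NLP and AI and have now moved to security, the papers selected here mainly come from AI security and system security papers published in the past four years at the four top security conferences (S&P, USENIX Security, CCS, NDSS). My abilities are limited, so the selection reflects my own level and reading; I hope to keep improving and will keep adding to each part. Perhaps in five or ten years I will share a detailed article on how to write an English paper; for now, this is mainly learning and note-taking. Experts, please feel free to skip O(∩_∩)O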





This part reviews and draws on Prof. Zhou's PhD course; many thanks for sharing it. There are two typical paper structures (the typical “anatomy” of a paper), as follows:


  • Title and authors
  • Abstract
  • Introduction
  • Related Work (may be placed later)
  • Materials and Methods
  • Results
  • Acknowledgements
  • References


  • Title and authors
  • Abstract
  • Introduction
  • Related Work (may be placed later)
  • System Model
  • Mathematics and algorithms
  • Experiments
  • Acknowledgements
  • References

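System Model

  • Should be detailed enough for another scientist to understand the problem
  • Objective
  • Constraints
  • Difficulties and Challenges

Mathematics and algorithms

  • These parts are the technical core of the paper
  • Learn the corresponding structure by reading high-quality papers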


  • restate unclear points in your own words
  • fill in missing details (assumptions, algebraic steps, proofs, pseudocode)
  • annotate mathematical objects with their types
  • come up with examples that illustrate the author’s ideas, and examples that would be problematic for the author
  • draw connections to other methods and problems you know about
  • ask questions about things that aren’t stated or that don’t make sense
  • challenge the paper’s claims or methods
  • dream up followup work that you (or someone) should do






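  • Focus on the key points. Beginners tend to spend most of their space on experimental preprocessing, data filtering, and similar steps. Much of this detail can be summarized briefly or moved to supplementary material. Too many details that do not help readers understand the scientific findings inevitably distract readers and drain their energy, which hurts their understanding of the most important parts. The result is that what should be explained clearly is not, while what should be omitted takes up too much space.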


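  • However correct the grammar, precise the wording, and logical the sentences, a paper that does not tell its story well can hardly be a good paper. The story is the soul of the paper: it establishes the main contributions and innovations. The outline is the skeleton of the paper: the way it is written presents the story and determines whether the overall organization is sound and logical.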


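When evaluating the scientific value of a research result, the most important factors are novelty and significance. Novelty means that researchers do not simply follow or repeat others' work but make an original contribution of their own. It is said that research goes through three stages: “me too”, “me better”, and “me only”. Novelty can likewise be described with these three stages: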

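  • Old paradigm, new conditions
    Discovering that a phenomenon behaves differently under different conditions. This kind of novelty usually comes from using new samples and lies between “me too” and “me better”.
  • Old paradigm, new techniques
    A “me better” level of novelty: the new method outperforms the previous one.
  • New data processing methods
    Advances in statistics and artificial intelligence have produced many new data processing methods. Using these models to process old data, draw new conclusions, and solve new problems is a “me better” level of novelty.
  • New paradigm
    Studying a problem with an entirely new approach; this lies between “me better” and “me only”.
  • New problem
    Research that poses a new problem is at the “me only” level. An entirely new problem is often accompanied by a new research method. The value of the new problem should be significant and must be justified, and such work naturally carries risk.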









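  • First, read many papers in the relevant area (you can only write well after reading widely), then summarize the strengths and weaknesses of existing methods and find the problem or method you want to address (finding the idea is the hard part). Ideally, discover a new problem and propose a solution to it.
  • Second, I run simple experiments with my own method. Once they succeed, I build the framework of the paper or model (the paper's “keel”), and then add detail-level algorithms such as pruning.
  • Next, I describe the overall framework of the model. The Overview can be the first part of Model Design, or it can come before it as a separate major part.
  • Then, I implement each part according to the overall framework. For deep learning, this usually includes data collection, data preprocessing, feature selection, model construction, and the classification task. Most importantly, the algorithm you propose, or any part that constitutes a contribution, should be described in detail through algorithms, formulas, or figures and tables, including its constraints.
  • Finally, the whole story should revolve around the paper's contributions, insights, and experiments, so as to highlight its selling points.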


(1) Chuanpu Fu, et al. Realtime Robust Malicious Traffic Detection via Frequency Domain Analysis. CCS21. (frequency-domain malicious traffic detection)


  • 4.1 Frequency Feature Extraction Module
  • 4.2 Automatic Parameters Selection Module
  • 4.3 Statistical Clustering Module



(2) Yuankun Zhu, et al. Hermes Attack: Steal DNN Models with Lossless Inference Accuracy. USENIX Sec21. (DNN model attack & inference information)

3 Attack Design

  • 3.1 Overview
  • 3.2 Traffic Processing
  • 3.3 Extraction
    – 3.3.1 Header Extraction
    – 3.3.2 Command Extraction
  • 3.4 Reconstruction
    – 3.4.1 Semantic Reconstruction
    – 3.4.2 Model Reconstruction


(3) Xuezixiang Li, et al. PalmTree: Learning an Assembly Language Model for Instruction Embedding, CCS21 (BERT pre-trained instruction embeddings)


  • 3.1 Overview
  • 3.2 Input Generation
  • 3.3 Tokenization
  • 3.4 Assembly Language Model
    – 3.4.1 PalmTree model
    – 3.4.2 Training task 1: Masked Language Model
    – 3.4.3 Training task 2: Context Window Prediction
    – 3.4.4 Training task 3: Def-Use Prediction
    – 3.4.5 Instruction Representation
    – 3.4.6 Deployment of the model


(4) Jinfeng Li, et al. TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation, USENIX Sec20. (textual adversarial examples; its multimodal design and experiments are worth learning from)

3 Design of TEXTSHIELD

  • 3.1 Problem Definition and Threat Model
  • 3.2 Overview of TEXTSHIELD Framework
  • 3.3 Adversarial Translation
  • 3.4 Multimodal Embedding
  • 3.5 Multimodal Fusion


(5) Xueyuan Han, et al. UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats, NDSS20. (APT detection with provenance graphs)


  • A. Provenance Graph
  • B. Constructing Graph Histograms
  • C. Generating Graph Sketches
  • D. Learning Evolutionary Models
  • E. Detecting Anomalies


(6) Zhenyuan Li, et al. Effective and Light-Weight Deobfuscation and Semantic-Aware Attack Detection for PowerShell Scripts, CCS19. (PowerShell deobfuscation)


  • 4.1 Subtree-based Deobfuscation Approach Overview
  • 4.2 Extract Suspicious Subtrees
  • 4.3 Subtree-based Obfuscation Detection
  • 4.4 Emulation-based Recovery
  • 4.5 AST Update
  • 4.6 Post processing












II. Writing the Model Design, with Selected Sentences


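  • An elegant, top-conference-quality framework figure
  • An English description of the overall architecture (Overview)
  • Keywords for linking and transitioning between parts of the paper
  • A fitting name for your model or system, which also makes it easier to describe in the paper
  • Research directions that combine deep learning with system security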


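The lead-in comes before the detailed description of the parts of a section and explains how the section is organized. These lead-ins are much alike: usually it is enough to state what each part covers, e.g., “In this section, we present…”. Some papers instead open with “To tackle the challenges mentioned above”. Below are 12 examples taken from model design or related work sections.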

  • (1) We next discuss in detail our proposed euphemism detection approach in Section IV-A and the proposed euphemism identification approach in Section IV-B.

  • (2) In this section, we present the design details of Whisper, i.e., the design of three main modules in Whisper.

  • (3) In this section, we present the design details of DEEPNOISE. First, we show the overall workflow of DEEPNOISE. Then, we present the details of each component of DEEPNOISE.

  • (4) In this section, we first introduce audit log analysis and its challenges with a motivating example. We then analyze the problem of behavior abstraction with our insights, as well as describing the threat model.

  • (5) In this section, we briefly summarize the related similarity-based phishing detection approaches, then we introduce our threat model.

  • (6) In this section, we introduce the stealthy malware we focus on in this study and present our insights of using provenance analysis to detect such malware.

  • (7) In this section, we detail the pipeline of DEEPREFLECT as well as the features and models it uses.

  • (8) In this section, we first present our threat model and then describe overarching assumptions and principles used throughout the paper.

  • (9) In this section, we describe our process for curating this ground truth dataset and the features that we use for our classifier. We then present the accuracy of the resulting classifier and how we use it to measure the phenomenon and abuse of unintended URLs in the wild. Our analysis pipeline is illustrated in Figure 3.

  • (10) In this section, we introduce the methodology of exercising text inputs for Android apps. We start from introducing the workflow of TextExerciser and then present each phase of TextExerciser individually.

  • (11) To tackle the challenges mentioned above, we develop an automated tool VIEM by combining and customizing a set of state-of-the-art natural language processing (NLP) techniques. In this section, we briefly describe the design of VIEM and discuss the reasons behind our design. Then, we elaborate on the NLP techniques that VIEM adopts.

  • (12) In this section, we detail the ATLAS architecture introduced in Figure 3. We start with an audit log pre-processing phase that constructs and optimizes the causal graph for scalable analysis (Sec. 4.1). We then present a sequence construction and learning phase that constructs attack and non-attack sequences for model learning (Sec. 4.2). Lastly, we present an attack investigation phase that uses the model to identify attack entities, which helps build the attack story (Sec. 4.3).



Hermes Attack(USENIX Sec21)

[1] Yuankun Zhu, et al. Hermes Attack: Steal DNN Models with Lossless Inference Accuracy. USENIX Sec21.

Attack Overview. The methodology of our attack can be divided into two phases: offline phase and online phase. During the offline phase, we use white-box models to build a database with the identified command headers, the mappings between GPU kernel (binaries) and DNN layer types, and the mappings between GPU kernels and offsets of hyperparameters. Specifically, the traffic processing module ( ① in Figure 5) sorts the out-of-order PCIe packets intercepted by PCIe snooping device. The extraction module ( ② ) has two sub-modules: header extraction module and command extraction module. The header extraction module extracts command headers from the sorted PCIe packets (Section 3.3.1). The extracted command headers will be stored in the database, accelerating command extraction in the online phase. The command extraction module in the offline phase helps get the kernel binaries (Section 3.3.2). The semantic reconstruction module within the reconstruction module ( ③ ) takes the inputs from the command extraction module and the GPU profiler to create the mappings between the kernel (binary) and the layer type, as well as the mappings between the kernel and the offset of hyper-parameters, facilitating the module reconstruction in the online phase (Section 3.4.1).

During the online phase, the original (victim) model is used for inference on a single image. The victim model is a black-box model and thoroughly different from the white-box models used in the offline phase. PCIe traffic is intercepted and sorted by the traffic processing module. The command extraction module ( ② ) extracts K (kernel launch related) and D (data movement related) commands as well as the GPU kernel binaries, using the header information profiled from the offline phase (Section 3.3.2). The entire database is fed to the model reconstruction module ( ③ ) to fully reconstruct architecture, hyper-parameters, and parameters (Section 3.4.2). All these steps need massive efforts of reverse engineering.


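  • Strengths of this passage: This work is a new type of DNN model extraction attack. Its Overview divides the framework into two phases (offline and online) and describes each in turn; the framework figure labels the corresponding modules, and the text refers to these labels while describing the workflow.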
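
The offline/online split in this Overview is a build-then-query pattern: the offline phase fills a database from white-box models, and the online phase only looks entries up. The Python sketch below illustrates that pattern under loud assumptions and is not Hermes's implementation: kernel binaries are toy byte strings, the database is a dict keyed by each binary's SHA-256 digest, and `build_kernel_db` and `reconstruct_layers` are hypothetical names. It covers only the kernel-to-layer-type mapping, not hyperparameter offsets or parameter recovery.

```python
import hashlib
from typing import Dict, List, Tuple


def kernel_key(binary: bytes) -> str:
    """Identify a GPU kernel binary by its SHA-256 digest (an illustrative choice)."""
    return hashlib.sha256(binary).hexdigest()


def build_kernel_db(profiled: List[Tuple[bytes, str]]) -> Dict[str, str]:
    """Offline phase (hypothetical): map kernels seen in white-box models to layer types.

    Each entry pairs a kernel binary with the layer type a GPU profiler
    reported for it.
    """
    return {kernel_key(binary): layer for binary, layer in profiled}


def reconstruct_layers(db: Dict[str, str], launched: List[bytes]) -> List[str]:
    """Online phase (hypothetical): turn the victim's kernel launches into layer types.

    Kernels that never appeared offline are reported as "unknown".
    """
    return [db.get(kernel_key(b), "unknown") for b in launched]


# Offline: toy byte strings stand in for kernel binaries of white-box models.
db = build_kernel_db([(b"\x01conv", "Conv2d"), (b"\x02relu", "ReLU"), (b"\x03fc", "Linear")])

# Online: kernel binaries extracted from the black-box model's traffic, in launch order.
print(reconstruct_layers(db, [b"\x01conv", b"\x02relu", b"\x03fc", b"\x09other"]))
# → ['Conv2d', 'ReLU', 'Linear', 'unknown']
```

The point of the split is that the expensive profiling happens once, offline, so the online phase can stay a cheap lookup over whatever kernels the victim launches.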


[2] Xuezixiang Li, et al. PalmTree: Learning an Assembly Language Model for Instruction Embedding, CCS21.

To meet the challenges summarized in Section 2, we propose PalmTree, a novel instruction embedding scheme that automatically learns a language model for assembly code. PalmTree is based on BERT [9], and incorporates the following important design considerations.

First of all, to capture the complex internal formats of instructions, we use a fine-grained strategy to decompose instructions: we consider each instruction as a sentence and decompose it into basic tokens. Then, in order to train the deep neural network to understand the internal structures of instructions, we make use of a recently proposed training task in NLP to train the model: Masked Language Model (MLM) [9]. This task trains a language model to predict the masked (missing) tokens within instructions.

Moreover, we would like to train this language model to capture the relationships between instructions. To do so, we design a training task, inspired by word2vec [28] and Asm2Vec [10], which attempts to infer the word/instruction semantics by predicting two instructions’ co-occurrence within a sliding window in control flow. We call this training task Context Window Prediction (CWP), which is based on Next Sentence Prediction (NSP) [9] in BERT. Essentially, if two instructions i and j fall within a sliding window in control flow and i appears before j, we say i and j have a contextual relation. Note that this relation is more relaxed than NSP, where two sentences have to be next to each other. We make this design decision based on our observation described in Section 2.2.2: instructions may be reordered by compiler optimizations, so adjacent instructions might not be semantically related.


Furthermore, unlike natural language, instruction semantics are clearly documented. For instance, the source and destination operands for each instruction are clearly stated. Therefore, the data dependency (or def-use relation) between instructions is clearly specified and will not be tampered by compiler optimizations. Based on these facts, we design another training task called Def-Use Prediction (DUP) to further improve our assembly language model. Essentially, we train this language model to predict if two instructions have a def-use relation.

Figure 1 presents the design of PalmTree. It consists of three components: Instruction Pair Sampling, Tokenization, and Language Model Training. The main component (Assembly Language Model) of the system is based on the BERT model [9]. After the training process, we use mean pooling of the hidden states of the second last layer of the BERT model as instruction embedding. The Instruction Pair Sampling component is responsible for sampling instruction pairs from binaries based on control flow and def-use relations.

In Section 3.2, we introduce how we construct two kinds of instruction pairs. In Section 3.3, we introduce our tokenization process. Then, we introduce how we design different training tasks to pre-train a comprehensive assembly language model for instruction embedding in Section 3.4.

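  • Strengths of this passage: This paper proposes PalmTree, a pre-trained assembly language model that generates general-purpose instruction embeddings through self-supervised training on a large-scale unlabeled binary corpus. It brings BERT pre-training to instruction embedding, frames the Overview around how each design choice addresses the challenges, and its paragraphs build on one another well.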
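
The two relation-based training tasks above (CWP and DUP) differ mainly in how positive instruction pairs are sampled. The following is a minimal Python sketch of that sampling, assuming a straight-line instruction sequence in place of a real control-flow walk and hand-annotated register def/use sets; the `Instr` class, `cwp_pairs`, and `dup_pairs` are hypothetical names, not PalmTree's code. Memory operands and negative sampling are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Instr:
    text: str
    defs: Set[str] = field(default_factory=set)  # registers written (hand-annotated)
    uses: Set[str] = field(default_factory=set)  # registers read (hand-annotated)


def cwp_pairs(seq: List[Instr], window: int = 2) -> List[Tuple[int, int]]:
    """Context Window Prediction positives: i precedes j by at most `window` steps."""
    return [
        (i, j)
        for i in range(len(seq))
        for j in range(i + 1, min(i + 1 + window, len(seq)))
    ]


def dup_pairs(seq: List[Instr]) -> List[Tuple[int, int]]:
    """Def-Use Prediction positives: i writes a register that j later reads
    before any instruction in between overwrites it."""
    pairs = set()
    for i, ins in enumerate(seq):
        for reg in ins.defs:
            for j in range(i + 1, len(seq)):
                if reg in seq[j].uses:
                    pairs.add((i, j))
                if reg in seq[j].defs:
                    break  # redefined: later reads see a different definition
    return sorted(pairs)


seq = [
    Instr("mov rax, [rbp-8]", defs={"rax"}, uses={"rbp"}),
    Instr("add rax, 1", defs={"rax"}, uses={"rax"}),
    Instr("mov rbx, rcx", defs={"rbx"}, uses={"rcx"}),
    Instr("mov [rbp-8], rax", uses={"rax", "rbp"}),
]
print(cwp_pairs(seq, window=2))  # → [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(dup_pairs(seq))            # → [(0, 1), (1, 3)]
```

CWP keeps pairs such as (0, 2) that merely co-occur, while DUP keeps only pairs linked through `rax`; training would pair each positive list with randomly drawn negative pairs.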

TextShield(USENIX Sec20)

[3] Jinfeng Li, et al. TextShield: Robust Text Classification Based on Multimodal Embedding and Neural Machine Translation, USENIX Sec20.

We present the framework overview of TEXTSHIELD in Fig.1, which is built upon multimodal embedding, multimodal fusion and NMT. Generally, we first feed each text into an NMT model trained with plenty of adversarial–benign text pairs for adversarial correction. Then, we input the corrected text into the DLTC model for multimodal embedding to extract features from semantic-level, glyph-level and phonetic-level. Finally, we use a multimodal fusion scheme to fuse the extracted features for the following regular classifications. Below, we will elaborate on each of the backbone techniques.

  • 3.3 Adversarial Translation
  • 3.4 Multimodal Embedding
  • 3.5 Multimodal Fusion


Since the variation strategies adopted by malicious users in the real scenarios are mainly concentrated on glyph-based and phonetic-based perturbations [47], we therefore dedicatedly propose three embedding methods across different modalities to handle the corresponding variation types, i.e., semantic embedding, glyph embedding and phonetic embedding. They are also dedicatedly designed to deal with the sparseness and diversity unique to Chinese adversarial perturbations.

Since multiple modalities can provide more valuable information than a single one by describing the same content in various ways, it is highly expected to learn effective joint representation by fusing the features of different modalities. Therefore, after multimodal embedding, we first fuse the features extracted from different modalities by multimodal fusion and then feed the fused features into a classification model for regular classification. In this paper, we experiment with two different fusion strategies, i.e., early multimodal fusion and intermediate multimodal fusion as shown in Fig. 10 in Appendix A.


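  • Strengths of this passage: The paper proposes TEXTSHIELD, a text classifier based on multimodal embedding and neural machine translation. Its description of multimodal fusion is worth learning from, in particular its explanation of why multimodal fusion outperforms a single modality.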


[4] Xueyuan Han, et al. UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats, NDSS20.

UNICORN is a host-based intrusion detection system capable of simultaneously detecting intrusions on a collection of networked hosts. We begin with a brief overview of UNICORN and then follow with a detailed discussion of each system component in the following sections. Fig.1 illustrates UNICORN’s general pipeline.

① Takes as input a labeled, streaming provenance graph.
UNICORN accepts a stream of attributed edges produced by a provenance capture system running on one or more networked hosts. Provenance systems construct a single, whole-system provenance DAG with a partial-order guarantee, which allows for efficient streaming computation (§ IV-B) and fully contextualized analysis (L2). We present UNICORN using CamFlow [100], although it can obtain provenance from other systems, such as LPM [16] and Spade [44], the latter of which interoperates with commodity audit systems such as Linux Audit and Windows ETW.

② Builds at runtime an in-memory histogram.
UNICORN efficiently constructs a streaming graph histogram that represents the entire history of system execution, updating the counts of histogram elements as new edges arrive in the graph data stream. By iteratively exploring larger graph neighborhoods, it discovers causal relationships between system entities providing execution context. This is UNICORN’s first step in building an efficient data structure that facilitates contextualized graph analysis (L2). Specifically, each element in the histogram describes a unique substructure of the graph, taking into consideration the heterogeneous label(s) attached to the vertices and edges within the substructure, as well as the temporal order of those edges.

To adapt to expected behavioral changes during the course of normal system execution, UNICORN periodically discounts the influence of histogram elements that have no causal relationships with recent events (L3). Slowly “forgetting” irrelevant past events allows us to effectively model metastates (§ IV-D) throughout system uptime (e.g., system boot, initialization, serving requests, failure modes, etc.). However, it does not mean that UNICORN forgets informative execution history; rather, UNICORN uses information flow dependencies in the graph to keep up-to-date important, relevant context information. Attackers can slowly penetrate the victim system in an APT, hoping that a time-based IDS eventually forgets this initial attack, but they cannot break the information flow dependencies that are essential to the success of the attack [87].


③ Periodically, computes a fixed-size graph sketch.
In a pure streaming environment, the number of unique histogram elements can grow arbitrarily large as UNICORN summarizes the entire provenance graph. This variation in size makes it challenging to efficiently compute similarity between two histograms and impractical to design algorithms for later modeling and detection. UNICORN employs a similarity-preserving hashing technique [132] to transform the histogram to a graph sketch [7]. The graph sketch is incrementally maintainable, meaning that UNICORN does not need to keep the entire provenance graph in memory; its size is constant (L4). Additionally, graph sketches preserve normalized Jaccard similarity [64] between two graph histograms. This distance-preserving property is particularly important to the clustering algorithm in our later analysis, which is based on the same graph similarity metric.

④ Clusters sketches into a model.
UNICORN builds a normal system execution model and identifies abnormal activities without attack knowledge (L1). However, unlike traditional clustering approaches, UNICORN takes advantage of its streaming capability to generate models that are evolutionary. The model captures behavioral changes within a single execution by clustering system activities at various stages of its execution, but UNICORN does not modify models dynamically during runtime when the attacker may be subverting the system (L3). It is therefore more suitable for long-running systems under potential APT attacks.


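  • Strengths of this passage: This is a classic paper on APT detection with provenance graphs. Its Overview is very well written, so it is quoted here in full for my own study and yours.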
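
Two ideas from this Overview lend themselves to a small illustration: a histogram updated as edges stream in and periodically discounted, and a fixed-size sketch whose slots preserve set similarity. The Python sketch below is a simplification, not UNICORN's code: labels are plain strings rather than substructures found by neighborhood exploration, decay is applied uniformly to all elements, and the sketch is ordinary MinHash over the set of histogram elements, which estimates unweighted Jaccard similarity, whereas the paper uses a similarity-preserving hashing technique [132] for normalized (weighted) Jaccard similarity. All class and function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


class StreamingHistogram:
    """Label counts updated as provenance edges stream in (simplified)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, labels: Iterable[str]) -> None:
        for label in labels:
            self.counts[label] += 1.0

    def decay(self, factor: float) -> None:
        """Uniformly discount all elements (0 < factor <= 1) to slowly forget old behavior."""
        for label in self.counts:
            self.counts[label] *= factor


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(items: Iterable[str], size: int = 64) -> List[int]:
    """Fixed-size sketch of a non-empty set: the minimum hash per seed."""
    keys = list(items)
    return [min(_hash(seed, k) for k in keys) for seed in range(size)]


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of equal slots estimates the Jaccard similarity of the two sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


normal = StreamingHistogram()
normal.update(["proc->file:write", "proc->sock:connect", "file->proc:read"])
current = StreamingHistogram()
current.update(["proc->file:write", "proc->sock:connect", "file->proc:read"])
print(estimate_jaccard(minhash_sketch(normal.counts), minhash_sketch(current.counts)))  # → 1.0

normal.decay(0.5)
print(normal.counts["proc->file:write"])  # → 0.5
```

Because the sketch length is fixed regardless of how many histogram elements accumulate, comparing two system states costs the same at any point in the stream, which is the property the Overview emphasizes.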

PowerShell Deobfuscation(CCS19)

[5] Zhenyuan Li, et al. Effective and Light-Weight Deobfuscation and Semantic-Aware Attack Detection for PowerShell Scripts, CCS19.

As shown in §2.3, obfuscation is highly effective in bypassing today’s PowerShell attack detection. To combat such threat, it is thus highly desired to design an effective and light-weight deobfuscation mechanism for PowerShell scripts. In this paper, we are the first to design such a mechanism and use it as the key building block to develop the first semantic-aware PowerShell attack detection system. As shown in Figure 3, the detection process can be divided into three phases:


  • Deobfuscation phase.
    In the deobfuscation phase, we propose a novel subtree-based approach leveraging the features of the PowerShell scripts. We treat the AST subtrees as the minimum units of obfuscation, and perform recovery on the subtrees, and finally construct the deobfuscated scripts. The deobfuscated scripts are then used in both training and detection phases. Note that such deobfuscation function can benefit not only the detection of PowerShell attacks in this paper but the analysis and forensics of them as well, which is thus a general contribution to the PowerShell attack defense area.

  • Training and detection phases.
    After the deobfuscation phase, the semantics of the malicious PowerShell scripts are exposed and thus enable us to design and implement the first semantic-aware PowerShell attack detection approach. As shown on the right side of Figure 3, we adopt the classic Objective-oriented Association (OOA) mining algorithm [68] on malicious PowerShell script databases, which is able to automatically extract 31 OOA rules for signature matching. Besides, we can adapt existing anti-virus engines and manual analysis as extensions.

  • Application scenarios.
    Our deobfuscation-based semantic-aware attack detection approach is mostly based on static analysis. Thus, compared to dynamic analysis based attack detection approaches, our approach has higher code coverage, much lower overhead, and also does not require modification to the system or interpreter. Compared to existing static analysis based attack detection approaches [26, 32, 53, 55], our approach is more resilient to obfuscation and also more explainable as our detection is semantics based. With these advantages over alternative approaches, our approach can be deployed in various application scenarios, including but not limited to:
    – Real-time attack detection.
    – Large-scale automated malware analysis.


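  • Strengths of this passage: This is the most classic PowerShell paper, presenting deobfuscation work. The writing of the whole paper is worth learning from, including the framework figure and the AST transformation.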
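
The idea of treating AST subtrees as the minimum unit of obfuscation can be illustrated with a toy post-order rewrite: recover the children first, then fold a subtree whose value can be computed safely. The Python sketch below assumes a hypothetical four-node AST (`Str`, `Char`, `Concat`, `Cmd`) and folds every string-only subtree. The paper instead first selects suspicious subtrees with a classifier and recovers them by emulating the script, so treat this only as an illustration of the bottom-up structure.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Str:  # string literal
    value: str


@dataclass
class Char:  # character cast such as [char]105
    code: int


@dataclass
class Concat:  # string concatenation such as 'Invo' + 'ke'
    parts: List["Node"]


@dataclass
class Cmd:  # command invocation: name expression plus arguments
    name: "Node"
    args: List["Node"]


Node = Union[Str, Char, Concat, Cmd]


def recover(node: Node) -> Node:
    """Post-order recovery: deobfuscate child subtrees first, then fold this one.

    Folding only evaluates side-effect-free string expressions; it is a toy
    stand-in for emulation-based recovery.
    """
    if isinstance(node, Char):
        return Str(chr(node.code))
    if isinstance(node, Concat):
        parts = [recover(p) for p in node.parts]
        if all(isinstance(p, Str) for p in parts):
            return Str("".join(p.value for p in parts if isinstance(p, Str)))
        return Concat(parts)  # keep the partially recovered subtree
    if isinstance(node, Cmd):
        return Cmd(recover(node.name), [recover(a) for a in node.args])
    return node


# Roughly: & ('Invo' + 'ke-Exp' + ('ress' + [char]105 + 'on')) 'payload'
obfuscated = Cmd(
    Concat([Str("Invo"), Str("ke-Exp"), Concat([Str("ress"), Char(105), Str("on")])]),
    [Str("payload")],
)
print(recover(obfuscated))
# → Cmd(name=Str(value='Invoke-Expression'), args=[Str(value='payload')])
```

Recovering the innermost subtree first is what lets the outer concatenation become foldable; a subtree containing anything non-constant is left in place, partially recovered.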

DeepReflect(USENIX Sec21)

[6] Evan Downing, et al. DeepReflect: Discovering Malicious Functionality through Binary Reconstruction, USENIX Sec21.

The goal of DEEPREFLECT is to identify malicious functions within a malware binary. In practice, it identifies functions which are likely to be malicious by locating abnormal basic blocks (regions of interest - RoI). The analyst must then determine if these functions exhibit malicious or benign behaviors. There are two primary steps in our pipeline, illustrated in Figure 2: (1) RoI detection and (2) RoI annotation. RoI detection is performed using an autoencoder, while annotation is performed by clustering all of the RoIs per function and labeling those clusters.

  • Abnormal basic blocks, i.e., the regions of interest (RoI) to be identified

Terminology. First, we define what we mean by “malicious behaviors.” We generate our ground-truth based on identifying core components of our malware’s source code (e.g., denial-of-service function, spam function, keylogger function, command-and-control (C&C) function, exploiting remote services, etc.). These are easily described by the MITRE ATT&CK framework [9], which aims to standardize these terminologies and descriptions of behaviors. However, when statically reverse engineering our evaluation malware binaries (i.e., in-the-wild malware binaries), we sometimes cannot for-certain attribute the observed low-level functions to these higher-level descriptions. For example, malware may modify registry keys for a number of different reasons (many of which can be described by MITRE), but sometimes determining which registry key is modified for what reason is difficult and thus can only be labeled loosely as “Defense Evasion: Modify Registry” in MITRE. Even modern tools like CAPA [3] identify these types of vague labels as well. Thus in our evaluation, we denote “malicious behaviors” as functions which can be described by the MITRE framework.


RoI Detection. The goal of detection is to automatically identify malicious regions within a malware binary. For example, we would like to detect the location of the C&C logic rather than detect the specific components of that logic (e.g., the network API calls connect(), send(), and recv()). The advantage of RoI detection is that an analyst can be quickly pointed to specific regions of code responsible for launching and operating its malicious actions. Prior work only focuses on creating ad hoc signatures that simply identify a binary as malware or some capability based on API calls alone. This is particularly helpful for analysts scaling their work (i.e., not relying on manual reverse engineering and domain expertise alone).

RoI Annotation. The goal of annotation is to automatically label the behavior of the functions containing the RoIs. In other words, this portion of our pipeline identifies what this malicious functionality is doing. Making this labeling nonintrusive to an analyst’s workflow and scalable is crucial. The initial work performed by an analyst for labeling clusters is a long-tail distribution. That is, there is relatively significant work upfront but less work as they continue to label each cluster. The advantage of this process is simple: it gives the analyst a way to automatically generate reports and insights about an unseen sample. For example, if a variant of a malware sample contains similar logic as prior malware samples (but looks different enough to an analyst to be unfamiliar), our tool gives them a way to realize this more quickly.

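  • Strengths of this passage: This paper proposes DeepReflect, a method for discovering malicious functions through binary reconstruction. Its pipeline takes an unpacked malware sample as input, extracts CFG features from each basic block, feeds them to a pre-trained autoencoder to highlight RoIs (regions of interest), and finally clusters and labels these regions.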
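
The RoI detection step rests on a simple principle: an autoencoder trained on benign code reconstructs benign basic blocks well, so blocks with a large reconstruction error deserve an analyst's attention. The Python sketch below shows only that scoring and thresholding step, assuming a hypothetical `toy_reconstruct` function in place of a trained autoencoder and an arbitrary threshold; DeepReflect's actual features, model, and threshold selection are described in the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_errors(features: List[Vector], reconstruct: Callable[[Vector], Vector]) -> List[float]:
    """Mean squared error between each basic block's feature vector and its reconstruction."""
    errors = []
    for x in features:
        x_hat = reconstruct(x)
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


def flag_rois(errors: List[float], threshold: float) -> List[int]:
    """Indices of basic blocks whose error exceeds the threshold (candidate RoIs)."""
    return [i for i, e in enumerate(errors) if e > threshold]


def toy_reconstruct(x: Vector) -> Vector:
    """Hypothetical stand-in for a trained autoencoder: ignores its input and
    always returns the single benign pattern it has 'learned'."""
    return [1.0, 0.0, 0.0]


blocks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 1.0]]
errors = reconstruction_errors(blocks, toy_reconstruct)
print(flag_rois(errors, threshold=0.5))  # → [2]
```

The threshold trades recall for analyst workload: lowering it to 0.001 here would also flag the slightly perturbed block 1.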

Phishpedia(USENIX Sec21)

[7] Yun Lin, et al. Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages, USENIX Sec21.

Figure 3 provides an overview of our proposed system, Phishpedia. Phishpedia takes as input a URL and a target brand list describing legitimate brand logos and their web domains; it then generates a phishing target (if the URL is considered as phishing) as output. We refer to the logo that identifies with the legitimate brand as the identity logo of that brand. Moreover, input boxes are the small forms where a user inputs credential information such as username and password.

Given a URL, we first capture its screenshot in a sandbox. Then, we decompose the phishing identification task into two: an object-detection task and an image recognition task. First, we detect important UI components, specifically identity logos and input boxes, in the screenshot with an object detection algorithm [57, 58] (Section 3.1). As the next step, we identify the phishing target by comparing the detected identity logo with the logos in the target brand list via a Siamese model [33] (Section 3.2). Once a logo in the target brand list (e.g., that of Paypal) is matched, we consider its corresponding domain (e.g., paypal.com) as the intended domain for the captured screenshot. Subsequently, we analyze the difference between the intended domain and the domain of the given URL to report the phishing result. Finally, we combine the reported identity logo, input box, and phishing target to synthesize a visual phishing explanation (as shown in Figure 2).


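  • Strengths of this passage: This paper designs Phishpedia, a hybrid deep learning system combining object detection and image recognition to address two technical challenges in phishing identification: (i) accurately identifying logos on webpage screenshots; (ii) matching logo variants of the same brand. The Overview is fairly concise, but the paper's idea and background are worth learning from (object detection => a security scenario).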
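
The last step of the pipeline, comparing the intended domain of the matched brand with the domain of the URL, fits in a few lines of Python. The assumptions: the logo matcher is omitted and its result is passed in as `matched_brand`, the target brand list is a dict from brand name to legitimate domains, and `phishing_verdict` is a hypothetical name. A production system should compare registered domains (for example via the Public Suffix List) rather than the simple suffix test used here.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse


def domain_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def phishing_verdict(url: str, matched_brand: Optional[str], targets: Dict[str, List[str]]) -> Optional[str]:
    """Return the impersonated brand, or None if the page is not reported as phishing.

    `matched_brand` is the output of the (omitted) logo matcher: the brand whose
    logo was recognized in the screenshot, or None if no logo matched.
    """
    if matched_brand is None:
        return None  # no identity logo recognized: no target to impersonate
    host = (urlparse(url).hostname or "").lower()
    if any(domain_matches(host, d) for d in targets[matched_brand]):
        return None  # logo and domain agree: the brand's own site
    return matched_brand  # brand logo served from a foreign domain


targets = {"PayPal": ["paypal.com"]}
print(phishing_verdict("https://www.paypal.com/signin", "PayPal", targets))            # → None
print(phishing_verdict("http://paypal.com.secure-login.example/", "PayPal", targets))  # → PayPal
```

Note that the second URL starts with "paypal.com" yet is rejected, because only the end of the hostname identifies who controls it.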



[8] Yuyu He, et al. TextExerciser: Feedback-driven Text Input Exercising for Android Applications, S&P20.

  • System Workflow

TextExerciser is a feedback-driven text exerciser that understands hints shown on user interfaces of Android apps and then extracts corresponding constraints. The high-level idea of understanding these hints is based on an observation that these hints with similar semantics often have a similar syntax structure, and therefore TextExerciser can cluster these hints based on their syntax structures and then extract the constraints from the syntax structure. Now, let us give some details of TextExerciser’s workflow.

The exercising has three phases, seven steps as shown in Figure 2. First, TextExerciser extracts all the texts in the app’s UI (Step 1) and then identifies static hints via a learning-based method and dynamic hints via a structure-based differential analysis (Step 2). Second, TextExerciser parses all the extracted hints via three steps: classifying hints into different categories (Step 3), generating syntax trees for each hint (Step 4), and interpreting the generated tree into a constraint representation form (Step 5). Lastly, TextExerciser generates a concrete input by feeding constraints into a solver (Step 6), e.g., Z3. Then, TextExerciser solves the problem, feeds generated inputs back to the target Android app and extracts feedbacks, such as success and another hint (Step 7). In the case of another hint, TextExerciser will iterate the entire procedure until TextExerciser finds a valid input.


Now let us look at our motivating example in §II again to explain TextExerciser’s workflow. We start from the sign-up page, which has three text input fields, i.e., “username”, “password” and “confirm password”. TextExerciser generates a random input to the username field: If the username is used in the database, Yippi returns a “username used” hint. TextExerciser will then parse the hint and generate a new username. The “password” and “confirm password” are handled together by TextExerciser: based on the hint that “Both password has to be the same”1, TextExerciser will convert the hint into a constraint that the value of both fields need to be the same and then generate corresponding inputs.

  • Example

After TextExerciser generates inputs for the first sign-up page, Yippi asks the user to input a code that is sent to a phone number. TextExerciser will first extract hints related to the phone number page, understand that this is a phone number, and then input a pre-registered phone number to the field. Next, TextExerciser will automatically extract the code from the SMS and solve the constraints by inputting the code to Yippi. In order to find the aforementioned vulnerability in §II, TextExerciser also generates text inputs to the “Change Password” page. Particularly, TextExerciser extracts the password matching hint and another hint that distinguishes old and new passwords, converts them into constraints and then generates corresponding inputs so that existing dynamic analysis tools can find the vulnerability.


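The paper proposes TextExerciser, an automatic text input generation method for mobile apps. Its insight is that whenever a text input does not meet an app's requirements, the app displays a hint in natural language on its user interface. By combining natural language processing, machine learning, and related techniques, TextExerciser parses these hints, understands the input constraints they contain, and automatically generates input text accordingly. The process iterates until a suitable text input is produced. In the experiments, the authors combine this text generation method with existing dynamic testing and analysis tools and show that it not only improves code coverage during app testing but also uncovers program vulnerabilities and privacy leaks triggered by specific input events. The work was published at S&P 2020, a top conference in information security.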
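
The iterative hint, constraint, and input loop described above can be illustrated with a self-contained Python toy. Everything here is hypothetical: `toy_password_field` stands in for an Android app, a few regular-expression rules stand in for TextExerciser's NLP-based hint parsing, and a direct string builder replaces a constraint solver such as Z3.

```python
import re
from typing import Callable, Dict, Optional


def toy_password_field(value: str) -> Optional[str]:
    """Hypothetical app: returns a natural-language hint, or None if the input is accepted."""
    if len(value) < 8:
        return "Password must be at least 8 characters"
    if not any(c.isdigit() for c in value):
        return "Password must contain a digit"
    if not any(c.isupper() for c in value):
        return "Password must contain an uppercase letter"
    return None


def hint_to_constraint(hint: str, constraints: Dict[str, int]) -> None:
    """Tiny rule-based parser standing in for the paper's hint understanding."""
    m = re.search(r"at least (\d+) characters", hint)
    if m:
        constraints["min_len"] = int(m.group(1))
    if "digit" in hint:
        constraints["digit"] = 1
    if "uppercase" in hint:
        constraints["upper"] = 1


def generate(constraints: Dict[str, int]) -> str:
    """Build a value satisfying every collected constraint (a solver could replace this)."""
    prefix = ("A" if constraints.get("upper") else "") + ("1" if constraints.get("digit") else "")
    return prefix + "a" * max(constraints.get("min_len", 1) - len(prefix), 0)


def exercise(field: Callable[[str], Optional[str]], max_rounds: int = 10) -> Optional[str]:
    """Feedback loop: submit, read the hint, accumulate constraints, retry."""
    constraints: Dict[str, int] = {}
    value = "a"  # arbitrary initial input
    for _ in range(max_rounds):
        hint = field(value)
        if hint is None:
            return value  # accepted by the app
        hint_to_constraint(hint, constraints)
        value = generate(constraints)
    return None  # gave up after max_rounds


print(exercise(toy_password_field))  # → A1aaaaaa
```

Constraints accumulate across rounds rather than being replaced, so each new hint refines the input without undoing what earlier hints required.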


[9] Yue Duan, et al. DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing, NDSS20.

Figure 1 delineates the system architecture of DEEPBINDIFF. Red squares represent generated intermediate data during analysis. As shown, the system takes as input two binaries and outputs the basic block level diffing results. The system solves the two tasks mentioned in Section II-A by using two major techniques. First, to calculate sim(mi) that quantitatively measures basic block similarity, DEEPBINDIFF embraces an unsupervised learning approach to generate embeddings and utilizes them to efficiently calculate the similarity scores between basic blocks. Second, our system uses a k-hop greedy matching algorithm to generate the matching M(p1, p2).

The whole system consists of three major components: 1) pre-processing; 2) embedding generation and 3) code diffing. Pre-processing, which can be further divided into two sub-components: CFG generation and feature vector generation, is responsible for generating two pieces of information: inter-procedural control-flow graphs (ICFGs) and feature vectors for basic blocks. Once generated, the two results are sent to embedding generation component that utilizes TADW technique [48] to learn the graph embeddings for each basic block. DEEPBINDIFF then makes use of the generated basic block embeddings and performs a k-hop greedy matching algorithm for code diffing at basic block level.


  • Strengths of this passage: DeepBinDiff is a classic binary analysis paper and well worth studying.
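
The k-hop greedy matching step can be approximated as follows: start from a few trusted matched block pairs, look at the unmatched blocks within k hops of a matched pair in both graphs, and greedily accept the most similar pair if its cosine similarity clears a threshold. The Python sketch below is a simplified reading of that idea, not DEEPBINDIFF's code: graphs are plain adjacency dicts, embeddings are given by hand instead of learned with TADW, the seed pairs, threshold, and function names are assumptions, and blocks left unmatched at the end are simply ignored. Embeddings are assumed to be non-zero vectors.

```python
import math
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # basic block -> neighboring blocks
Emb = Dict[str, List[float]]  # basic block -> embedding vector


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def k_hop(g: Graph, start: str, k: int) -> Set[str]:
    """Blocks reachable from `start` in at most k hops, excluding `start`."""
    seen, frontier = {start}, {start}
    for _ in range(k):
        frontier = {n for f in frontier for n in g.get(f, []) if n not in seen}
        seen |= frontier
    return seen - {start}


def k_hop_greedy(g1: Graph, g2: Graph, e1: Emb, e2: Emb,
                 seeds: List[Tuple[str, str]], k: int = 2,
                 threshold: float = 0.6) -> Set[Tuple[str, str]]:
    """Grow a basic-block matching outward from trusted seed pairs."""
    matched = set(seeds)
    used1 = {a for a, _ in seeds}
    used2 = {b for _, b in seeds}
    worklist = list(seeds)
    while worklist:
        a, b = worklist.pop()
        candidates = [
            (cosine(e1[x], e2[y]), x, y)
            for x in k_hop(g1, a, k) - used1
            for y in k_hop(g2, b, k) - used2
        ]
        if not candidates:
            continue
        score, x, y = max(candidates)  # greedy: best unmatched neighbor pair
        if score >= threshold:
            matched.add((x, y))
            used1.add(x)
            used2.add(y)
            # Revisit (a, b) for further matches and explore from the new pair.
            worklist += [(a, b), (x, y)]
    return matched


g1 = {"A": ["B", "C"], "B": ["D"]}
g2 = {"a": ["b", "c"], "b": ["d"]}
e1 = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1], "D": [1, 1, 0]}
e2 = {"a": [1, 0, 0], "b": [0, 1, 0.1], "c": [0, 0.1, 1], "d": [1, 1, 0.1]}
print(sorted(k_hop_greedy(g1, g2, e1, e2, seeds=[("A", "a")])))
# → [('A', 'a'), ('B', 'b'), ('C', 'c'), ('D', 'd')]
```

Restricting candidates to k-hop neighborhoods keeps each comparison local to already-trusted context, which is what makes a greedy choice reasonable instead of a global all-pairs search.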


[10] Chenxiong Qian, et al. Slimium: Debloating the Chromium Browser with Feature Subsetting, CCS20.

Figure 3 shows an overview of Slimium for debloating Chromium. Slimium consists of three main phases: i) feature-code map generation, ii) prompt website profiling based on page visits, and iii) binary instrumentation based on i) and ii).

Feature-Code Mapping. To build a set of unit features for debloating, we investigate source code [35] (Figure 2), previously-assigned CVEs pertaining to Chromium, and external resources [8, 47] for the Web specification standards (Step ① in Figure 3). Table 1 summarizes 164 features with four different categories. Once the features have been prepared, we generate a feature-code map that aids further debloating from the two sources (①’ and ②’). From the light-green box in Figure 3, consider the binary that contains two CUs to which three and four consecutive binary functions (i.e., { f0 − f2} and { f3 − f6}) belong, respectively. The initial mapping between a feature and source code relies on a manual discovery process that may miss some binary functions (i.e., from the source generated at compilation). Then, we apply a new means to explore such missing functions, followed by creating a call graph on the IR (Intermediate Representation) (Step ②, Section 4.2).


Website Profiling. The light-yellow box in Figure 3 enables us to trace exercised functions when running a Chromium process. Slimium harnesses a website profiling to collect non-deterministic code paths, which helps to avoid accidental code elimination. As a baseline, we perform differential analysis on exercised functions by visiting a set of websites (Top 1000 from Alexa [3]) multiple times (Step ③). For example, we mark any function non-deterministic if a certain function is not exercised for the first visit but is exercised for the next visit. Then, we gather exercised functions for target websites of our interest with a defined set of user activities (Step ④). During this process, profiling may identify a small number of exercised functions that belong to an unused feature (i.e., initialization). As a result, we obtain the final profiling results that assist binary instrumentation (③’ and ④’).

Binary Rewriting. The final process creates a debloated version of a Chromium binary with a feature subset (Step ⑤ in Figure 3). In this scenario, the feature in the green box has not been needed based on the feature-code mapping and profiling results, erasing the functions { f0, f1, f3} of the feature. As an end user, it is sufficient to take Step ④ and ⑤ for binary instrumentation where pre-computed feature-code mapping and profiling results are given as supplementary information.

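  • Strengths of this passage: Chromium has become the mainstream browser on both mobile and desktop. As its functionality grows, its code has become increasingly bloated, which gives attackers many exploitable opportunities. With this in mind, the authors propose SLIMIUM, a browser debloating framework that shrinks the attack surface by removing unnecessary code. They study CVEs and external resources on Web specification standards to construct the corresponding features.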
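
Two steps of the Slimium pipeline reduce to set operations: marking functions as non-deterministic when repeated visits to the same sites exercise them inconsistently, and deciding which functions of an unused feature can be removed. The Python sketch below uses hypothetical feature names and function sets; it is not Slimium's implementation, which works on Chromium binaries and IR call graphs.

```python
from typing import Dict, List, Set


def nondeterministic(visits: List[Set[str]]) -> Set[str]:
    """Functions exercised in some, but not all, repeated visits to the same websites."""
    return set.union(*visits) - set.intersection(*visits)


def removable(feature_map: Dict[str, Set[str]], unused: Set[str],
              exercised: Set[str], nondet: Set[str]) -> Set[str]:
    """Functions of unused features that profiling never exercised and that are not flaky."""
    candidates: Set[str] = set().union(*(feature_map[f] for f in unused))
    return candidates - exercised - nondet


# Baseline (cf. Step ③): f2 appears only on the second visit, so it is non-deterministic.
baseline_visits = [{"f4", "f5"}, {"f2", "f4", "f5"}]
nondet = nondeterministic(baseline_visits)

# Target websites with defined user activities (cf. Step ④).
exercised = {"f4", "f5"}

feature_map = {"featureA": {"f0", "f1", "f2", "f3"}, "featureB": {"f4", "f5", "f6"}}
print(sorted(removable(feature_map, {"featureA"}, exercised, nondet)))  # → ['f0', 'f1', 'f3']
```

Without the non-determinism check, f2 would be erased even though a later visit could need it, which is exactly the accidental code elimination the profiling step is meant to avoid.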






(By: Eastmount, 2022-04-01, midnight, http://blog.csdn.net/eastmount/ )

