Thesis No, Download, Thesis Record, Status
403358
Behavior based malware classification using online machine learning
Author: ABDURRAHMAN PEKTAŞ
Advisors: Prof. Dr. JEAN-CLAUDE FERNANDEZ ; Dr. TANKUT ACARMAN
Location: Université de Grenoble / Foreign Institute
Subject: Computer Engineering and Computer Science and Control ; Science and Technology
Index:
Approved
Doctorate
English
2015
155 p.
In recent years, malware (short for malicious software) has evolved considerably and has become a major threat to home users, enterprises, and even governments. Despite the widespread availability and use of anti-malware tools such as antivirus software, intrusion detection systems, and firewalls, malware authors can readily evade these defenses by using obfuscation techniques. To mitigate this problem, malware researchers have proposed various data mining and machine learning approaches for detecting and classifying malware samples according to their static or dynamic feature sets. Although the proposed methods are effective on small sample sets, their scalability to large datasets is still under investigation and remains unsolved. Moreover, it is well known that the majority of malware samples are variants of previously known samples. Consequently, the volume of new variants far outpaces the current capacity of malware analysis. Developing a malware classification system that can cope with the increasing number of malware samples is therefore essential for the security community. The key challenge in identifying the family of a malware sample is to achieve a balance between the increasing number of samples and classification accuracy. To address this challenge, and unlike existing classification schemes, which apply machine learning algorithms to stored data (i.e., they are offline algorithms), we propose a new malware classification system that employs online machine learning algorithms, which update the classification model immediately each time a new malware sample is introduced to the classification scheme. To achieve our goal, we first developed a portable, scalable, and transparent malware analysis system called VirMon for dynamic analysis of malware targeting the Windows OS. VirMon collects the behavioral activities of analyzed samples at the kernel level through its own mini-filter driver.
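The difference between offline and online learning described above can be illustrated with a minimal sketch. The abstract does not name the specific online algorithm used, so a multiclass perceptron serves here only as a stand-in; the class name `OnlinePerceptron` and the feature names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class OnlinePerceptron:
    """Multiclass perceptron updated one sample at a time (illustrative stand-in)."""

    def __init__(self):
        # weights[label][feature] -> float; labels and features are added on demand
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, label, features):
        w = self.weights[label]
        return sum(w[name] * value for name, value in features.items())

    def predict(self, features):
        # No family has been seen yet, so nothing can be predicted
        if not self.weights:
            return None
        return max(sorted(self.weights), key=lambda label: self.score(label, features))

    def update(self, features, true_label):
        """Learn from a single labeled sample without revisiting stored data."""
        predicted = self.predict(features)
        self.weights[true_label]  # register the family if it is new
        if predicted != true_label:
            for name, value in features.items():
                self.weights[true_label][name] += value
                if predicted is not None:
                    self.weights[predicted][name] -= value
        return predicted


# Hypothetical behavioral features: counts of observed operations
model = OnlinePerceptron()
model.update({"reg_set": 1.0, "file_write": 1.0}, "trojan")
model.update({"net_connect": 1.0, "file_copy": 1.0}, "worm")
```

Each call to `update` adjusts the model using only the sample at hand, which is why the cost of absorbing a new sample does not grow with the number of samples already seen.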
Secondly, we set up a cluster of three machines for our online learning framework module (i.e., Jubatus), which allows large-scale data to be handled. This configuration allows each analysis machine to perform its tasks and deliver the results to the cluster manager. Essentially, the proposed framework consists of three major stages. In the first stage, the behavior of the sample file under scrutiny is extracted and its interactions with OS resources are observed; the sample file is run in a sandboxed environment. Our framework supports two sandbox environments: VirMon and Cuckoo. In the second stage, we apply feature extraction to the analysis report. The label of each sample is determined by using VirusTotal, an online multi-engine antivirus scanning service comprising 46 engines. In the final stage, the malware dataset is partitioned into training and testing sets. The training set is used to obtain a classification model, and the testing set is used for evaluation. To validate the effectiveness and scalability of our method, we evaluated it on 18,000 recent malicious files, including viruses, trojans, backdoors, and worms, among others, obtained from VirusShare. Our experimental results show that the method classifies malware with 92% accuracy.
Keywords: Malware classification, dynamic analysis, online machine learning, behavior modeling
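The second and third stages described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation. The abstract does not say how a single label is derived from the engine verdicts, so the majority vote with a `min_votes` threshold is an assumption. The report event fields (`category`, `operation`), the 80/20 split, and all function names are hypothetical as well.

```python
import random
from collections import Counter


def majority_label(engine_verdicts, min_votes=2):
    """Pick the most common non-empty family name (assumed labeling rule)."""
    votes = Counter(v for v in engine_verdicts.values() if v)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


def extract_features(report_events):
    """Count each category:operation pair in a behavior report (assumed schema)."""
    counts = Counter(f"{e['category']}:{e['operation']}" for e in report_events)
    return dict(counts)


def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then partition into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_set):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    if not test_set:
        return 0.0
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)
```

Here `predict` is any callable that maps a feature dictionary to a family name, so the same evaluation helper works regardless of which classifier produced the model.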