Thesis No: 348723
Title: Privacy preserving data publishing with multiple sensitive attributes
Author: AHMED ABDALAL
Advisors: ASSOC. PROF. DR. YÜCEL SAYGIN; ASST. PROF. DR. MEHMET ERCAN NERGİZ
Location: Sabancı University / Institute of Engineering and Science / Department of Computer Engineering
Subject: Computer Engineering and Computer Science and Control
Index:
Status: Approved
Degree: Doctorate
Language: English
Year: 2012
Pages: 100
Abstract (translated from the Turkish): Data mining is the process of extracting predictable hidden information from large databases. It has great potential to help governments, researchers, and companies focus on the most important information in their data warehouses. For data mining to have a high impact, high-quality data and effective data publishing are required. At the same time, protecting personal privacy in the published data is a clear need.
Data mining is the process of extracting hidden predictive information from large databases. It has great potential to help governments, researchers, and companies focus on the most significant information in their data warehouses. High-quality data and effective data publishing are needed for data mining to have a high impact. However, there is a clear need to preserve individual privacy in the released data. Privacy-preserving data publishing is the research area concerned with eliminating privacy threats while still providing useful information in the released data. Datasets typically include many sensitive attributes and may contain static or dynamic data. A dataset may also need to be published as multiple updated releases with different time stamps. As a concrete example, public opinions include highly sensitive information about an individual and may reflect a person's perspective, understanding, particular feelings, way of life, and desires. On the one hand, public opinion is often collected through a central server, which keeps a user profile for each participant and needs to publish this data so that researchers can analyze it in depth. On the other hand, new privacy concerns arise and the user's privacy can be at risk. A user's opinion is sensitive information and must be protected before and after data publishing. Each user's opinions concern only a few issues, while the total number of issues is huge. In this case, multiple sensitive attributes must be handled in order to develop an efficient model. Furthermore, because opinions are gathered and published periodically, correlations between sensitive attributes in different releases may occur. Thus the anonymization technique must take into account previous releases as well as the dependencies between released issues. This dissertation identifies a new privacy problem of public opinions.
In addition, it presents two probabilistic anonymization algorithms, based on the concepts of k-anonymity [1, 2] and ℓ-diversity [3, 4], to solve the problems of publishing datasets with multiple sensitive attributes and publishing dynamic datasets. The proposed algorithms provide a heuristic solution for multidimensional quasi-identifiers and multidimensional sensitive attributes using a probabilistic ℓ-diverse definition. Experimental results show that these algorithms clearly outperform existing algorithms in terms of anonymization accuracy.
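For readers unfamiliar with the two concepts the abstract builds on, the following is a minimal sketch of the standard checks for k-anonymity and distinct ℓ-diversity, applied separately to each of several sensitive attributes. It is not the dissertation's probabilistic algorithms, which the abstract does not describe in detail. The column names, the sample rows, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_by_quasi_identifier(rows, qi_cols):
    """Group rows into equivalence classes that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in qi_cols)].append(row)
    return groups


def is_k_anonymous(rows, qi_cols, k):
    """True if every equivalence class contains at least k rows."""
    groups = group_by_quasi_identifier(rows, qi_cols)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(rows, qi_cols, sensitive_cols, l):
    """True if, in every equivalence class, each sensitive attribute
    takes at least l distinct values (distinct l-diversity per attribute)."""
    for group in group_by_quasi_identifier(rows, qi_cols).values():
        for col in sensitive_cols:
            if len({row[col] for row in group}) < l:
                return False
    return True


# Hypothetical, already-generalized opinion data with two sensitive attributes.
rows = [
    {"age": "20-29", "zip": "340**", "opinion_tax": "for", "opinion_health": "against"},
    {"age": "20-29", "zip": "340**", "opinion_tax": "against", "opinion_health": "for"},
    {"age": "30-39", "zip": "341**", "opinion_tax": "for", "opinion_health": "for"},
    {"age": "30-39", "zip": "341**", "opinion_tax": "for", "opinion_health": "against"},
]
qi = ["age", "zip"]

print(is_k_anonymous(rows, qi, 2))
print(is_distinct_l_diverse(rows, qi, ["opinion_health"], 2))
print(is_distinct_l_diverse(rows, qi, ["opinion_tax", "opinion_health"], 2))
```

In this toy table the second equivalence class holds a single value for opinion_tax, so the table is 2-anonymous but fails the diversity check once both sensitive attributes are considered. This is the kind of gap that arises when a dataset carries multiple sensitive attributes.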