Ulusal Tez Merkezi

Thesis No	Download	Thesis Record	Status
559298		Diagnophone: An electronic stethoscope for respiratory audio analysis / Dıagnophone: Solunum sesi analizi için bir elektronik steteskop tasarımı Author: EGE YAĞ ÇAKIR Advisor: ASST. PROF. DR. GÖKHAN İNCE Location: İstanbul Teknik Üniversitesi / Fen Bilimleri Enstitüsü / Bilgisayar Mühendisliği Ana Bilim Dalı / Bilgisayar Mühendisliği Bilim Dalı Subject: Computer Engineering and Computer Science and Control Index:	Approved Master's English 2019 93 p.

Today, respiratory diseases are one of the leading causes of death worldwide. According to data from the Turkish Statistical Institute, there is a serious shortage of specialist doctors working in this field in Turkey. Although various tests such as X-ray, tomography, and MRI are available for diagnosing these diseases, the equipment is expensive and therefore hard to find in every clinic. Because the equipment is scarce, patients must wait in long queues to use it and cannot get their results immediately. In addition, during an MRI, for example, the patient has to remain motionless for a long time, which is even more stressful for patients with claustrophobia. Furthermore, patients who use all of this equipment are exposed to intense radiation; in the case of tomography, the level is even higher. In interviews, many doctors stated that many diseases can be identified just by listening with a stethoscope, but that doctors still request a test with this kind of equipment to be sure of the diagnosis, using the test as validation. Despite the availability of such equipment for anomaly detection, the stethoscope is still the first-line, cheapest, and most frequently used diagnostic device for doctors. For this reason, this thesis presents the design of a smart electronic stethoscope that uses Machine Learning to assist physicians in diagnosing disease. With this stethoscope, the lung sound that is heard can also be recorded at the same time. This feature can be used for teleconferencing when no specialist physician is available to consult, and for storing the patient's audio data. Today, patients can access their earlier test results (for example, blood tests or tomography results). However, because information such as a previous lung sound is not stored, audio data cannot be used in the follow-up of the disease. 
Yet in diseases that require follow-up, such as asthma, storing the patient's lung sound would make follow-up easier and would also help determine how the disease has changed compared with its earlier state. The sound recorded with Diagnophone can also be used in medical education. Today, medical education consists of students gathered at the patient's bedside listening to the patient's lung sounds one after another, after their teacher, with the same stethoscope. From the user interviews it was inferred that this is not an effective form of learning. With Diagnophone, however, the sounds recorded from the patient can be replayed through a mobile phone speaker while the teacher points out exactly where the anomaly occurs, or Diagnophone can show the result to the students, which can make medical education more efficient and understandable for them. Users were interviewed before, during, and after implementation; various prototypes were tried out by users, observations were made during these sessions, and the success of the experiments was measured with questionnaires, so that the designs evolved accordingly and reached their final form. To create a design that meets all the needs of physicians in terms of user experience and human-computer interaction, interviews and user tests were carried out with 10 doctors and 5 medical students from various hospitals. Two different datasets were used in this study. The first is the dataset published by the International Conference on Biomedical and Health Informatics, which consists of 920 audio files collected from 126 patients and contains 6898 respiratory sounds; the second contains recordings collected from 6 different lung lobes of 44 patients (18 with crackles, 5 with wheezes, 11 with crackles and wheezes, 10 healthy). 
This second dataset consists of 370 audio recordings collected with Diagnophone at Ümraniye Eğitim ve Araştırma Hastanesi and labeled by specialist doctors. Both datasets were pre-processed before the classification step. To obtain more training data, data augmentation was performed with two different techniques. First, the audio files were duplicated by stretching or compressing the signals along the time axis by a random factor. Second, the generated Mel spectrogram images were transformed along the frequency axis using Vocal Tract Length Perturbation (VTLP) with random linear warping. Following this, each recording was segmented so that it consisted of a single breathing cycle. It was found that in approximately 95% of the resulting sounds, one inhale-exhale cycle is completed within 5 seconds. Therefore, all signals were divided into 5-second segments, and if a segment was shorter than 5 seconds, the rest of the segment was filled with zero padding. Especially in signals from the second dataset, it was observed that when the diaphragm first touched the patient's body, the limit threshold values were exceeded, causing an increase in noise and clipping of the audio. These sections were identified and brought back within set limits. After the pre-processed sounds were labeled using one-hot labels, the feature extraction step was carried out. In this step, various temporal and spectral features of the sounds were computed together with the Mel Frequency Cepstral Coefficients of the signals. Following this, the most effective of these features were selected using Principal Component Analysis and used in the classification stage. In the classification step, the data was first split into 80% training, 10% test, and 10% validation. Two approaches were then followed in the classification stage. 
First, the extracted features were classified with Support Vector Machines, K-Nearest Neighbors, and a multi-class AdaBoost decision tree algorithm, and their performance was compared. Second, Mel spectrogram images of the signals were generated and given as input to Convolutional Neural Networks for classification. As a result, the CNN algorithm achieved the highest accuracy for both datasets, at 81.1%. In the literature, lung auscultation is performed on both the front and the back of the body. For this reason, sounds collected from both the back and the chest of the patients were processed. However, it was observed that in recordings collected from the chest, heart sounds masked the lung sounds, and that this lowered overall performance. Later, observations of and interviews with doctors showed that, although auscultation from the chest is recommended in theory, in practice doctors auscultate the lungs only from the patient's back and skip the chest. For these two reasons, only sounds recorded from the back were adopted within the scope of the project.
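
The fixed-length segmentation with zero padding described in the abstract above could look roughly like the following. This is a minimal sketch, not the thesis implementation: the function name `split_into_chunks`, its parameters, and the use of plain Python lists for samples are assumptions made for illustration.

```python
def split_into_chunks(samples, sample_rate, chunk_seconds=5.0):
    """Split a mono signal into fixed-length chunks.

    Hypothetical helper: the last chunk is zero-padded so that every
    chunk has exactly sample_rate * chunk_seconds samples.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    if chunk_len <= 0:
        raise ValueError("chunk length must be positive")

    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Fill the remainder of a short final chunk with zeros.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Padding rather than dropping the short tail keeps every breathing cycle in the data while giving a downstream classifier inputs of a uniform length.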

Today, pulmonary diseases are one of the major causes of mortality in the world. According to the Turkish Statistical Institute, there is a serious shortage of specialist doctors in Turkey treating such deadly diseases. Different tests, such as X-ray and tomography, are available to diagnose lung-related anomalies. Unfortunately, this equipment is not available in every health clinic; it is expensive and requires time-consuming procedures for patients. Because of its low availability, patients have to wait in long queues to use it and cannot get their results immediately. In addition, during an MRI, for example, the patient must remain still for a long time, which can be especially stressful for patients with claustrophobia. Furthermore, patients who use all this equipment are exposed to intense radiation; in the case of tomography, the exposure is even greater. Interviews conducted with many doctors indicate that many diseases can be identified just by listening with a stethoscope, yet tests with this kind of equipment are still requested in order to validate the doctor's initial diagnosis. Even though various diagnostic tests are available, the stethoscope is still the first, cheapest, and most frequently used diagnostic device for physicians. Therefore, in this thesis, a smart electronic stethoscope has been designed to help physicians identify anomalies and diagnose disease using Machine Learning. With this stethoscope, the lung sound can be heard and recorded at the same time. This feature can be used for teleconferencing when no specialist physician is available to consult, and also for storing the patient's audio data. Nowadays, patients have access to their previous test results (e.g., blood tests and tomography results). 
Since data such as previous lung sounds is not stored in hospitals, sound data cannot be used in the follow-up of the disease. However, for conditions that require follow-up, such as asthma, storing the patient's lung sound would facilitate follow-up and help determine how the disease has changed compared to its previous state. The audio recorded with Diagnophone can also be used in medical education. Today's medical education consists of students gathered at the patient's bedside listening to the patient's lung sounds with the same stethoscope, one after another, after their teacher. From the user interviews, it was inferred that this is not an effective way of learning. With Diagnophone, however, the sounds recorded from the patient can be played through the speaker of a mobile phone while the teacher points out the important parts to the students, or the anomalies can be identified by Diagnophone itself. This can make medical education more efficient and understandable for the students. Users were interviewed before, during, and after the implementation; they tested various prototypes of the system while observations were made of their experience, and the success of the experiments was measured with questionnaires. As a result, the design evolved and was finalized. In order to create a design that satisfies all the needs of physicians in terms of user experience and human-computer interaction principles, 10 doctors and 5 medical students from several hospitals were contacted and interviewed. In this study, two datasets have been used. 
The first is the dataset published by the International Conference on Biomedical and Health Informatics, and the second is the one we present, which contains recordings from 44 patients (18 with crackles, 5 with wheezes, 11 with crackles and wheezes, 10 healthy) taken from 6 different lung lobes, consisting of 370 audio recordings collected via the Diagnophone and labeled by specialist doctors at Ümraniye Eğitim ve Araştırma Hastanesi. Both datasets are pre-processed before the classification step. In order to obtain more training data, data augmentation has been performed with two different techniques. First, the audio has been stretched along the time axis (by randomly speeding it up or slowing it down). Second, after the Mel spectrogram images were created, they have been randomly transformed along the frequency axis using Vocal Tract Length Perturbation (VTLP) with random linear warping. Following that, each audio recording has been divided into breathing cycles, and it has been concluded that one breathing cycle takes approximately 5 seconds in 95% of the audio. Therefore, all of the audio has been divided into 5-second chunks, and if a chunk is shorter than 5 seconds, the remainder has been filled with zero padding. Especially in our data, when the stethoscope diaphragm first touched the patient's body, noise increased because the limit threshold values were exceeded. This caused clipping; these parts have been identified and brought back within set limits. After the pre-processed audio has been labeled using one-hot labels, the feature extraction step has been carried out. In this step, various temporal and spectral features of the audio have been calculated along with the Mel Frequency Cepstral Coefficients of the signals. Following this, the most effective of these features have been selected using Principal Component Analysis and used in the classification phase. 
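
The time-axis augmentation mentioned above (randomly speeding the audio up or down) can be approximated by resampling with linear interpolation. The sketch below is an illustration under stated assumptions, not the method used in the thesis: the names `stretch` and `random_stretch` and the rate range 0.9 to 1.1 are hypothetical, and this simple resampling shifts pitch along with duration.

```python
import random


def stretch(samples, rate):
    """Resample by linear interpolation.

    rate > 1 shortens the signal (speed up); rate < 1 lengthens it.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n = len(samples)
    if n == 0:
        return []

    out_len = max(1, int(round(n / rate)))
    out = []
    for i in range(out_len):
        # Position in the input signal, clamped to the last sample.
        pos = min(i * rate, n - 1)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def random_stretch(samples, rng, low=0.9, high=1.1):
    """Create one augmented copy with a random rate (range is an assumption)."""
    return stretch(samples, rng.uniform(low, high))
```

Passing a seeded `random.Random` instance as `rng` makes the augmented copies reproducible across runs.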
Before the classification step, Mel spectrogram images of the signals have been calculated, and these images have been classified alongside the extracted features; the success of both approaches has been compared. In the classification step, the data has first been divided into 80% training, 10% test, and 10% validation parts, and for both datasets models have been created using Convolutional Neural Networks, Support Vector Machines, K-Nearest Neighbour, and the multi-class AdaBoost algorithm with decision trees. The results of these algorithms have been calculated for both datasets. As a result, 81.1% accuracy was obtained with the CNN algorithm, demonstrating the effectiveness of the system.
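
An 80/10/10 partition like the one described above can be sketched as follows. The function name `split_80_10_10` and the fixed seed are assumptions for illustration; the abstract does not say whether the split was made per recording or per patient, so this example simply shuffles individual items.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train, test, and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 8 // 10
    n_test = n // 10

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```

With 370 items, the size of the Diagnophone dataset, this yields 296 training, 37 test, and 37 validation items.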