Data Mining For Detecting Diabetes Patients

Ayush Tiwari
4 min read, Apr 18, 2021

Diabetes is a chronic disorder caused by a breakdown in carbohydrate metabolism, and it has become a serious global health problem. Detecting diabetes at an early stage can significantly improve the treatment of diabetic patients and help prevent its related complications.

Machine learning is an emerging technology that provides valuable prognoses and a deeper understanding of how diseases such as diabetes can be classified and clustered. Because effective tools for discovering hidden relationships and trends in data have been lacking, health information technology has quickly emerged in the healthcare sector, drawing on Business Intelligence (BI), a data-driven decision support system.

In this study, we propose a high-precision diagnostic analysis using the k-means clustering technique. In the first stage, noisy, uncertain and inconsistent data were detected and removed from the dataset during preprocessing, to prepare the data for a clustering model. We then apply k-means to a community health dataset of diabetes-related indicators to separate diabetic patients from healthy ones with high accuracy and reliability.
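The two steps above (remove bad records, then cluster with k-means) can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: the cleaning rule (drop any record containing a missing or NaN value), the choice of k = 2, the helper names `clean`, `sq_dist` and `kmeans`, and the (glucose, BMI) values are all hypothetical assumptions made for the example.

```python
import math
import random


def clean(rows):
    """Keep only rows whose values are all real numbers (no None, no NaN).

    Hypothetical stand-in for the 'noisy, uncertain and inconsistent data'
    removal step; the study's actual preprocessing rules are not specified.
    """
    return [
        r for r in rows
        if all(isinstance(v, (int, float)) and not math.isnan(v) for v in r)
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iterations=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (glucose, BMI) records; the None marks an incomplete record.
raw = [(85, 26.6), (89, 28.1), (183, 23.3), (None, 31.0),
       (78, 31.0), (197, 30.5), (168, 38.0)]
data = clean(raw)
labels, centroids = kmeans(data, k=2)
print(labels)
```

Because k-means never sees the diagnosis, which cluster index ends up meaning "diabetic" is arbitrary; attaching a meaning to each cluster requires comparing it against known outcomes.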

Diabetes

Researchers and statisticians expect that by 2040 about 642 million adults (1 in 10 adults) will have diabetes. Moreover, 46.5% of adults with diabetes have not been diagnosed [1]. Because a large share of deaths among diabetic patients results from late diagnosis, reducing this high death toll requires advanced methods and techniques that can efficiently diagnose diabetes at an early stage. Developing and implementing such techniques calls for sophisticated information technology solutions, and Business Intelligence and data mining are suitable IT tools for this purpose.

Business Intelligence (BI) and data mining techniques play a critical role in the medical and healthcare sectors, building on patients' electronic health records. BI is a broad category of methodologies, solutions and applications for capturing, collecting, maintaining and analyzing data and for providing easy access to it, helping users make faster and more successful decisions. It also includes the activities and functions of decision support systems, such as querying and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data and text mining.
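As a small illustration of the querying, reporting and OLAP-style analysis mentioned above, the sketch below rolls patient records up into counts per (age band, outcome) cell, the kind of two-dimensional aggregate a BI report would display. The 10-year age bands, the helper names `age_band` and `roll_up`, and the records themselves are hypothetical; a real BI tool would run an equivalent grouped query against the health-record store.

```python
from collections import Counter


def age_band(age):
    """Bucket an age into a 10-year band, e.g. 34 -> '30-39'."""
    low = int(age) // 10 * 10
    return f"{low}-{low + 9}"


def roll_up(records):
    """Count records per (age band, outcome) cell: a tiny two-dimensional
    aggregate of the kind an OLAP cube or a report query would produce."""
    return Counter((age_band(age), outcome) for age, outcome in records)


# Hypothetical (age, outcome) pairs; outcome 1 = diabetic, 0 = not diabetic.
records = [(21, 0), (25, 1), (34, 0), (37, 1), (38, 1), (52, 1)]
cube = roll_up(records)
for (band, outcome), count in sorted(cube.items()):
    print(band, outcome, count)
```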

METHODOLOGY

a) K Nearest Neighbour (KNN): KNN is a supervised learning algorithm, where K is the number of nearest neighbours considered. Its working methodology is simple: to classify a new sample, it finds the K training samples closest to it and predicts the class held by the majority of them, so the prediction depends on the value of the K parameter. A graphical representation of the K nearest neighbours is depicted in the following figure.
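The KNN procedure described above (measure distances, take the K closest training samples, vote) fits in a few lines. This is a minimal sketch assuming Euclidean distance and a simple majority vote; `knn_predict` and the two-feature (glucose, BMI) training samples are hypothetical and are not the authors' implementation or data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance to the query with its label,
    # then keep the k closest.
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (glucose, BMI) training samples; label 1 = diabetic, 0 = not diabetic.
train_X = [(85, 26.6), (89, 28.1), (78, 31.0), (183, 23.3), (197, 30.5), (168, 38.0)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (175, 33.0), k=3))  # → 1
print(knn_predict(train_X, train_y, (90, 25.0), k=3))   # → 0
```

An odd K avoids tied votes between two classes. Because the distance mixes features measured on different scales (glucose dominates BMI here), features are usually standardized before applying KNN; this sketch skips that step.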

PROTOTYPE:

DATASETS

PIMA INDIAN DIABETES DATASET.

This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.

RESULTS

In this section we discuss the results achieved in our experiments. TABLE I below gives a description of our Pima Indian dataset. The dataset consists of female patients of Pima Indian heritage. The following 8 features (a-h) of the Pima Indian dataset help us predict whether an individual has diabetes using our proposed methodology.

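a) Number of times pregnant

b) Plasma glucose concentration (2-hour oral glucose tolerance test)

c) Diastolic blood pressure (mm Hg)

d) Triceps skinfold thickness (mm)

e) 2-hour serum insulin (mu U/ml)

f) Body mass index (kg/m²)

g) Diabetes pedigree function

h) Age (years)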
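The eight features listed above, together with the diabetes outcome, make up one patient record. The sketch below models such a record in Python, assuming each CSV row lists the features in the order a)-h) followed by a 0/1 outcome column, as in the commonly distributed CSV copy of the dataset. In this dataset a value of 0 for glucose, blood pressure, skinfold thickness, insulin or BMI is physiologically impossible and is commonly treated as a missing value, which is one concrete kind of noisy data a preprocessing step must handle. The names `FIELDS`, `PimaRecord`, `parse_row` and `is_complete` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The eight features a)-h), in the order assumed for each CSV row.
FIELDS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
          "insulin", "bmi", "pedigree", "age"]

# Features for which a recorded 0 is physiologically impossible; a 0 is read as "missing".
ZERO_MEANS_MISSING = {"glucose", "blood_pressure", "skin_thickness", "insulin", "bmi"}


@dataclass
class PimaRecord:
    pregnancies: float
    glucose: Optional[float]
    blood_pressure: Optional[float]
    skin_thickness: Optional[float]
    insulin: Optional[float]
    bmi: Optional[float]
    pedigree: float
    age: float
    outcome: int  # 1 = diabetic, 0 = not diabetic


def parse_row(row):
    """Turn one CSV row (8 feature values followed by the outcome) into a PimaRecord."""
    values = {}
    for name, raw in zip(FIELDS, row[:8]):
        v = float(raw)
        values[name] = None if (name in ZERO_MEANS_MISSING and v == 0) else v
    return PimaRecord(**values, outcome=int(row[8]))


def is_complete(rec):
    """True if no feature of the record is missing."""
    return all(getattr(rec, name) is not None for name in FIELDS)


# An illustrative row in the assumed format (insulin recorded as 0).
rec = parse_row("6,148,72,35,0,33.6,0.627,50,1".split(","))
print(rec.insulin)       # → None
print(is_complete(rec))  # → False
```

Pregnancies is deliberately left out of `ZERO_MEANS_MISSING`, since zero pregnancies is a valid value.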

CONCLUSION

We have proposed an online application for predicting diabetes; among the various algorithms we evaluated, K Nearest Neighbour gave the highest precision on the Pima Indian dataset. The machine learning algorithms we have proposed and used to predict the development of diabetes show significant potential for accurate detection from medical data in various fields of medical science. In the near future, we plan to use a deep learning model and to build a location-based dataset from medical data to predict diabetes more successfully.
