This is my course project for the Introduction to Data Science (CS 210) course at Sabancı University, Turkey.
As a former Mi Band user and a current Garmin Fenix 6 user, I have always been fascinated by how smartwatches collect biological data while one’s daily life goes on. Thanks to these small devices, there is no need to take part in a lab experiment, at the expense of risking your sensitive information, just to see your own biometrics.
Wearable technology still has a long way to go in terms of measurement accuracy. Even so, I am determined to keep extracting data from my personal wearable devices and to use it to understand my own biological patterns.
That is why I chose to work on this project, where I extract and analyze personal sleep and physical activity data with the aim of correlating different features in my dataset.
Before starting to work on this project, I knew that wearable technology companies love to monetize the personal data they collect from their customers. Of course, why not? Unsurprisingly, Garmin’s API was only accessible to their partner companies. I had to find another way to access my OWN personal data!
Scraping Garmin Connect, the companion app for Garmin wearable devices, was tricky, which explains the lack of scraping attempts on the web. It is a dynamic website, and its rate limits, IP blocking and authentication protocols add an extra layer of complexity. Then, everything changed when I encountered GarminDB.
I collected the data required for my project through GarminDB, a collection of well-structured Python scripts for parsing health data from Garmin Connect. With several terminal commands I downloaded my health data in .db format. Using SQLite, I converted the .db files into human-readable CSV format and began to preprocess my data.
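The .db to CSV conversion described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not necessarily the exact procedure I followed: the function name export_tables_to_csv and the paths passed to it are hypothetical, and it makes no assumption about which tables GarminDB creates. It simply writes every table it finds to its own CSV file.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every table in a SQLite file to <out_dir>/<table>.csv.

    Hypothetical helper for illustration; returns the list of CSV paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists all user tables; skip SQLite's internal ones.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            header = [column[0] for column in cursor.description]
            csv_path = out_dir / f"{table}.csv"
            with open(csv_path, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then every row of the table
            written.append(csv_path)
    finally:
        conn.close()
    return written
```

Calling export_tables_to_csv with the path of a downloaded .db file and an output folder produces one CSV per table, ready to be loaded for preprocessing.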
All analysis scripts that I used are available in this Jupyter notebook. As any successful data science project should, this project includes different data analysis stages and techniques to interpret the extracted data. This is a concise walkthrough of the stages and the techniques I’ve used:
In this final stage, I put forward some hypotheses that I inferred from the EDA stage and tested them using statistical analysis.
Techniques used in this stage:
I had many assumptions about my health data and daily health patterns, hypotheses that I took for granted. Most importantly, I learned that they may well not align with what the data says.
Here are some of the hypotheses tested in this project, related to my sleep and physical activity data:
Contrary to my initial assumption, the data reveals a significant positive correlation between sleep duration and sleep quality. This correlation, with a coefficient of 0.63 and an impressively low p-value of 7.4e-24, emerges as the strongest association within the project.
Surprisingly, there is a negative correlation between calories burned and sleep quality, supported by a low p-value: on days when I burn more calories, my sleep quality tends to be worse. This finding challenges my prior belief that physically active days would result in better sleep quality.
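To make the two correlation findings above more concrete, here is a minimal sketch of how a correlation coefficient and a p-value can be obtained for two numeric columns. It assumes a Pearson correlation, which may differ from the test actually used in the notebook, and it estimates the p-value with a permutation test so that only the standard library is needed. The function names are hypothetical. In practice, scipy.stats.pearsonr returns both the coefficient and a parametric p-value in a single call, and only a parametric test can report a p-value as small as the one quoted above.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=10_000, seed=0):
    """Two-sided p-value: how often shuffled data is at least as correlated.

    Shuffling ys breaks any real link between the two columns, so the
    shuffled correlations show what chance alone would produce.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

With sleep duration as xs and sleep quality as ys, a coefficient near +1 or -1 together with a small p-value indicates a strong linear association; it says nothing about which variable drives the other.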
According to the statistical analysis, the dataset provides no evidence that a higher daily step count leads to an earlier bedtime, contrary to what I had assumed to be logical.
By applying a chi-square test to the categorical features, I found that stress level and sleep quality are associated, that is, not independent of each other. Finally, a conclusion aligning with my initial assumption made my day.
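The chi-square test of independence mentioned above can be sketched as follows. This is a minimal standard-library version under stated assumptions: the input is a contingency table of counts (for example, rows for stress level categories and columns for sleep quality categories), no continuity correction is applied, and the function names are hypothetical. The notebook may well rely on a library routine such as scipy.stats.chi2_contingency instead.

```python
import math


def chi2_sf(stat, dof):
    """Upper-tail probability P(X >= stat) of a chi-square distribution."""
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Start from a closed form: Q(1/2, x) = erfc(sqrt(x)) for odd dof,
    # Q(1, x) = exp(-x) for even dof.
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < dof / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1.0
    return q


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)
```

A small p-value means the observed counts differ from what independence would predict by more than chance plausibly explains, which is the sense in which two categorical features are "not independent".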
In this project, I was held back by the small size of my dataset. In the coming years, as my health data grows, I plan to update my findings by feeding the larger accumulated dataset into the same analysis methods and checking whether it contradicts or supports my initial findings.
Also, I would like to devise cleaner and more inclusive ways to collect personal data from Garmin’s database. The limitations on data collection were time-consuming and detrimental to the aim of this project.