About

Why did we do this project?

Our Motivation


Obesity severely affects health and quality of life. It is a risk factor for age-related diseases and decreases life expectancy by 7 years. In the United States, 41.9 percent of adults have obesity. According to the National Institutes of Health, obesity and overweight together are the second leading cause of preventable death in the United States, close behind tobacco use. An estimated 300,000 deaths per year are due to the obesity epidemic.


Our goal is to make people more aware of habits that can help them curb their risk of obesity or take measures to reduce its severity.

Data Sources

Our project uses a dataset for estimating obesity levels in individuals from Mexico, Peru, and Colombia, based on their eating habits and physical condition. The dataset has 17 attributes and 2111 records, each labeled with the class variable "NObeyesdad" (Obesity Level). This variable sorts records into seven classes: Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II, and Obesity Type III.


Of the total dataset, 77% was synthetically generated using the Weka tool and the SMOTE filter, enhancing the richness and diversity of the data. The remaining 23% was directly sourced from user inputs through a dedicated web platform, ensuring a real-world dimension to the dataset.


The data collectors intended this mix of synthetic and user-generated data to provide a robust dataset for meaningful insights and practical applications in obesity research and management.

This data source tracks:

  • Gender
  • Age
  • Height
  • Weight
  • Family history with overweight
  • FAVC (Consume high-calorie foods frequently)
  • FCVC (Number of meals where you usually eat vegetables)
  • NCP (Number of main meals a day)
  • CAEC (Eat food between meals)
  • Smoking
  • NObeyesdad (Obesity Level)
  • CH2O (Consumption of water daily)
  • CALC (Consumption of alcohol)
  • SCC (Calories consumption monitoring)
  • FAF (Physical activity frequency)
  • TUE (Time using technology devices)
  • MTRANS (Transportation used)
Link to Data Source

Data Journey and Caveats


After downloading the .csv file, we pasted its contents as text into our Colab notebook to reduce the number of files and lines of code we needed. We then loaded that text into a pandas DataFrame, which made the data easier to visualize. Next, we added a column for Body Mass Index (BMI), calculated as BMI = weight (kg) / height (m)^2.
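The loading and BMI steps can be sketched roughly as follows. This is a minimal illustration, not our exact notebook code: the three rows in CSV_TEXT are invented, the helper name load_with_bmi is hypothetical, and the column names Height and Weight are assumed to match the attribute list above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV text pasted into the notebook.
# These rows are invented for illustration; they are not real records.
CSV_TEXT = """Gender,Age,Height,Weight,NObeyesdad
Female,21,1.62,64.0,Normal Weight
Male,27,1.80,87.0,Overweight Level I
Male,30,1.75,110.0,Obesity Type I
"""


def load_with_bmi(csv_text: str) -> pd.DataFrame:
    """Parse CSV text into a DataFrame and append a BMI column."""
    # StringIO lets pandas read the pasted text as if it were a file.
    df = pd.read_csv(io.StringIO(csv_text))
    # BMI = weight (kg) / height (m)^2, computed for every row at once.
    df["BMI"] = df["Weight"] / df["Height"] ** 2
    return df


df = load_with_bmi(CSV_TEXT)
print(df[["Height", "Weight", "BMI"]].round(2))
```

Keeping the data as text inside the notebook avoids a separate file upload, at the cost of a longer notebook cell.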


Then, we created 9 visualizations based on the 17 original attributes listed above. Using plotly.express with our DataFrame, we could build each chart quickly. We also refined the charts with plotly's update functions, which proved especially helpful for organizing variables and improving the presentation of the data.

Learning and Conclusions


In interpreting our analysis, it is important to acknowledge that approximately 77% of the data was synthetically generated, which may introduce biases into the findings. We therefore emphasize the need for future studies in this region to incorporate a more extensive set of real-world data.


Despite this limitation, the analysis provided valuable insights into obesity across Mexico, Peru, and Colombia. The dataset remains a rich source of information on the interplay of dietary habits, physical condition, and obesity levels, and working with it showed us how multifaceted obesity is. With its strengths and limitations, the dataset is a starting point for better understanding obesity and finding innovative solutions to address its impact on diverse populations in these countries.

About the Project Members


Nadxielli Arredondo, currently in her junior year, is pursuing a major in FTT with a concentration in Film, complemented by a supplementary major in Latino Studies.


Brendan Schaefer is a senior from Arlington, Texas, pursuing a degree in Finance. He currently lives on campus in Keenan Hall.



Thank you for visiting our project!

project-1-04