Please use this identifier to cite or link to this item:
http://dspace.dtu.ac.in:8080/jspui/handle/repository/22898
| Title: | PREDICTING CROP YIELDS USING MACHINE LEARNING WITH SHAP-BASED EXPLAINABILITY |
| Authors: | SAINI, SHREYA; YADAV, TANVI; Gupta, Anjana (SUPERVISOR) |
| Keywords: | CROP YIELD PREDICTION; REGRESSION MODELS; DECISION TREES; SHAP ANALYSIS; AGRICULTURAL ANALYTICS; FEATURE IMPORTANCE |
| Issue Date: | May-2026 |
| Series/Report no.: | TD-8772; |
| Abstract: | The agriculture sector is one of the pillars of food security and development worldwide. Accurate estimates of crop production support decision-making in resource planning, supply chain management and policy making. However, conventional yield-prediction methods, which rely on past patterns or basic statistical models, are limited in their ability to capture the nonlinear relationships found in agriculture. This project takes a machine learning approach, predicting yield from environmental, meteorological and agronomic data. The data come from the Food and Agriculture Organization (FAO) and are available on Kaggle; the dataset consists of 28,242 agricultural records from 130 countries covering 1990–2013. Five regression models were developed and tested: Linear Regression, Lasso Regression, Ridge Regression, K-Nearest Neighbours (KNN) and Decision Tree Regressor. The input variables comprise temporal (year), climatic (average rainfall, temperature), agronomic (pesticide use) and categorical (crop type and geographic region) features. Preprocessing consisted of handling missing values, one-hot encoding of categorical features and feature standardization, implemented with scikit-learn's ColumnTransformer pipeline. The Decision Tree Regressor generalized best, with an R² of 0.9793 and a Mean Squared Error (MSE) of 3941, an increase of 0.232 in R² (about 23 percentage points) over the baseline Linear Regression model (R² = 0.7473). SHapley Additive exPlanations (SHAP) analysis provided further interpretability: crop type, especially potatoes, emerged as the most important and dominant factor, followed by pesticide use and climatic factors. Feature-level analysis with SHAP dependence plots revealed nonlinear relationships: pesticide use showed a strong negative correlation with yield in high-productivity scenarios, while temperature and rainfall interacted in complex ways that varied by crop type and region. The project concludes with a deployment-ready predictive system and a discussion of practical applications in precision agriculture. |
| URI: | http://dspace.dtu.ac.in:8080/jspui/handle/repository/22898 |
| Appears in Collections: | M Sc Applied Maths |
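The preprocessing and five-model comparison described in the abstract can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names (`year`, `rainfall_mm`, `avg_temp_c`, `pesticides_t`, `crop`, `region`, `yield`), the imputation strategies, the hyperparameters and the 80/20 split are all hypothetical choices, and a small synthetic frame stands in for the real FAO/Kaggle data, so the scores it prints will not match the reported R² of 0.9793.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical column names; the real FAO/Kaggle file may use different ones.
NUMERIC = ["year", "rainfall_mm", "avg_temp_c", "pesticides_t"]
CATEGORICAL = ["crop", "region"]
TARGET = "yield"


def make_synthetic_frame(n_rows=2000, seed=0):
    """Synthetic stand-in for the real dataset (NOT the FAO data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "year": rng.integers(1990, 2014, n_rows),  # 1990..2013 inclusive
        "rainfall_mm": rng.uniform(50, 3000, n_rows),
        "avg_temp_c": rng.uniform(0, 30, n_rows),
        "pesticides_t": rng.uniform(0, 400_000, n_rows),
        "crop": rng.choice(["Potatoes", "Maize", "Wheat", "Rice"], n_rows),
        "region": rng.choice(["A", "B", "C"], n_rows),
    })
    # Crop-dependent base yield times a nonlinear temperature response, plus noise.
    base = df["crop"].map({"Potatoes": 200_000, "Maize": 60_000,
                           "Wheat": 30_000, "Rice": 40_000})
    temp_factor = 1 - ((df["avg_temp_c"] - 18) / 20) ** 2
    noise = rng.normal(0, 5_000, n_rows)
    df[TARGET] = (base * temp_factor + noise).clip(lower=0)
    # Blank out ~2% of rainfall values so the imputer has something to fill.
    df.loc[rng.random(n_rows) < 0.02, "rainfall_mm"] = np.nan
    return df


def build_preprocessor():
    """Impute + standardize numeric columns; impute + one-hot encode categoricals."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    return ColumnTransformer([("num", numeric, NUMERIC),
                              ("cat", categorical, CATEGORICAL)])


# The five model families named in the abstract, with assumed hyperparameters.
MODELS = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(alpha=1.0, max_iter=10_000),
    "Ridge": Ridge(alpha=1.0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}


def evaluate(df):
    """Fit each model inside a preprocessing pipeline; return {name: (R2, MSE)}."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    scores = {}
    for name, model in MODELS.items():
        pipe = Pipeline([("prep", build_preprocessor()), ("model", model)])
        pipe.fit(X_tr, y_tr)
        pred = pipe.predict(X_te)
        scores[name] = (r2_score(y_te, pred), mean_squared_error(y_te, pred))
    return scores


if __name__ == "__main__":
    for name, (r2, mse) in evaluate(make_synthetic_frame()).items():
        print(f"{name:18s} R2={r2:.4f} MSE={mse:.1f}")
```

Wrapping the ColumnTransformer and each regressor in one Pipeline means the imputers, scaler and encoder are fitted on the training split only, so test-set statistics do not leak into the comparison. The SHAP step is omitted; with the third-party `shap` package, a fitted tree model is commonly explained via `shap.TreeExplainer`.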
Files in This Item:
| File | Size | Format |
|---|---|---|
| Shreya & Tanvi M.Sc.pdf | 3.28 MB | Adobe PDF |
| Shreya & Tanvi plag.pdf | 31.98 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.



