978-1-4577-1591-4/11/$26.00 ©2011 IEEE

Infrastructure for Data-Driven Agriculture: Identifying Management Zones for Cotton using Statistical Modeling and Machine Learning Techniques

Edmund W. Schuster, Sumeet Kumar, Sanjay E. Sarma
Field Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA USA

Jeffrey L. Willers
Genetics and Precision Agriculture Research Unit, United States Department of Agriculture, Agricultural Research Service, Mississippi State, MS USA

George A. Milliken
Professor Emeritus, Department of Statistics, Kansas State University, Manhattan, Kansas

Abstract— Advances in many areas of sensing technology, together with the widespread use and greater accuracy of global positioning systems, offer the prospect of improving agricultural productivity through the intensive use of data. By nature, agriculture is a spatial science characterized by significant variability in yield and in the concentration of pests and plant diseases. Consequently, precision agriculture seeks to improve the effectiveness of various types of sensing information, giving the grower more data and the ability to design specific treatments for site-specific management of inputs and outputs. The intensive use of data in agriculture is at a relatively early stage, and much opportunity remains to refine modeling approaches and to build information infrastructure. With the overall goal of optimizing inputs to achieve maximum output in terms of yield, this paper focuses on applying a clustering algorithm to field data to identify management zones. We delineate the management zones using two sets of attributes: first, yield; second, field properties such as slope and electrical conductivity. By definition, a management zone is a contiguous area defined by one or more features, and it may take on many different shapes.
Building on the established machine learning approach of k-means clustering, we identify a near-optimal number of management zones for a cotton field.

Keywords - precision agriculture, management zones, k-means, unsupervised learning

I. INTRODUCTION

Author and investor Jim Rogers recently remarked in an interview with the Wall Street Journal that America has underinvested in agricultural infrastructure during much of the post-WWII period [1]. Consequently, since 2005 many agricultural commodities have steadily gained in price, reflecting tightening constraints due to higher energy costs, the global supply of inputs for producing food and fiber, and global demand for these resources. Complicating matters, some estimates suggest that the increase in world population to around 9 billion people by 2030, along with rising disposable income, will require a 100 percent increase in food production [2]. Across the world, and especially in Asia, constraints on land and water availability will make this goal challenging to achieve. Food price inflation is already a major issue in China and India and has the potential to cause wide-ranging civil unrest.

Our vision for Precision Agriculture (PA) involves the sophisticated formulation and use of mathematical models for ongoing analysis of spatial data, along with Internet computing to rapidly connect models to data resident on farm computers [3]. In this way, PA will become a control system in which data feedback from various sensors facilitates the optimization of inputs.

II. ELEMENTS OF PRECISION AGRICULTURE

Representing a change from established philosophies, the essential concept of PA holds that crops undergo spatial variation during the growth cycle. For example, physical attributes of the field such as slope, drainage, soil type, and fertility inherently cause yields to vary over space.
As such, PA involves “matching resource application and agronomic practices with soil and crop requirements as they vary in space and time within a field” [4].

A. Sensing

From an equipment standpoint, advances in the yield monitors mounted on harvest machinery have greatly improved the amount and quality of spatial data. These sensors measure the weight of the harvest per unit area. In grain crops, for example, the method places a load sensor under a conveyor belt on the harvest machine and records Global Positioning System (GPS) coordinates for the location in the field.

Other data can be acquired through remote sensing. In particular, multi-spectral imaging provides data over large areas and at different resolutions. The Normalized Difference Vegetation Index (NDVI) is an index calculated from the red and near-infrared bands of multi-spectral images.

B. Control

For inputs such as fertilizer and pesticide, Variable Rate Application Technology (VRT) is now available for most agricultural machines, such as sprayers. VRT allows the application of chemicals to vary over space. However, VRT cannot change the application amount instantaneously, which limits the number of rate changes possible within a given area. For example, switching the application amount every few meters is not possible with current technology.

C. Calculating the Prescription through Induction

Given sensing data for spatial yield and the capabilities of VRT, it is possible to increase the precision with which agricultural inputs are managed over space. However, few if any deductive approaches exist to calculate the exact pattern of inputs (termed a prescription) needed to maximize yield. Rather, foundational research in PA describes what amounts to a process of induction.
The following quote represents a typical viewpoint from the agricultural research community: “Early scientific endeavor employed Baconian principles in experimental designs which involved the construction of scenarios and the collection of response observations in the hope of distilling an answer” [5]. By these means, it is possible to calculate the prescription needed to maximize yield. In turn, the prescription provides the information needed to implement VRT.

D. Management Zones

With improved sensing and control capabilities, a current high-priority issue for PA involves using mathematical models to rapidly analyze field data and to determine the best course of action to optimize spatial yield. Specifically, the research in this paper explores the use of machine learning to identify management zones: areas of a field where the combination (and interaction) of physical and variable inputs forms homogeneous regions with similar spatial crop yield responses. Management zones are typically irregular in size, shape, and pattern of interspersion, which makes them difficult to identify given the complexity of the spatial inputs and their interactions with field topography.

III. LITERATURE REVIEW

In cotton production, researchers have established diverse criteria to define management zones within a field. For instance, Landsat Thematic Mapper imagery of the same cotton field over 11 consecutive years was studied as a technique to establish temporally stable regions of similarity [6]. Another group of researchers examined the effect of landscape position and soil series on cotton phosphorus utilization [7]. Using apparent soil electrical conductivity (ECa) measurements, researchers have observed significant correlations between cotton yield and several soil properties, such as leaching fraction, pH, plant-available water, and salinity, and have provided valuable information for site-specific management [8].
Others have developed software that uses a fuzzy c-means unsupervised classification algorithm to apportion field information into management zones [9]. For managing the tarnished plant bug in cotton, unsupervised classification of normalized difference vegetation index (NDVI) values derived from imagery has been used to determine different growth phenology classes [10].

For crops such as grain, other techniques exist for management zone delineation [11]. These include quadratic discriminant analysis (QDA) and k-nearest neighbor discriminant analysis (KNN) [12]; the fuzzy k-means clustering algorithm together with the fuzzy performance index (FPI) and modified partition entropy (MPE) to determine the optimum number of clusters [13]; the spatial contiguous k-means clustering algorithm (SC-KM) [14]; and the watershed algorithm [15].

Arguably, cotton differs from grain crops. For example, we did not find a strong correlation between NDVI and yield. One possible reason is that our analyses did not include the treatment structure attributes [16] for the nitrogen rates applied to strip plots, which were part of the original experiment that generated the data. A second reason is that the relationship between NDVI and yield is not simple [17]. Further experimentation with this data set may uncover other causes (e.g., effects of nematode stress) for the absence of a strong correlation between NDVI and yield.

IV. DATA

The data for this research study come from a cotton field located near Saint Joseph, Louisiana, referred to as the “Helena fertility trial.” Besides geodetic information, several classes of data are available for determining management zones when calculating input-output responses. The first class, representing dependent variables, includes two measures of yield: bales of cotton per acre and biomass flow.
The second class, representing geo-referenced field topographical characteristics, includes NDVI, obtained from airborne multi-spectral imaging of the crop on 5 August 2005 (several weeks before harvest). Other measures include apparent soil electrical conductivity (ECa) readings, which are useful for soil texture mapping. These data were collected using a Veris® model 3100 sensor cart (Veris Technologies, Salina, Kansas). The Veris® 3100 cart was used in conjunction with a sub-meter-accurate Global Positioning System (GPS) receiver and collected geo-referenced data on shallow and deep soil resistivity at one-second intervals. The standard operating width across fields was 40 feet. The soil electrical conductivity data from the Veris® system were analyzed using SSToolbox®, an agriculture-oriented geographic information system (GIS), and then interpolated into a surface using Surfer®. A Real-Time Kinematic (RTK) GPS system with centimeter-level accuracy in topographic measurements was used to collect elevation data [18]. This system consisted of a GPS receiver (the rover antenna) and an RTK base station, with a data radio link between the two GPS antennas. Other physical characteristics of the field are slope and soil series type, along with some operational variables that are part of the treatment structure, such as the type of irrigation, seed variety, and chemical treatments (such as amounts of nitrogen) (see Appendix).

The data set used in this research was assembled using GIS and statistical processing methods as described in [19]. In general, various information layers (themes) obtained by selected remote and proximal sensing systems were registered to earth coordinates. This information, along with the yield monitor data, provided descriptions of the fields’ topology and topography.
GIS processing attached data from all spatial layers to the yield monitor coordinates to produce the database table used in this research. The data set represents a single year of observations and therefore does not allow for temporal analysis. However, if properly identified, management zones change little over time. A single-year data set is thus a good start toward developing an algorithm that aids in identifying management zones before the crop is planted and agricultural management practices are imposed. Limited data sets of this kind will likely be routine for first-time analyses of commercial production fields.

V. METHODOLOGY

In this paper, we employ a two-step k-means clustering algorithm to identify management zones (MZ). One of the critical factors in using k-means is identifying the best attributes (variables) for management zone delineation. In the past, researchers have explored using yield [20], NDVI [21], or soil properties such as slope, elevation, and electrical conductivity [22]. Researchers have hypothesized that NDVI is correlated with yield [23]. In our data set we have two estimates of yield, bales of cotton per acre (Y1) and biomass flow (Y2). We observed that the correlation between yield and NDVI (Y3) is relatively low: ρ(Y1, Y3) = 0.5629 and ρ(Y2, Y3) = 0.4874, where ρ is the Pearson correlation coefficient. Hence, we do not consider NDVI an informative variable for delineating management zones. Our first set of attributes is therefore Y1 and Y2. In addition, we consider a second set of attributes consisting of fixed independent variables that describe field topography: slope, deep electrical conductivity, shallow electrical conductivity, and the ratio of deep to shallow electrical conductivity (all obtained from the Veris® cart).

The k-means algorithm is a popular clustering method used extensively in unsupervised learning to identify structure in a data set.
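As an illustration of the attribute screen described above (setting NDVI aside because of its weak correlation with Y1 and Y2), the sketch below computes Pearson's ρ between a candidate attribute and each yield measure with NumPy. The function names and the 0.6 cut-off are hypothetical choices for illustration only; the paper does not state a numerical threshold for calling a correlation low.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()  # center both samples
    return float(da @ db / np.sqrt((da @ da) * (db @ db)))


def is_informative(candidate, yields, threshold=0.6):
    """Hypothetical screen: keep `candidate` (e.g. NDVI) only if it correlates
    with at least one yield measure at |rho| >= threshold.

    Returns (keep, rhos), where rhos holds one coefficient per yield measure.
    """
    rhos = [pearson(candidate, y) for y in yields]
    return max(abs(r) for r in rhos) >= threshold, rhos


# Toy data, not from the Helena trial: y1 tracks x closely, y2 not at all.
x = [1.0, 2.0, 3.0, 4.0]
y1 = [1.0, 3.0, 2.0, 4.0]   # rho(x, y1) = 0.8
y2 = [3.0, 1.0, 4.0, 2.0]   # rho(x, y2) = 0.0
keep, rhos = is_informative(x, [y1, y2])
```

With this hypothetical cut-off, the reported values ρ(Y1, Y3) = 0.5629 and ρ(Y2, Y3) = 0.4874 would both fall below 0.6, so NDVI would be screened out, consistent with the choice made in the text.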
We denote the N×d data set of attributes by F; it contains N samples of d features (attributes). We normalize every column (attribute) of the data set to the range [0, 1] to remove any scale bias. The algorithm performs clustering by heuristically minimizing an objective function, usually the sum of the squared distances of every point from its cluster centroid:

$$O(K) = \sum_{i=1}^{K} \sum_{f_j \in C_i} \left\| f_j - \bar{f}_i \right\|^2 \tag{1}$$

where $K$ is the number of clusters, $C_i$ is the $i$-th cluster, the centroid of each cluster is $\bar{f}_i = \frac{1}{N_{C_i}} \sum_{f_j \in C_i} f_j$, and $N_{C_i}$ is the number of points in cluster $C_i$. Note that each $f_j$ can be a vector ($d > 1$), and $\|\cdot\|$ denotes the L2 norm.

A critical aspect of management zone delineation is ensuring the contiguity of the zones by taking spatial autocorrelation into account. When dividing points into clusters, both geographical proximity and attribute values should be considered. When k-means is used without a proximity metric for the points, the resulting clusters may not be contiguous (Fig. 1). We address this issue by including the spatial information of the points, in addition to their attribute values, in the k-means algorithm: we augment the data set with the normalized geographic coordinates (x, y) of the points. Furthermore, we implement a two-step k-means algorithm as described below (Matlab® notation is used to represent vectors and matrices):

1. Augment the data set to D1 = [w0*x w0*y F], a matrix of size N×(d+2), where x and y are the spatial coordinates of the samples and w0 is a weighting factor chosen to be less than 1 (e.g., w0 = 0.2).

2. Run k-means on D1 with a large number of clusters K0 (~200). In this first step, we aim to segment the domain into a large number of clusters, giving a higher weight to dissimilarity among attributes than to spatial proximity.
From the k-means run, we obtain the centroids of the K0 clusters as Q0.

3. Create the data set for the second step as D2 = [w1*Q0(:,1) w1*Q0(:,2) Q0(:,3:d+2)], where w1 is a weighting factor chosen to be greater than 1 (e.g., w1 = 10).

4. Run k-means on D2 with a small number of clusters K1 (~15). In this step, the aim is to merge the previously formed clusters into a smaller number of clusters. The overall weight of the spatial coordinates at this step is w0w1. The domain has already been segmented into K0 small clusters, each representing a segment of the domain with homogeneous attribute values. These clusters are now merged into K1 clusters, considering both spatial proximity and similarity between attributes.

It is worth noting that the Helena fertility trial data have undergone previous analysis to determine the management zones needed for the optimal nitrogen prescription through the design of experiments (DOE) approach described briefly in Section VIII. In the next section, we compare the results of the two-step k-means approach with the results of the DOE analysis.

VI. RESULTS AND DISCUSSION

The algorithm described above involves four parameters: w0, w1, K0, and K1. Generally, delineating management zones involves maximizing both the homogeneity of every zone in terms of attribute values and its contiguity. These two objectives may conflict, as seen in Fig. 1, when similar attributes are scattered across the domain. We ran simulations with the following parameter values and identified the combination that gave the smallest objective O(K1) involving only the yield features (Eqn. 1 with [Y1 Y2] and K = K1): K0 = 200, K1 = (2:2:20)’, w0 = [0.05 0.1 0.25 0.5 0.75 1 2]’, w1 = [0.5 1 1.5 2 3 4 5 10]’. We found that K1 = 18, w0 = 0.1, and w1 = 10 gave the minimum O(K1).
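The four steps of the two-step algorithm in Section V can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' Matlab code: it assumes plain Lloyd's k-means with random initial centroids (the paper does not say how its k-means was initialized or stopped), and the function names `minmax_normalize`, `assign`, `kmeans`, `objective`, and `two_step_zones` are hypothetical. The helper `objective` evaluates Eq. (1), so it can also produce the ratio O(K1)/O(1) used to compare within-zone and total variability.

```python
import numpy as np


def minmax_normalize(A):
    """Scale every column to [0, 1]; constant columns map to 0."""
    A = np.asarray(A, dtype=float)
    lo, hi = A.min(axis=0), A.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (A - lo) / span


def assign(X, centroids):
    """Index of the nearest centroid (squared L2 distance) for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members; keep empty clusters in place.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return assign(X, centroids), centroids


def objective(F, labels):
    """Eq. (1): sum of squared distances of every point to its cluster centroid."""
    F = np.asarray(F, dtype=float)
    total = 0.0
    for c in np.unique(labels):
        members = F[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def two_step_zones(xy, F, w0=0.2, w1=10.0, K0=200, K1=15, seed=0):
    """Two-step k-means zoning; returns one zone label per sample."""
    xy, F = minmax_normalize(xy), minmax_normalize(F)
    # Step 1: D1 = [w0*x w0*y F], an N x (d+2) matrix.
    D1 = np.hstack([w0 * xy, F])
    # Step 2: over-segment into K0 clusters; Q0 holds their centroids.
    labels0, Q0 = kmeans(D1, K0, seed=seed)
    # Step 3: D2 = [w1*Q0(:,1) w1*Q0(:,2) Q0(:,3:d+2)].
    D2 = np.hstack([w1 * Q0[:, :2], Q0[:, 2:]])
    # Step 4: merge the K0 step-2 centroids into K1 zones.
    labels1, _ = kmeans(D2, K1, seed=seed)
    # Each sample inherits the zone assigned to its step-2 cluster.
    return labels1[labels0]
```

For example, with `zones = two_step_zones(xy, F, w0=0.1, w1=10, K0=200, K1=15)` and `Fn = minmax_normalize(F)`, the ratio O(K1)/O(1) is `objective(Fn, zones) / objective(Fn, np.zeros(len(zones), dtype=int))`. The dense distance computation uses memory proportional to N×K0×(d+2); for a large yield-monitor table, a library implementation such as scikit-learn's `KMeans` would be the more practical choice.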
Intuitively, as the weights increase, spatial proximity becomes more important; for high weights, the management zones are more contiguous but contain higher within-cluster attribute variance. Hence, there is a tradeoff between the contiguity and the homogeneity of management zones.

The number of management zones (K1) is an important parameter. A higher K1 leads to lower within-zone variance but may be impractical from a VRT perspective. The issue here is at least twofold. First, it is not clear whether the VRT controller has the precision to deliver inputs that differ usefully across a larger number of zones. Second, as indicated in Figs. 1 and 3, the edges of contrast among zones are quite irregular and sometimes include a gradient of interspersion. Fig. 2 plots the variation in the objective O(K1), and its derivative, with the number of clusters for the two sets of attributes. We use the following parameters for the simulations: K0 = 200, w0 = 0.1, w1 = 10. Beyond K1 = 15, only marginal decreases were observed in the objective function O(K1); hence K1 = 15 is a good choice for the number of management zones.

Figure 1. Delineation of management zones through k-means considering only attribute variables, with the number of clusters = 15: (a) yield (Y1 and Y2); (b) fixed independent variables. The axes are the normalized (x, y) coordinate values derived from the original Universal Transverse Mercator Easting and Northing coordinate values (m).

Figure 2. Variation of the objective (a) and its derivative (b) with the final number of clusters (K1). The objective considered here is the sum of the squared distances of every feature from its centroid feature and was evaluated for both sets of attributes as described in Section V.

Additional simulations were run with w0 = 0.5, w1 = 4, K0 = 200, and K1 = 15 to identify the different management zones with our two-step clustering algorithm (Fig. 3).
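The rule used above for fixing K1, stopping once additional zones yield only marginal decreases in O(K1), can be made explicit with a forward-difference estimate of dO/dK1, the quantity plotted in Fig. 2(b). The plain-Python sketch below is hypothetical: the function name and the relative tolerance `rel_tol` are assumptions, since the paper chooses K1 = 15 by inspecting the curves rather than by a stated numerical criterion.

```python
def marginal_decrease_k(ks, objectives, rel_tol=0.02):
    """Pick the smallest K after which O(K) decreases only marginally.

    ks         -- increasing candidate cluster counts, e.g. [2, 4, ..., 20]
    objectives -- O(K) from Eq. (1) for each entry of ks
    rel_tol    -- hypothetical cut-off: a slope |dO/dK| below
                  rel_tol * O(ks[0]) counts as "marginal"

    Returns (chosen_k, slopes), where slopes[i] approximates dO/dK
    between ks[i] and ks[i + 1].
    """
    ks, objectives = list(ks), list(objectives)
    if len(ks) != len(objectives) or len(ks) < 2:
        raise ValueError("need at least two (K, O(K)) pairs of equal length")
    slopes = [(objectives[i + 1] - objectives[i]) / (ks[i + 1] - ks[i])
              for i in range(len(ks) - 1)]
    cutoff = rel_tol * objectives[0]
    for i in range(len(slopes)):
        # Every further increase in K must be marginal, not just the next one.
        if all(abs(s) < cutoff for s in slopes[i:]):
            return ks[i], slopes
    return ks[-1], slopes


# Illustrative (made-up) objective curve: steep drops, then a plateau.
k_star, slopes = marginal_decrease_k([2, 4, 6, 8, 10], [100, 60, 40, 39, 38.5])
```

Requiring all later slopes to be small, rather than only the next one, avoids stopping at a temporary plateau in the objective curve.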
Compared to clustering without spatial information (Fig. 1), the two-step algorithm delineates more contiguous management zones. Under these conditions, the ratio of O(K1) to O(1) (where O(1) represents the total variability of the attributes across the domain) was O(K1)/O(1) = 0.5462 for the yield attributes and O(K1)/O(1) = 0.0807 for the topography attributes. Hence, grouping into management zones substantially reduces the variability of both types of attributes compared to the overall variability in the domain.

A comparison of the DOE management zone predictions with the k-means approach shows both similarities and differences. Both the single-step cluster method and the DOE approach depict more variability in the upper halves (northeast) of each east-west field portion, compared to more homogeneity and greater areal extent of the zones derived for the lower halves (southwest) (Figs. 1 and 4). The single-step cluster method, using the two yield measures, also recovered information on the effects of the nitrogen rates applied to the strip plots (Fig. 1a).

Figure 3. Fifteen management zones delineated through the two-step clustering algorithm for (a) yield attributes and (b) fixed independent attributes.

On the other hand, the two-step cluster method clearly shows a more equitable apportionment of the field and removal of the nitrogen treatment effects (Fig. 3a). The DOE approach is somewhat similar: once the relationship between the amount of nitrogen applied and the topographic characteristics was modeled by Eqn. 2, refinements to this covariance model could be derived to predict what amount of nitrogen should be applied at which location, with or without irrigation (Fig. 4). The next reasonable step is to consider using the clusters mapped by the two-step approach to simplify, within the DOE methodology, the number of terms that involve the topographical characteristics, $X_{gijklmn}$.
This type of investigation could lead to innovative solutions for better automating the data processing techniques needed to handle the copious quantities of geographical information involved in PA.

VII. CONCLUSION AND FUTURE WORK

While the machine learning approach outlined in this paper successfully delineates management zones for cotton, an important issue remains: introducing VRT constraints (the rate of change of the application) directly into the machine learning algorithm. Such a change would bring the algorithm's output closer to real-world operational conditions. In addition, farming equipment travels in parallel paths following the direction of the crop rows. These application paths are spaced at a distance related to the sprayer boom (or tool bar) width of the application equipment. Thus, the geographical intersections between the characteristics of VRT equipment and the edges of the zones are spatially complex and not symmetrical. Therefore, how large a yield increase is required to recover the costs of optimally delineating management zones, given the cost and operational characteristics of the VRT equipment, is a topic for further research.

Figure 4. Management zones derived from a mixed analysis of covariance model, which indicates different management for rates of nitrogen with irrigation (a) or without irrigation (b).

Finally, the concept of using machine learning to identify management zones does not apply only to yield. Other phenomena in an agricultural field also tend to cluster. A good example is cool air during the fall grape harvest in the northern part of the US: the cool air tends to cluster in low-lying areas, causing fruit to freeze [24]. Identifying these areas through temperature sensing technology and machine learning algorithms represents another new application of the research in this paper.

VIII. APPENDIX

This data set has been previously analyzed using methods equivalent to [25] and [26]. Generally, the intersecting geometries of the various topography zones and farm equipment characteristics defined the design structure of the original experiment. The blanket or site-specific management (SSM) practices applied to the field by the producer/researcher described the treatment structure. These design and treatment structures were combined to build a general linear mixed analysis of covariance model. For the Helena fertility study, the statistical model that describes seed cotton yield monitor data as a function of equipment geometry, several nitrogen rates (applied as either 2 types of blanket or 3 types of SSM tactics), several site characteristics (or topography variables, $X_{gijklmn}$), and the randomized complete block design structure imposed in the experiment is:

$$y_{ijklmn} = \mu + \phi_i r_{ijkl} + \phi r_{ijkl}^{2} + \sum_{g=1}^{G} \beta_{gi} X_{gijklmn} + BLK_k + BLK_k \times TRT\_ID_{ik} + H\_group(BLK\ TRT\_ID)_{m(ik)} + \varepsilon_{ijklmn} \tag{2}$$

where $y_{ijklmn}$ is the yield value of the nth yield monitor reading (or site) in the mth harvest group (H_group, or paired harvest passes within each strip plot) within the lth asymmetrical experimental unit (EU) of the kth block (or block_id) in the jth strip plot (labeled by a plot_id) assigned the ith nitrogen treatment tactic. (The EUs are the spatial intersections of the strip plot boundaries and the fertility management zones derived from the ECa data.) The variable r is a regressor (not a class) variable representing the nitrogen rates (r = 60, 75, 90, 105, 120, 135, and 150 lbs/ac) applied to all (or various) locations within the strip plots according to the treatment (trt_id) assigned to them.
The term $\mu$ is the mean seed cotton yield of the field, and the error $\varepsilon_{ijklmn}$ is the effect of the nth yield monitor observation (the harvesting unit or yield point) within the mth harvest group within the lth EU in the kth block in the jth strip plot for the ith treatment. The other terms are $BLK_k$, the effect of the kth block; the $BLK_k \times TRT\_ID_{ik}$ interaction; and the harvest group nested within $(BLK\ TRT\_ID)_{m(ik)}$. Keep in mind that the $BLK_k \times TRT\_ID_{ik}$ term is nested within block_id and is an alias for plot_id.

REFERENCES

[1] J. Rogers, “The big interview,” The Wall Street Journal, June 8, 2011.
[2] Agriculture and Food Research Initiative, National Institute of Food and Agriculture, United States Department of Agriculture, March 2010.
[3] E.W. Schuster, H.-G. Lee, R. Ehsani, S.J. Allen, and J.S. Rogers, “Machine-to-machine communication for agricultural systems: an XML-based auxiliary language to enhance semantic interoperability,” Comput. Electron. Agric., vol. 78, pp. 150-161, 2011.
[4] B.N. Whelan and A.B. McBratney, “The ‘null hypothesis’ of precision agriculture management,” Prec. Agric., vol. 2, pp. 265-279, 2000.
[5] Ibid.
[6] R. Boydell and A.B. McBratney, “Identifying potential within-field management zones from cotton-yield estimates,” Prec. Agric., vol. 3, pp. 9-23, 2002.
[7] R.G.V. Bronson, J.W. Keeling, J.D. Booker, T.T. Chua, T.A. Wheeler, R.K. Boman, and R.J. Lascano, “Influence of landscape position, soil series, and phosphorus fertilizer on cotton lint yield,” Agron. J., vol. 95, pp. 949-957, 2003.
[8] D.L. Corwin, S.M. Lesch, P.J. Shouse, R. Soppe, and J.E. Ayars, “Identifying soil properties that influence cotton yield using soil sampling directed by apparent soil electrical conductivity,” Agron. J., vol. 95, pp. 352-364, 2003.
[9] J.J. Fridgen, N.R. Kitchen, K.A. Sudduth, S.T. Drummond, W.J. Wiebold, and C.W. Fraisse, “Management zone analyst (MZA): software for subfield management zone delineation,” Agron. J., vol. 96, pp. 100-108, 2004.
[10] J.L. Willers, J.N. Jenkins, W.L. Lander, P.D. Gerard, D.L. Boykin, K.B. Hood, P.L. McKibben, S.A. Samson, and M.M. Bethel, “Site-specific approaches to cotton insect control, sampling and remote sensing techniques,” Prec. Agric., vol. 6, pp. 431-452, 2005.
[11] J.L. Ping and A. Dobermann, “Creating spatially contiguous yield classes for site-specific management,” Agron. J., vol. 95, pp. 1121-1131, 2003.
[12] N. Martin, G. Bollero, N.R. Kitchen, A.N. Kravchenko, K. Sudduth, W.J. Wiebold, and D. Bullock, “Two classification methods for developing and interpreting productivity zones using site properties,” Plant Soil, vol. 288, pp. 357-371, 2006.
[13] X. Song, J. Wang, W. Huang, L. Liu, G. Yan, and R. Pu, “The delineation of agricultural management zones with high resolution remotely sensed data,” Prec. Agric., vol. 10, pp. 471-487, 2009.
[14] X. Li, Y. Pan, C. Zhang, L. Liu, and J. Wang, “A new algorithm on delineation of management zone,” in Proc. IEEE Geosci. Remote Sens. Symp., pp. 546-549, 2005.
[15] K.A. Sudduth, N.R. Kitchen, and S.T. Drummond, “Soil conductivity sensing on claypan soils: comparison of electromagnetic induction and direct methods,” in P.C. Robert et al. (ed.), Proc. 4th Int. Conf. on Precision Agriculture, pp. 971-990, ASA Misc. Publ., ASA, CSSA, and SSSA, Madison, WI, 1999.
[16] G.A. Milliken and D.E. Johnson, Analysis of Messy Data, Vol. 1: Designed Experiments, 2nd ed., Chapman and Hall/CRC, New York, 2002.
[17] R. Campanella, “Testing components toward a remote-sensing-based decision support system for cotton production,” Photogramm. Eng. Remote Sens., vol. 66, pp. 1219-1227, 2000.
[18] R.L. Clark and R. Lee, “Development of topographic maps for precision farming with kinematic GPS,” Trans. ASAE, vol. 41, no. 4, pp. 909-916, 1998.
[19] J.L. Willers, G.A. Milliken, C.G. O’Hara, and J.N. Jenkins, “Information technologies and the design and analysis of site-specific experiments within commercial cotton fields,” in Proc. Applied Stat. Agric., Kansas State Univ., Manhattan, KS, pp. 41-73, 2004.
[20] J.L. Ping and A. Dobermann, “Processing of yield map data,” Prec. Agric., vol. 6, pp. 193-212, 2005.
[21] P.J. Pinter, J.L. Hatfield, J.S. Schepers, E.M. Barnes, M.S. Moran, C.S.T. Daughtry, and D.R. Upchurch, “Remote sensing for crop management,” Photogramm. Eng. Remote Sens., vol. 69, pp. 647-664, 2003.
[22] D.L. Corwin and S.M. Lesch, “Application of soil electrical conductivity to precision agriculture: theory, principles, and guidelines,” Agron. J., vol. 95, pp. 455-471, 2003.
[23] See [17].
[24] S.J. Allen and E.W. Schuster, “Controlling the risk for an agricultural harvest,” Manuf. Serv. Oper. Manag., vol. 6, pp. 225-236, 2004.
[25] J.L. Willers, G.A. Milliken, C.G. O’Hara, and J.N. Jenkins, “Information technologies and the design and analysis of site-specific experiments within commercial cotton fields,” in Proc. Applied Stat. Agric., Kansas State Univ., Manhattan, KS, pp. 41-73, 2004.
[26] G. Milliken, J. Willers, K. McCarter, and J. Jenkins, “Designing experiments to evaluate the effectiveness of precision agricultural practices on research fields: Part 1, Concepts for their formulation,” Oper. Res. Int. J., vol. 10, pp. 329-348, 2010.