% %%%%%%%%%%%%%%%%%%
% Data-Description %
% %%%%%%%%%%%%%%%%%%
%
% COIL 1999 Competition Data
%
% Data Type
%
% multivariate
%
% Abstract
%
% This data set is from the 1999 Computational Intelligence and Learning
% (COIL) competition. The data contains measurements of river chemical
% concentrations and algae densities.
%
% Sources
%
% Original Owner
%
% [1]ERUDIT
% European Network for Fuzzy Logic and Uncertainty Modelling
% in Information Technology
%
% Donor
%
% Jens Strackeljan
% Technical University Clausthal
% Institute of Applied Mechanics
% Graupenstr. 3, 38678 Clausthal-Zellerfeld, Germany
% [2]tmjs@itm.tu-clausthal.de
%
% Date Donated: September 9, 1999
%
% Data Characteristics
%
% This data comes from a water quality study where samples were taken
% from sites on different European rivers over a period of approximately
% one year. These samples were analyzed for various chemical substances
% including: nitrogen in the form of nitrates, nitrites and ammonia,
% phosphate, pH, oxygen, chloride. In parallel, algae samples were
% collected to determine the algae population distributions.
%
% Other Relevant Information
%
% The competition involved the prediction of algal frequency
% distributions on the basis of the measured concentrations of the
% chemical substances and the global information concerning the season
% when the sample was taken, the river size and its flow velocity. The
% competition [3]instructions contain additional information on the
% prediction task.
%
% Data Format
%
% There are a total of 340 examples each containing 17 values. The first
% 11 values of each data set are the season, the river size, the fluid
% velocity and 8 chemical concentrations which should be relevant for
% the algae population distribution. The last 8 values of each example
% are the distribution of different kinds of algae. These 8 kinds are
% only a very small part of the whole community, but for the competition
% we limited the number to 7.
% The value 0.0 means that the frequency is very low. The data set also
% contains some empty fields which are labeled with the string XXXXX.
%
% The training data are saved in the file: analysis.data (ASCII format).
%
% Table 1: Structure of the file analysis.data
%
%   A          ...  K           a          ...  g
%   CC[1,1]    ...  CC[1,11]    AG[1,1]    ...  AG[1,7]
%   CC[200,1]  ...  CC[200,11]  AG[200,1]  ...  AG[200,7]
%
% Explanation:
% CC[i,j]: Chemical concentration or river characteristic
% AG[i,j]: Algal frequency
%
% The chemical parameters are labeled as A, ..., K. The columns of the
% algae are labeled as a, ..., g.
%
% Past Usage
%
% [4]The Third (1999) International COIL Competition Home Page
% _________________________________________________________________
%
%
% [5]The UCI KDD Archive
% [6]Information and Computer Science
% [7]University of California, Irvine
% Irvine, CA 92697-3425
%
% Last modified: October 13, 1999
%
% References
%
% 1. http://www.erudit.de/
% 2. mailto:tmjs@itm.tu-clausthal.de
% 3. file://localhost/research/ml/datasets/uci/raw/data/ucikdd/coil/instructions.txt
% 4. http://www.erudit.de/erudit/activities/ic-99/index.htm
% 5. http://kdd.ics.uci.edu/
% 6. http://www.ics.uci.edu/
% 7. http://www.uci.edu/
%
%
% %%%%%%%%%%%%%%%%%%
% Task-Description %
% %%%%%%%%%%%%%%%%%%
%
%
% Third International Competition
%
% Protecting rivers and streams by monitoring chemical concentrations and
% algae communities.
%
%
% Intelligent Techniques for Monitoring Water Quality using chemical
% indicators and algae population
%
% Recent years have been characterised by increasing concern at the
% impact man is having on the environment.
% The impact on the environment of toxic waste, from a wide variety
% of manufacturing processes, is well known.
% More recently, however, it has become clear that the more subtle
% effects of nutrient level and chemical balance changes arising from
% farming land run-off and sewage water treatment also have a serious,
% but indirect, effect on the states of rivers, lakes and even the sea.
% In temperate climates across the world summers are characterized by
% numerous reports of excessive summer algae growth resulting in poor
% water clarity, mass deaths of river fish from reduced oxygen levels and
% the closure of recreational water facilities on account of the toxic
% effects of this annual algal bloom.
% The need to reduce the impact of these man-made changes in river
% nutrient levels has stimulated much biological research with the aim
% of identifying the crucial chemical control variables for the
% biological processes.
%
% The data used in this problem comes from one such study.
% During the research study water quality samples were
% taken from sites on different European rivers over a period of
% approximately one year. These samples were analyzed for various
% chemical substances including: nitrogen in the form of nitrates,
% nitrites and ammonia, phosphate, pH, oxygen, chloride.
% In parallel, algae samples were collected to determine the algae
% population distributions. It is well known that the dynamics of the
% algae community are determined by the external chemical
% environment with one or more factors being predominant.
% While the chemical analysis is cheap and easily
% automated, the biological part involves microscopic examination,
% requires trained manpower and is therefore both
% expensive and slow.
%
% Diatoms like Cymbella are major contributors to primary production
% throughout the world. The diatom reacts with
% great sensitivity to even small changes in acidity.
%
% Over a three and a half billion year history algae have evolved and
% adapted as primary plant colonizers of almost
% every known habitat in terrestrial and aquatic environments.
% They respond very rapidly to man-made environment changes.
%
%
%
% The relationship between the chemical and biological features is
% complex and can be expected to need the application of advanced
% techniques. Typical of such real-life problems, the particular
% data set for the problem contains a mixture of (fuzzy) qualitative
% variables and numerical measurement values, with much of the data
% being incomplete.
%
% The competition task is the prediction of algal frequency distributions
% on the basis of the measured concentrations of the chemical
% substances and the global information concerning the season when the sample
% was taken, the river size and its flow velocity. The two last variables
% are given as linguistic variables.
%
% 340 data sets were taken and each contains 17 values. The
% first 11 values of each data set are the season, the river
% size, the fluid velocity and 8 chemical concentrations which
% should be relevant for the algae population distribution.
% The last 8 values of each data set are the distribution of
% different kinds of algae. These 8 kinds are only a very small
% part of the whole community, but for the competition we limited
% the number to 7. The value 0.0 means that the frequency is very low.
% The data set also contains some empty fields which are labeled
% with the string XXXXX.
%
% Each participant in the competition receives 200 complete data sets
% (training data) and 140 data sets (evaluation data) containing only
% the 11 values of the river descriptions and the chemical concentrations.
%
% This training data is to be used in obtaining
% a 'model' providing a prediction of the algal distributions associated
% with the evaluation data.
%
%
%
% The training data are saved in the file:
%
% analysis.txt (ASCII format).
%
% Structure of the file analysis.txt
%
%   A              K           a              g
%   CC1,1    ...   CC1,11      AG1,1    ...   AG1,7
%   ....     ...   ...         ...
%
%
%   CC200,1  ...   CC200,11    AG200,1  ...   AG200,7
%
%
% Explanation:
% CCi,j: Chemical concentration j=1,...,11
% AGi,k: Algal frequency k=1,...,7
%
%
% The chemical parameters are labeled as A, ..., K.
% The columns of the algae are labeled as a, ..., g.
%
%
% Evaluation data are saved in file eval.txt (ASCII format).
%
%
% Table 2: Structure of the file eval.*
%   A              K
%   CC1,1    ...   CC1,11
%
%   .....    ...
%
%   CC140,1  ...   CC140,11
%
% _____________________________________________________________
%
% Objective
%
% The objective of the competition is to provide a prediction
% model on the basis of the training data. Having obtained this
% prediction model, each participant must provide the solution
% in the form of the results of applying this model to the
% evaluation data. The results obtained in this way should
% correspond to the results of the evaluation data
% (which are known to the organizer). The criterion used to evaluate
% the results is given below.
% All 7 algae frequency distributions must be determined.
% For this purpose any number of partial models may be developed.
%
% _____________________________________________________________
%
% Judgment of the results
%
% To judge the results, the sum of squared errors will be calculated.
% The following table describes the results of a particular participant.
%
% Matrix of results
%   a               g
%
%   Res1,1    ...   Res1,7
%
%   ....      ...
%
%   Res140,1  ...   Res140,7
%
%
% All solutions that lead to the smallest total error will
% be regarded as winners of the contest.
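%
% As a sketch of this criterion, assuming the squared errors are summed
% over all 140 evaluation samples and all 7 algae columns with no
% weighting or normalisation (the text above does not state the exact
% formula), the total error E of a participant can be written as below.
% Here Res_{i,k} is the participant's prediction and AG_{i,k} is used,
% as an assumed notation, for the true algal frequency of evaluation
% sample i known to the organizer:
%
%   E = \sum_{i=1}^{140} \sum_{k=1}^{7} ( Res_{i,k} - AG_{i,k} )^2
%
% Under this reading, a smaller E means a better set of predictions.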
%
%
%
% Information about the dataset
% CLASSTYPE: numeric
% CLASSINDEX: last
%
% ALGAE #: 2/7
@relation coil-test-2
@attribute season {autumn,spring,summer,winter}
@attribute river_size {large_,medium,small_}
@attribute fluid_velocity {high__,low___,medium}
@attribute concentration_1 numeric
@attribute concentration_2 numeric
@attribute concentration_3 numeric
@attribute concentration_4 numeric
@attribute concentration_5 numeric
@attribute concentration_6 numeric
@attribute concentration_7 numeric
@attribute concentration_8 numeric
@attribute algae_2 numeric
@data
summer,small_,medium,7.95,5.7,57.333,2.46,273.33301,295.66699,380.0,nan,36.5
winter,small_,medium,7.98,8.8,59.333,7.392,286.66699,33.333,138.0,7.1,0.0
summer,small_,medium,8.0,7.2,80.0,1.957,174.286,47.857,113.714,4.5,23.0
spring,small_,high__,8.35,8.4,68.0,3.026,458.0,45.2,111.8,3.2,38.2
spring,small_,medium,8.1,13.2,19.0,0.0,130.0,6.0,40.0,2.0,55.4
summer,small_,medium,8.37,12.1,12.85,0.84,15.0,5.0,10.507,13.8,2.4
spring,small_,high__,7.31,9.9,6.0,1.395,58.75,6.0,16.0,0.8,1.7
autumn,small_,high__,7.91,11.2,5.0,1.383,6.0,24.333,30.0,32.0,2.0
summer,small_,high__,7.99,10.7,4.0,1.368,117.0,17.25,44.75,0.8,1.7
autumn,small_,high__,7.82,11.5,8.18,1.488,39.0,16.0,139.5,0.4,0.0
summer,small_,high__,6.6,10.8,4.0,1.18,80.0,2.0,59.0,0.6,0.0
autumn,small_,high__,6.79,9.4,11.42,1.966,42.0,3.0,15.0,0.6,0.0
summer,small_,high__,6.78,10.2,10.704,1.46,46.0,3.0,13.714,0.7,0.0
summer,small_,high__,7.8,10.8,14.568,1.228,61.25,34.5,62.0,1.1,0.0
spring,small_,high__,8.3,12.7,27.0,4.04,10.0,363.0,482.0,6.0,2.7
autumn,small_,high__,8.2,11.3,6.0,1.56,10.0,2.0,5.0,nan,0.0
summer,small_,high__,8.2,10.4,3.577,0.788,10.583,1.667,2.088,0.8,0.0
autumn,small_,medium,8.1,6.4,21.2,3.222,44.0,54.8,155.0,61.52,14.7
summer,small_,medium,8.54,12.83,22.545,4.0,170.5,68.0,116.069,41.6,5.0
autumn,small_,medium,8.5,7.8,71.0,11.02,500.0,121.0,nan,7.1,0.0
spring,small_,medium,7.7,6.8,65.0,1.833,782.5,77.25,340.0,9.0,5.1
autumn,small_,high__,8.4,10.5,50.6,10.494,334.0,209.10001,276.66699,20.72,3.0
summer,small_,high__,8.5,11.5,57.292,10.526,312.60001,261.39999,299.39999,23.5,1.5
summer,small_,high__,8.1,12.2,66.0,4.08,10.0,26.0,70.0,1.8,0.0
autumn,small_,low___,6.13,11.23,8.87,0.62,36.0,3.0,14.741,2.1,0.0
winter,small_,medium,nan,12.1,18.0,3.14,10.0,21.0,41.0,4.8,0.0
summer,small_,medium,7.2,10.4,18.0,2.42,80.0,11.0,44.0,2.5,1.1
spring,small_,medium,7.9,8.6,27.65,2.063,62.5,7.75,30.0,nan,0.0
autumn,small_,medium,7.8,9.1,36.124,5.974,169.0,13.091,71.057,3.3,0.0
summer,small_,high__,7.8,9.4,5.714,0.807,22.143,6.0,18.714,1.5,0.0
autumn,small_,high__,7.8,11.35,5.343,1.363,19.75,5.818,8.846,1.9,0.0
autumn,small_,high__,5.9,11.9,nan,1.88,5.0,1.0,2.0,nan,0.0
winter,small_,high__,6.8,9.1,nan,0.78,10.0,1.0,14.0,nan,0.0
summer,small_,medium,6.6,8.8,nan,0.95,20.0,1.0,7.0,nan,0.0
winter,small_,high__,6.6,11.8,nan,2.21,10.0,1.0,4.0,nan,0.0
winter,small_,medium,6.9,9.2,nan,2.21,10.0,2.0,13.0,nan,0.0
winter,small_,high__,7.66,10.8,4.0,0.997,15.0,1.5,7.333,1.0,0.0
autumn,small_,high__,7.6,10.5,3.05,1.002,13.333,1.667,10.833,nan,0.0
winter,medium,medium,8.0,7.0,37.091,2.237,146.364,84.091,172.778,2.3,19.4
spring,medium,medium,8.2,7.8,37.625,1.453,105.714,66.714,143.39999,2.6,5.1
autumn,medium,medium,8.2,10.7,134.66701,4.504,617.77802,49.444,164.778,19.2,23.4
summer,medium,medium,8.0,8.5,131.46899,3.454,792.0,63.1,286.60001,8.2,4.0
autumn,medium,high__,8.9,10.5,34.8,6.0,122.556,41.111,144.11099,27.03,1.0
summer,medium,high__,8.2,9.2,30.037,5.184,174.8,86.6,130.8,3.45,12.3
summer,medium,medium,7.8,8.8,29.078,2.823,263.556,27.0,95.12,11.5,6.9
summer,medium,high__,7.5,10.8,10.357,3.35,127.667,22.0,34.321,1.2,0.0
spring,medium,high__,7.4,9.0,13.75,5.268,58.75,56.25,64.0,2.5,0.0
spring,medium,medium,7.5,8.9,55.8,4.408,389.0,127.4,206.2,5.0,0.0
autumn,medium,medium,9.1,8.0,101.2,4.306,273.75,152.875,290.31299,10.7,1.0
autumn,medium,high__,8.9,8.0,60.2,4.033,306.47101,136.0,242.94099,18.4,7.7
summer,medium,low___,8.5,10.74,56.292,0.694,264.79999,43.4,124.942,30.48,4.1
autumn,medium,high__,8.3,8.6,75.0,5.18,560.0,30.5,170.0,16.7,1.5
spring,medium,high__,7.8,6.3,136.66701,3.734,154.444,35.556,175.33299,2.7,3.1
autumn,medium,medium,7.6,9.2,64.778,6.164,720.0,21.778,242.5,54.2,0.0
summer,medium,medium,7.5,9.2,61.557,7.035,558.33301,24.5,257.33301,19.5,0.0
autumn,medium,low___,7.5,8.6,57.5,7.368,577.0,67.3,254.444,22.0,0.0
spring,medium,low___,7.7,4.8,88.909,1.714,669.091,38.182,205.18201,2.8,1.5
spring,medium,low___,7.9,7.2,55.25,2.235,89.375,17.5,141.5,17.0,31.9
spring,medium,high__,8.06,2.2,39.0,2.085,773.125,90.75,163.25,26.0,12.1
autumn,medium,high__,8.5,7.5,9.3,1.557,260.0,9.6,18.1,3.9,1.2
autumn,medium,medium,8.2,10.4,63.3,0.389,217.14301,24.333,114.0,2.7,1.0
winter,medium,medium,8.0,4.8,58.767,0.308,93.75,33.375,110.875,2.7,15.1
autumn,medium,high__,8.7,10.8,1.118,0.534,26.364,14.818,20.9,1.4,0.0
spring,medium,high__,8.4,11.2,0.5,0.32,10.0,21.6,27.6,0.6,1.2
winter,medium,low___,8.5,8.3,36.583,5.632,440.83301,149.0,266.36401,19.827,3.2
autumn,medium,low___,8.3,8.8,64.768,6.272,357.16699,219.0,302.5,8.267,5.6
autumn,medium,medium,8.4,10.8,47.304,7.773,258.909,145.091,223.04401,13.36,8.6
autumn,medium,high__,7.9,11.9,11.862,2.209,128.636,48.091,69.079,2.755,0.0
autumn,medium,medium,9.13,12.0,30.496,4.971,99.6,64.6,146.265,54.13,25.2
autumn,medium,high__,7.4,11.4,12.031,1.621,176.8,36.3,58.599,36.1,0.0
summer,medium,medium,8.3,8.9,271.5,6.315,375.0,169.0,313.5,2.8,4.5
winter,medium,medium,8.2,10.4,41.0,5.16,410.0,38.0,61.0,6.0,15.9
summer,medium,medium,8.2,11.2,36.0,4.4,32.5,108.0,155.5,3.0,13.9
spring,medium,low___,8.17,6.3,37.3,0.527,82.0,62.0,133.10001,1.4,41.8
autumn,medium,low___,8.33,10.6,36.156,1.137,119.444,92.889,112.855,10.5,1.2
spring,medium,medium,8.5,6.7,45.609,4.411,160.0,88.364,180.364,32.833,22.9
autumn,medium,medium,8.1,9.1,47.267,9.367,169.091,75.0,127.778,3.667,1.0
winter,medium,high__,8.2,11.9,12.25,2.348,121.875,14.0,27.5,4.6,3.0
summer,medium,high__,8.1,9.4,11.0,2.251,48.75,17.375,66.875,2.5,0.0
summer,medium,low___,7.8,7.9,87.0,12.13,652.5,93.25,209.0,6.0,10.1
spring,medium,high__,8.26,5.0,44.818,0.526,97.273,105.455,181.636,20.6,10.3
summer,medium,high__,8.11,6.6,49.857,0.993,194.28,77.0,197.571,13.0,26.5
summer,medium,high__,7.87,1.8,49.25,0.611,357.125,128.25,185.125,4.5,8.8
winter,medium,high__,7.2,10.1,49.5,3.955,55.0,18.0,138.0,49.0,0.0
spring,medium,high__,7.8,8.3,51.5,2.098,30.2,24.6,184.39999,31.3,3.9
winter,medium,medium,7.9,11.3,82.5,6.283,300.0,12.333,53.333,13.7,0.0
summer,medium,medium,8.0,8.8,176.25,0.618,440.0,16.25,79.25,3.5,13.8
winter,large_,low___,7.7,9.3,66.0,3.56,310.0,37.0,nan,17.35,24.5
winter,large_,low___,8.7,5.4,48.0,1.139,144.286,36.714,66.833,22.017,12.0
summer,large_,low___,7.9,5.3,48.0,0.513,138.33299,61.333,89.167,4.0,9.8
autumn,large_,low___,8.7,12.2,32.23,1.887,233.5,17.5,66.167,39.333,7.0
winter,large_,low___,8.6,6.5,43.0,0.668,95.0,10.5,74.667,63.5,9.6
spring,large_,high__,7.4,7.3,19.0,4.39,120.0,74.857,166.286,5.3,0.0
summer,large_,high__,7.8,10.4,22.5,4.72,178.75,116.5,201.0,2.7,0.0
winter,large_,low___,8.5,9.8,70.25,1.644,285.0,68.714,132.0,16.028,4.6
spring,large_,low___,7.98,5.6,47.06,3.088,357.0,311.39999,342.29999,18.53,5.2
summer,large_,low___,7.95,7.2,57.286,3.746,425.71399,291.14301,330.0,4.714,6.2
spring,large_,medium,7.96,5.5,131.364,3.313,810.90002,311.45499,349.81799,20.47,8.3
winter,large_,medium,8.35,2.75,97.733,3.681,137.444,91.0,155.556,2.744,3.4
summer,large_,medium,8.15,10.4,189.567,5.011,162.944,135.778,219.278,2.859,19.2
winter,large_,high__,8.5,10.1,3.0,0.851,37.778,10.778,23.889,0.5,3.6
winter,large_,medium,8.5,11.4,3.0,0.774,10.909,3.727,8.091,3.6,12.4
spring,large_,medium,8.5,8.5,4.025,0.825,23.636,5.583,31.091,2.4,3.2
autumn,large_,medium,8.4,11.43,4.966,0.969,24.111,6.0,18.167,2.133,1.4
spring,large_,high__,8.2,9.9,6.4,0.553,21.429,12.0,76.286,1.3,2.1
autumn,large_,high__,8.0,10.98,9.7,0.874,67.7,26.6,51.034,2.2,0.0
autumn,large_,low___,8.3,8.9,42.058,5.922,116.727,150.58299,220.72301,6.7,45.4
spring,large_,low___,8.7,6.8,16.889,2.139,30.0,37.111,85.444,23.033,12.0
winter,large_,medium,8.6,10.4,15.182,2.502,140.909,31.909,77.7,15.318,0.0
summer,large_,medium,8.0,9.1,15.375,2.118,43.75,48.875,86.5,8.125,0.0
summer,large_,medium,8.2,9.5,17.875,2.363,63.75,44.0,77.0,8.463,29.5
spring,large_,medium,8.5,9.6,16.545,3.849,103.273,34.273,63.4,14.682,17.1
spring,large_,medium,8.04,9.3,130.263,3.776,131.008,97.5,152.966,6.15,8.7
autumn,large_,medium,7.95,9.1,76.886,3.461,93.827,68.333,146.049,3.95,31.2
summer,small_,high__,7.25,9.54,nan,0.642,85.0,14.6,19.45,0.46,0.0
autumn,small_,high__,7.64,10.3,34.235,2.942,41.43,17.0,41.567,7.43,0.0
winter,small_,high__,7.92,8.5,10.867,1.715,199.54,3.222,27.2,1.9,0.0
spring,small_,high__,7.62,9.4,11.055,1.51,13.56,4.0,12.65,1.456,8.6
summer,medium,high__,7.75,10.7,15.5,3.976,57.64,10.5,43.169,3.12,0.0
winter,small_,high__,7.08,8.4,9.45,1.572,26.54,4.0,13.6,0.675,0.0
summer,small_,high__,6.92,11.1,9.1,0.63,21.0,5.0,nan,2.46,1.9
winter,small_,high__,8.1,9.8,14.34,0.73,22.5,23.0,45.5,0.85,0.0
spring,small_,high__,7.2,11.3,8.97,0.23,134.5,13.0,19.0,nan,1.0
spring,large_,medium,8.61,10.1,3.518,0.663,12.22,3.222,7.0,1.3,2.0
summer,large_,medium,8.22,9.5,2.3,0.672,9.87,4.0,6.123,0.8,3.8
winter,large_,medium,8.53,10.5,3.0,0.758,10.35,4.1,nan,4.0,5.0
summer,large_,medium,8.4,10.0,3.51,0.866,29.65,5.8,15.0,2.86,6.7
winter,large_,high__,8.1,10.9,9.056,0.825,41.0,20.0,58.0,nan,19.6
summer,medium,high__,8.12,10.2,7.613,0.699,33.56,28.034,49.658,2.2,1.7
winter,large_,low___,8.43,10.8,35.642,6.225,134.0,103.5,nan,45.375,3.9
winter,large_,low___,8.7,11.7,21.4656,3.765,91.45,38.0,83.0,17.0,4.7
summer,large_,low___,8.1,8.2,26.54,2.805,42.75,48.5,88.125,13.98,12.0
autumn,large_,low___,8.35,11.1,22.56,3.14,76.2,41.0,98.665,17.456,7.0