Appendix 1: Syllabus                        ©2003, D.F. Parkhurst

“Most real-life statistical problems have one or more nonstandard features.  There are no routine statistical questions; only questionable statistical routines.”  – D.R. Cox

"… much of what statistics teaches seems surprising, counterintuitive, and even unscientific."
– R. Royall

 

E538: Statistics for Environmental Science, Spring 2001

School of Public and Environmental Affairs, Indiana University

 

Professor: David Parkhurst

Office: SPEA 355

Phone: 855-4556

E-mail: parkhurs@indiana.edu

Office hours: Monday 11:00 a.m.–12:15 p.m., Tuesday 1:30–2:30 p.m., and by appointment at other times

Section 8442: lecture 9:30–10:45 a.m. MW, SPEA 273; lab 1–3 p.m. F, SPEA 277

Section 8444: lecture 2:30–3:45 p.m. MW, SPEA 277; lab 9–11 a.m. F, SPEA 277

Associate instructor:  To be announced

Purpose of statistics

            The primary purpose of statistics is to allow us to use data to improve our knowledge, and, in applied situations, to use that knowledge effectively in making decisions about how to act.  As illustrated here (and as will be described further in class), that process requires knowledge both of the subject being studied, and of statistical and data-analysis methods.  The latter are the focus of this course.

 

[Figure not reproduced; its only surviving label is "Data".]


Course description:

The popular view of “statistics” seems to match the title of the infamous book How to Lie with Statistics (D. Huff, 1954, Penguin), but a worthy alternative goal for environmental scientists is to use statistical methods to help them learn what is true, and to act appropriately as a result.  We need tools for analyzing data, with their associated variability and uncertainty, to improve our understanding (basic science) and to guide our management decisions and actions (applied science).

To analyze data effectively, environmental scientists need to know some basic statistical “recipes,” but also need to know when those recipes should or should not be used, and how to interpret the results they produce.  Thus the concepts that underlie the recipes are important as well.  Beyond this, statistics is currently undergoing several major revolutions that demonstrate both progress in the field and real problems with the conventional statistical methods that most scientists now use.  In addition to conventional statistics, therefore, we will consider at least briefly three of these revolutions:

·       Bayesian methods, which are based on so-called subjective or personal probabilities, and which often give answers much closer to what scientists really want to know, compared with most answers available from “standard statistics,”

·       Elimination of, or greatly reduced emphasis on, null hypothesis significance testing, which has many logical problems and tends to hide adverse environmental effects, and

·       Resampling methods, which reduce the need to make sometimes dubious assumptions about data.

We will also consider what these alternative views of data analysis teach us about the shortcomings of conventional methods.

Our overall approach should help you to be effective environmental scientists and managers, not only in the short run but also as your career develops in the longer term.

Course objectives:

The three major goals of this course are that you should:

1.     Learn how to separate the "signal" (or information, or pattern) from the "noise" (or random variation) in data, and then to use the information obtained to increase your knowledge and, when appropriate, to help your decision making.

2.     Learn how to obtain data in efficient ways, to provide maximum information for the time and money spent.

3.     Learn how to recognize illogical uses and interpretations of statistical methods by others, both to advance scientific knowledge and to support effective environmental management.

I will emphasize conceptual understanding in the hope that in the future:

·       You will know when standard statistical methods are being properly applied and when they are not.

·       You can modify and extend standard (and non-standard) methods to fit new situations.


For each major topic, I will try to follow a common pattern by discussing:

·       What the concept is, and why it is useful.

·       How to do any necessary calculations (sometimes by hand and sometimes with software).

·       How and why the concept works (simple theory).

·       Examples from environmental science, from the viewpoint of both basic and applied science.

Course format and resources

We deal with complex but important ideas in E538, and many students will have to invest considerable effort in the course.  We will help you as much as we can through:

·       Lectures. Most days, about an hour of the class will be in lecture mode. You should feel free to interrupt and ask questions if you “lose the thread.”

·       In-class problems. I will frequently pose problems for you to work, preferably in groups of two or three. We’ll discuss these after you’ve had some time to attack them.

·       Discussion. Occasionally, lecture time may be replaced with small group discussion, to help you grapple with important philosophical issues.

·       Textbooks and notes.  The Moore and McCabe book is an introductory but substantial text that is appropriate for a strong undergraduate or introductory graduate course in statistics.  I have chosen it because it gives a solid basis for the more advanced consideration we give to certain topics that practicing environmental scientists ought to know.  (Parts of this book will be review, but some of the material will be new for most students.  In either case, you can expect to see at least a few exam problems based on material from the book that is not repeated in lecture.)  Advanced topics are covered in lectures and in the supplementary lecture notes, which you should pick up at a local copy shop (listen for an announcement in lecture).  One such topic is Bayesian statistical methods; the Iverson book provides a readable entry into that subject.

·       Labs. These Friday sessions allow time, among other things, for sampling exercises in which each student helps generate part of a large dataset to illustrate some critical point. In addition, the lab periods encourage you to work together on problems.  The associate instructor will be there to provide examples and to help with the assigned problems.

·       Exercises. You will be provided with sets of exercises each week to show how the principles under study apply to environmental science and management. Many of these problems are taken from past exams, so they give you practice solving the kinds of problems you can expect to find on the exams.

Study hint:  We will provide detailed answers for some problems, but we strongly recommend that you not look at the details until you have worked the problem yourself. You will learn a thousand times more (that’s a rough estimate) by working a problem through yourself than by looking at a prepared solution and convincing yourself that you understand it.  Remember that in your future work you will have to solve problems yourself, not just understand someone else’s solutions, and the same will be true in exams.

If you help students with a problem, you can do them the most good not by working it for them, but rather by either (a) working some similar problem for them, or (b) finding out where they are “stuck,” and asking leading questions that will get them on the right track.  If you go to the effort to help others in these ways, you will likely learn a lot yourself in the process.

·       Exams.  As a student, I learned a great deal from the process of wrestling with ideas while taking exams, and I try to write questions that will give you that same benefit.  This means that most questions will test your knowledge of underlying concepts, and not just your ability to “spit back” standard recipes.  Questions will be similar in general form and difficulty to the assigned exercises and to those on past exams, but will not be identical to them with just the numbers changed.

·       E-mail.  At the end of each class, I will ask part or all of the class to write a brief note about (a) what you found most interesting that day, (b) what you understood least that day, and (c) any specific questions you might have.  Within a day or two, I try to answer those questions by e-mail sent to everyone in the class.  PLEASE READ THESE MESSAGES.  I view the information they provide as a supplement to lecture, and will assume you have that information when I write exam questions.

·       Office hours:  Both Professor Parkhurst and the associate instructor hold regular office hours, and will be happy to help you when you can’t work a problem or don’t understand some concept.  See Dr. Parkhurst after class or send e-mail to make appointments with him for other times.

·       Homework:  Seven or eight homework assignments spread through the semester will give you practice in analyzing data and interpreting statistical results.

Grading:

Many exam and homework problems involve sequences of calculations and logical steps.  When we find an error in your work, we try to count off once for that error, and then to grade the rest of the problem starting from that point.  That is, we try not to let a small error at the start of the problem cause you to lose too many points overall. 

After each exam, I will give you the conversion from scores to grades, so you know right away the contribution of that exam to your course grade.  I decide on the conversions by the following sequence of reasoning.  First, I treat each score as a percentage of the points available, and assign grades on a scale where 90% = lowest A-, 80% = lowest B-, 70% = lowest C-, etc.  My exams can be challenging, and if that process leads to too many low grades, I will boost the distribution, usually with a conversion of the form y = a + bx, where x is the percentage score, y is the adjusted score, and a and b are appropriate constants.
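The conversion just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual grading procedure: the boost is assumed to be linear, y = a + bx (the syllabus names only the constants a and b); 60% as the lowest D- is an extrapolation of the "etc."; and because the cutoffs for plain and plus grades within each band are not given, the sketch reports only the letter band. The function names and the example values of a and b are hypothetical.

```python
def percent_score(points: float, available: float) -> float:
    """Express a raw score as a percentage of the points available."""
    return 100.0 * points / available


def boost(percent: float, a: float, b: float) -> float:
    """Assumed linear boost y = a + b*x; a and b would be chosen per exam."""
    return a + b * percent


# Cutoffs: 90/80/70 as the lowest A-/B-/C- come from the syllabus;
# 60 as the lowest D- is an assumed extrapolation of its "etc."
CUTOFFS = ((90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D"))


def grade_band(percent: float) -> str:
    """Return the letter band ("A" means A- or higher); +/- splits are not modeled."""
    for cutoff, band in CUTOFFS:
        if percent >= cutoff:
            return band
    return "F"


# Hypothetical example: a = 25, b = 0.75 leaves a perfect score at 100
# and raises a raw 60% (D band) to 70% (C band).
raw = percent_score(48, 80)  # 60.0
print(grade_band(raw), grade_band(boost(raw, 25, 0.75)))
```

Choosing b = 1 − a/100 keeps a perfect score at 100 while raising every lower score, which is one plausible way to pick the pair of constants; the syllabus does not say how they are chosen.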

Texts:

·       D.S. Moore & G.P. McCabe. 1998. Introduction to the Practice of Statistics (3rd ed.). W.H. Freeman & Co.

·       Class notes.  To be purchased at a local copy shop.

·       G.R. Iverson. 1984. Bayesian Statistical Inference. Sage Publications. Newbury Park, CA.

·       You will need a pocket calculator with log x, e^x, Σ, and regression functions.  Calculators with several internal memories help you avoid writing down intermediate results.  A single storage register is acceptable, however.

·       Optional, but useful for some: L. Gonick & W. Smith. 1993. The Cartoon Guide to Statistics. Harper Perennial. New York.  This well-written book might help you.  You can probably find it in local bookstores; I have not ordered copies for the course, however.

·       Please bring graph paper to every lab.  For consistency, I recommend paper with one-cm squares, and lighter one-mm squares.

Prerequisites:

This course has a prerequisite of an introductory course in statistics, and that background will be assumed.  You may have learned the necessary concepts in some science class, but if you haven’t specifically had a statistics course, please see me right away.  You should also have a firm grasp of freshman-level algebra and of introductory calculus, including both derivatives and integrals.

We will encourage you to perform (or sometimes to check) your homework calculations using statistical software like SPSS (or perhaps other packages).

Grading:

The two mid-term exams will each contribute 28% to your course grade, and the final exam, which is comprehensive, 34%.  The homework will contribute the remaining 10%.
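These weights amount to a weighted average of the four components. A minimal Python sketch, assuming every component is expressed as a percentage on the same 0–100 scale (the syllabus does not say exactly how the converted exam grades are combined); the component names are hypothetical labels:

```python
import math

# Weights stated in the syllabus; the component names are hypothetical labels.
WEIGHTS = {"midterm1": 0.28, "midterm2": 0.28, "final": 0.34, "homework": 0.10}
assert math.isclose(sum(WEIGHTS.values()), 1.0)  # the four parts cover 100%


def course_percent(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(weight * scores[part] for part, weight in WEIGHTS.items())


example = {"midterm1": 80, "midterm2": 90, "final": 85, "homework": 100}
print(round(course_percent(example), 2))  # 86.5
```

Because the final carries 34% and each mid-term 28%, a strong final can offset a weak mid-term more than the reverse.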

Course ethics:

Many of my colleagues share with me the frustration that a few students seem increasingly not to know the difference between intellectual honesty and dishonesty.  Cheating on tests is obvious enough, but issues involving out-of-class assignments (like the homework for this class) seem less clear.  These difficulties are increased by my hope that you will work together (to a point) but that you will turn in work that is your own.  Computers add another source of ambiguity.  The following discussion is intended to help clarify my expectations about the homework.  Please ask if you want further guidance.

Here is a statement (modified from one provided by Prof. D. Willard) that I use in my undergraduate classes:

ACADEMIC MISCONDUCT.  Academic dishonesty is not common, but is serious and intolerable.  Please familiarize yourself with the Student Handbook guidelines.  I assume that you all know and understand what plagiarism and cheating are;  if you don't, find out.  The rules are simple.  Do your own work.  Don’t copy or even seem to copy from others.  Allowing someone else to use your work as if it were their own is as serious as using someone else's work without full written acknowledgment in whatever you turn in.  If you make legitimate use of work done by others, always document your sources.

For E538 homework, I hope that you will work together in the following acceptable ways:

·       Verbally discuss the purposes of an assignment.  What is the point of an analysis like this?  Why would we want to know the results being asked for?  What assumptions are required for a particular kind of analysis?  Why are these methods useful?  In what ways is the exercise like something you have done in the past, or might have to do in the future?  What would the results mean if they came out in various ways?

·       You may also discuss the general ways of dealing with the types of calculation required for the exercise.  If you know how to do the calculation, and want to help a classmate who hasn't figured it out yet, I would be pleased if you did that.  Indeed, probably the best way to learn how to do something well is to show someone else how to do it; such cooperation should help both you and the person you are helping.  However, do this by making up a similar set of data and showing your classmate how to analyze the made-up data.  Then leave them to analyze the real homework exercise themselves.

·       Similarly, if you are helping someone learn to perform statistical analyses (for example, in SPSS), show them the general ideas, but do not leave a computer file in a state such that they can enter the homework data into a ready-made template that you have prepared.

Here are some examples of what is NOT permissible:

·       Copying from another student's written answers, even if you make substantial changes to the wording.  Also, as noted above, it is just as unacceptable to provide your answers to another student as it is for them to use your answers.

·       Analyzing the data in a computer file, then allowing someone else to use or modify that file (or a copy of it) in any way.

·       Giving a copy of any computer file related to the homework assignment to any other student in the class. The one exception to this is that if two or more students are to analyze the same data, one person may type in the data and pass a file containing the raw data only to others.

If you are uncertain about what is fair and what is not, please err on the careful side, then ask for clarification later.

Policy on “Incompletes”:

The University policy on grades of "incomplete" includes the following statement:

CIRCUMSTANCES PERMITTING INCOMPLETES

The grade of Incomplete used on the final grade reports indicates that the work is satisfactory as of the end of the semester but has not been completed.  The grade of Incomplete may be given ONLY when the completed portion of a student's work in the course is of passing quality.  Instructors may award the grade of Incomplete upon a showing of such hardship to a student as would render it unjust to hold the student to the time limits previously fixed for the completion of his or her work.

After discussing this statement with my colleagues, I believe that the "hardship" referred to does not include poor preparation or planning, an overloaded schedule, or similar factors.  Rather, it refers to substantial illness, family emergencies, and the like.  Any incompletes granted in E538 will be based on this University policy.

Policy on exams:

You must take mid-term exams at the times they are scheduled for the lab in which you are enrolled.  (I may, however, attempt to arrange for the exams for both sections to occur at a common time.)  If that creates a conflict with some other course, you may ask to take the exam for the other lab section and I'll most likely give you permission to do so, depending on availability of space.  Such requests will be considered only if made at least two weeks prior to the exam time. 

You must take the final exam at the University's published time (extended slightly as listed just below) for the section in which you are officially enrolled.  It is your responsibility to arrange employment requirements, job interviews, airplane flights, and the like so they do not conflict with any of the scheduled exams.  The times, as extended a bit from page 28 of the semester's schedule of courses, are:

Section 8442 (9:30 a.m. lecture): final exam from 9 a.m. to noon on Wednesday, May 2.

Section 8444 (2:30 p.m. lecture): final exam from 10:15 a.m. to 1:15 p.m. on Monday, April 30.

E538—Spring 2001—Tentative Schedule:

Week of | Topic | Reading[1]
Jan 8 | Introduction; distributions, display, and description of data; pre-test (optional) | Preface, Introduction; 1
Jan 15 | M.L. King Day (no class Jan 15); relationships among two or more variables | 2 (omit 2.6)
Jan 22 | Experimental design | 3
Jan 29 | Probability | 4; Iv7–9
Feb 5 | Probability, conditional probability | 4.4; Iv12–17
Feb 12 | Binomial and Poisson distributions; properties of sample means; introduction to estimation | 5.1, Notes; 5.2; 6.1
Feb 16 | First mid-term exam, in lab |
Feb 19 | Decision making under uncertainty I; decision trees, probabilities, and values | Notes
Feb 26 | Decision making under uncertainty II; randomization hypothesis tests; Types I and II error, and power | Notes; 6.2–6.4
Mar 5 | Decision making under uncertainty III; t tests | Notes; 7.1, 7.2
Mar 12 | Spring break! |
Mar 19 | t tests and power |
Mar 26 | Inference about variability; inference about proportions | 7.3; 8.1
Mar 30 | Second mid-term exam, in lab |
Apr 2 | Bayesian inference for proportions and for means | Iv9–11, Iv18–39, Iv70–77 (review earlier Iverson)
Apr 9 | Transformations | Notes
Apr 16 | One-way anova; two-way anova | 12, 13; Notes
Apr 23 | Regression statistics, logistic regression; post-test | 10, 15

Final exams: time and place as above.

 

 



[1] Most references are to chapters or sections of Moore & McCabe.  Numbers preceded by “Iv” are page numbers in the Iverson book.  “Notes” refer to readings from the class notes.
