Hello!

My Office: Dickens 009C

Office Hours: 3-4 PM Monday/Wednesday

“Open Door” Policy

Zoom Office Hours

On request only: I won’t sit in a Zoom room by default every week

Email me: rmsholl@ksu.edu

These notes are always available

Burning questions/emergencies and couldn’t catch me in my office

Just email me, \(\approx\) 72-business-hour response time

If I’m past that window, I’m either dead or on vacation

I will follow up when/if I return

The speech I wish I could have given Monday:

- I was once, literally, right where you are

## Review

Two general forms of statistics:

Descriptive

- Quarterback Passer Rating

Inferential

- Unemployment negatively affects GDP (Okun’s Law)

Population

**Entire collection** of individuals we’re seeking information on

Sample

**A subset** of that population

The actual **observed** group

How often do K-State students attend home football games?

- Randomly survey 100 students that walk into the library on a random Monday

Why do we sample?

I want to determine which gas station in Manhattan, KS is the least popular

- Do I ask every person in town?

Simple Random Sampling

- A sample chosen by a method in which every possible group of individuals of that size is **equally likely** to make up the sample
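A minimal sketch of drawing a simple random sample in Python, assuming the population is just a list of hypothetical student ID numbers. `random.Random.sample` picks `n` distinct members so that every group of `n` is equally likely; the seed only makes the example reproducible.

```python
import random

# Hypothetical population: 500 student ID numbers (illustration only)
population = list(range(1, 501))

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    return rng.sample(population, n)

sample = simple_random_sample(population, 100, seed=1)
print(len(sample), len(set(sample)))  # 100 100 (no one is picked twice)
```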
Stratified Sampling

Divide the population into similar groups (e.g., group students by college)

Randomly sample from those groups (strata)
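The two steps above can be sketched in Python. This is an illustration under assumptions: the students, their college labels, and the choice of taking the same number from every stratum (rather than a number proportional to stratum size) are all hypothetical.

```python
import random

# Hypothetical population of (student_id, college) pairs; sizes and labels are made up
population = (
    [(i, "Engineering") for i in range(1, 201)]
    + [(i, "Arts & Sciences") for i in range(201, 501)]
    + [(i, "Business") for i in range(501, 601)]
)

def stratified_sample(population, key, n_per_stratum, seed=None):
    """Group units into strata by `key`, then take a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

sample = stratified_sample(population, key=lambda s: s[1], n_per_stratum=10, seed=1)
```

Every college is guaranteed to appear in the sample, which a single simple random sample cannot promise.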

Cluster Sampling

Divide the population into clusters (e.g., split Manhattan, KS by street block)

Randomly select some clusters and sample every individual in the chosen clusters
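A sketch of one-stage cluster sampling in Python, assuming a hypothetical town of 40 street blocks with 25 households each. The difference from stratified sampling is that we pick whole clusters at random and keep everyone inside them, instead of sampling from every group.

```python
import random

# Hypothetical town: 40 street blocks, each with 25 households (made-up sizes)
blocks = {f"block_{b}": [f"block_{b}_house_{h}" for h in range(25)] for b in range(40)}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

sample = cluster_sample(blocks, n_clusters=4, seed=7)
```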

Systematic Sampling

Randomly choose a start point in a “lined-up” population

Sample every \(k^{th}\) item

For example, starting from the \(4^{th}\) batch of ice cream produced on a given day, Call Hall will check the quality of every \(4^{th}\) batch that comes off the production line
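The Call Hall example can be sketched in Python, assuming a hypothetical day of 40 numbered batches. The start point is drawn at random from the first \(k\) items unless one is given; here it is fixed at 4 to match the example.

```python
import random

def systematic_sample(items, k, start=None, seed=None):
    """Take every k-th item, beginning at a (1-based) start point chosen at random from 1..k."""
    if start is None:
        start = random.Random(seed).randint(1, k)  # randint includes both endpoints
    return items[start - 1::k]

# Batches numbered 1..40 coming off the line (hypothetical day's production)
batches = list(range(1, 41))
print(systematic_sample(batches, k=4, start=4))  # batches 4, 8, 12, ..., 40
```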

Sample of Convenience

- Class height

Voluntary Response Samples

- Customer support reviews

Parameter

- Describes an entire population

Statistic

- Describes a sample
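To make the parameter/statistic distinction concrete, here is a small Python sketch using a hypothetical, randomly generated population of exam scores (not real data). The population mean is a parameter; the mean of a sample drawn from it is a statistic.

```python
import random
import statistics

# Hypothetical population: exam scores for 1,000 students (generated, not real data)
rng = random.Random(42)
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a number describing the entire population
population_mean = statistics.mean(population)

# Statistic: the same kind of number, computed only from a sample
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic will usually be close to, but not exactly equal to, the parameter, and it changes every time a new sample is drawn.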

## Types of Data

Data set

- Collected information

Individuals

**Something** the information is collected on

People/Places/Things/etc.

Variables

- Characteristics of the **individuals** we collected information from

We collected **information** on students

The **variables** are major, exam score, and grade

The **values** of those variables are called **data**

How many individuals?

What are the variables?

What are the data for individual 3?

### Variables

Qualitative (Categorical) variable

Values represent categories

Identifying labels/names

Can’t really do math with a label or name

We code these into numbers to fix that

- e.g., Cat-owners = 0 | Dog-owners = 1 | Both = 2
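A minimal Python sketch of that coding step, using the scheme from the notes and a few hypothetical survey responses.

```python
# Coding scheme from the notes: Cat-owners = 0, Dog-owners = 1, Both = 2
codes = {"Cat": 0, "Dog": 1, "Both": 2}

responses = ["Dog", "Cat", "Both", "Dog", "Dog"]  # hypothetical survey answers
coded = [codes[r] for r in responses]
print(coded)  # [1, 0, 2, 1, 1]
```

The codes are still labels: averaging them gives a number, but "the average pet owner is a 1.0" has no real meaning. What coding does buy us is easy counting and software-friendly storage.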

Quantitative variable

Values represent meaningful numbers

Height of a person, sales of a product

We can do math with these

A lot of how we do statistics depends on what data we have

Qualitative variables can be **ordinal** or **nominal**

Ordinal variable

Categories/values of the variable have a natural ordering

Letter grade: A, B, C, D

Clothing size: S, M, L
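An ordinal variable's natural order has to be supplied explicitly; the labels alone don't carry it. A short Python sketch with hypothetical clothing-size data shows the difference between sorting the labels as text and sorting by their real order.

```python
# Explicit ranking for an ordinal variable: S < M < L
SIZE_ORDER = {"S": 0, "M": 1, "L": 2}

orders = ["L", "S", "M", "L", "S"]  # hypothetical clothing-size data
print(sorted(orders))                      # ['L', 'L', 'M', 'S', 'S'] (alphabetical, wrong)
print(sorted(orders, key=SIZE_ORDER.get))  # ['S', 'S', 'M', 'L', 'L'] (natural order)
```

A nominal variable such as state of residence has no such ranking to supply.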

Nominal variable

Categories/values of the variable cannot be ordered

State of residence

Gender

Quantitative variables can be **discrete** or **continuous**

Discrete variable

A countable number of values (0, 1, 2, 3, 4, …)

Number of students in a classroom

Population size of fish in a pond

How many times a coin flip was successfully called

Continuous variable

A continuous range of numbers (0, 0.1, 0.11, 0.111, …)

Temperature

Volume of liquid in a glass

Height/Weight

Quantitative variables can be categorized by the **level of measurement** used for obtaining data values:

Interval level

Numerical measurement

Differences between values make sense

Ratios don’t make sense because **zero has no meaning** (it doesn’t represent an absence of the quantity)

Temperature in Celsius/Fahrenheit (Does 0 mean there’s no heat?)

Dates (Is there a meaningful ratio you can make out of 1997 and 2020?)
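A quick Python check of why temperature differences make sense but ratios don't: the same two temperatures give a consistent difference in every scale, but a different "how many times hotter" ratio in each, because each scale puts its zero somewhere arbitrary. Kelvin, which starts at absolute zero, is included for contrast.

```python
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Convert degrees Celsius to kelvin (0 K is absolute zero)."""
    return c + 273.15

cold, warm = 10.0, 20.0  # two temperatures in degrees Celsius

# Differences are consistent: a 10 C gap is always an 18 F gap
print(c_to_f(warm) - c_to_f(cold))  # 18.0

# Ratios are not: "twice as hot" depends on where the scale puts zero
print(warm / cold)                  # 2.0 in Celsius
print(c_to_f(warm) / c_to_f(cold))  # 1.36 in Fahrenheit
print(c_to_k(warm) / c_to_k(cold))  # about 1.035 in kelvin
```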

Ratio level

Numerical measurement

Differences between values make sense

Ratios **also make sense**

Zero **has meaning**: it represents absence of the quantity

Height (If you’re 0 inches tall, do you have height? Is there a meaningful percentage difference in height between 64 and 67 inches?)
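For the height example, the percentage difference is \(\frac{67-64}{64}\times 100\% = 4.6875\%\). A small Python sketch confirms it, and shows that switching units doesn't change the answer, because 0 inches and 0 centimeters are the same "no height" point.

```python
def percent_difference(new, old):
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100

short, tall = 64, 67  # heights in inches

print(percent_difference(tall, short))  # 4.6875

# Same comparison in centimeters: identical percentage (up to floating-point rounding)
CM_PER_INCH = 2.54
print(percent_difference(tall * CM_PER_INCH, short * CM_PER_INCH))
```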

Categorize the variables:

Music

- Qualitative

Food quality

- Qualitative

Closing time

- Quantitative

Own money spent

- Quantitative

### Communicating with Data

- Raw data isn’t entirely useful

Statistics is really good at **summarizing** and **visualizing** data

This is the primary focus of the next chapter

Choosing the “best” graph for displaying our data depends on our data

What kind of data do we have?

Categorical?

Numerical?

What are we trying to do?

Look at the distribution of our data?

See how two or more variables are related?

- Bar graph: One or more **categorical** variables
- Histogram: One **numerical** variable
- Scatterplot: **More than one** numerical variable

## Summarizing Data

Even when clean, data is messy

Interpreting information is how we make decisions

Every decision we make is data-driven

- Even when it’s “emotional data”

Statistics gives us tools to summarize and interpret data rapidly

### Frequency Distribution

Credit cards used by the last 10 customers at a store

Population?

Sample?

How many variables?

What type of variable?

Frequency distribution

Groups data into categories

Records the number of observations that fall into each category

“How **frequently** does each category occur in my sample?”

Relative frequency distribution

Divide the number in each category by the total number of observations

This gives us the **proportion** of units in each category

“What **percentage** of my sample falls in this category?”

Count up how many times each category occurs in the sample

For each category, divide its **occurrences** by the **sample total**

\(4\) customers use Visa

\(10\) customers total in the sample

\(\frac{4}{10}=0.4\)

\(0.4 \times 100\%=40\%\)
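The counting and dividing steps can be sketched with Python's `collections.Counter`. Only "4 of 10 customers use Visa" comes from the notes; the other six card labels below are hypothetical, made up so the example runs.

```python
from collections import Counter

# Hypothetical cards for 10 customers; only the 4 Visa users come from the notes
cards = ["Visa", "Master Card", "Visa", "Discover", "Am. Express",
         "Visa", "Master Card", "Visa", "Am. Express", "Master Card"]

frequency = Counter(cards)       # frequency distribution: count per category
total = sum(frequency.values())  # sample total (10 customers)
relative = {card: count / total for card, count in frequency.items()}

for card, count in frequency.most_common():
    print(f"{card:<12} {count:>2}  {relative[card]:.2f}  ({relative[card]:.0%})")
```

The relative frequencies always add up to 1 (100%), which is a handy check on hand calculations.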

How is this useful?

What percentage of the class drinks coffee?

What percentage of the class drinks tea?

What’s our sample?

Population?

What are the variables and variable types?

### Bar Graphs

Graphs are prettier than tables

- Barely a subjective statement

How many customers does this business lose if they stop taking Discover?

Take the frequency and relative frequency distributions:

| Credit Card | Frequency | Relative Frequency |
|---|---|---|
| Master Card | 11 | 0.22 |
| Visa | 23 | 0.46 |
| Am. Express | 9 | 0.18 |
| Discover | 7 | 0.14 |

One or more categorical variables

- So we use a **bar graph**
- We can make a bar graph using **relative frequency** too
- We can also flip it horizontally

This is useful when you have longer category names

**Side-by-side** bar graphs can be used to **compare** two or more **categorical** variables with the *same* categories
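As a text-only stand-in for a plotting library, here is a minimal Python sketch of a horizontal bar graph built from the credit-card counts in the table above. The horizontal layout leaves room for the longer card names on the left.

```python
# Credit-card counts from the frequency table (50 customers)
frequency = {"Master Card": 11, "Visa": 23, "Am. Express": 9, "Discover": 7}

def text_bar_graph(freq, symbol="#"):
    """Return one line per category: padded label, a bar of length `count`, then the count."""
    width = max(len(label) for label in freq)  # pad labels so the bars line up
    return [f"{label:<{width}} | {symbol * count} {count}" for label, count in freq.items()]

for line in text_bar_graph(frequency):
    print(line)
```

The Discover bar makes the earlier question easy to read off: dropping Discover would affect 7 of the 50 customers.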

### Pie Charts

- Bar graphs showing frequency can be converted into pie charts

Generally a pie chart will show relative frequency

- “What’s my piece of the pie?”
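Turning relative frequency into a pie slice is one multiplication: each slice's angle is its relative frequency times the \(360^\circ\) in a full circle. A short Python sketch using the credit-card counts from the Bar Graphs table:

```python
# Credit-card counts from the Bar Graphs table (50 customers)
frequency = {"Master Card": 11, "Visa": 23, "Am. Express": 9, "Discover": 7}
total = sum(frequency.values())

# Slice angle = relative frequency x 360 degrees
angles = {card: count / total * 360 for card, count in frequency.items()}

for card, angle in angles.items():
    share = frequency[card] / total
    print(f"{card:<12} {share:.0%} of customers -> {angle:.1f} degrees")
```

Visa's 46% becomes a 165.6° slice. Telling 79.2° (Master Card) from 64.8° (Am. Express) by eye is much harder than comparing bar lengths, which is part of why pie charts are pretty but less useful.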

They’re very pretty

Not very useful

Interpretability is everything

Approximately how large is Borneo?

Approximately how much larger is New Guinea than Sumatra?

Someone says that Madagascar and Baffin Island together are larger than New Guinea:

- Is this correct?