§3 Rolling the Dice 🎲

R Basics

Welcome to your first hands-on project with R! We’ll create a virtual pair of dice to learn fundamental R concepts. Just as historians analyze artifacts or literary scholars examine texts, we’ll be examining the building blocks of data analysis.

Why Dice?

Dice serve as an excellent starting point for learning R basics because:

  1. They’re simple and familiar objects, making the concepts easier to grasp.
  2. Simulating dice rolls allows us to practice creating variables, using functions, and generating random numbers - all fundamental programming concepts.
  3. Results are easily verifiable, helping you build confidence in your code.

Learning Objectives

  • 🖥️ Get comfortable with the RStudio interface - your new digital workspace
  • 🧱 Learn to create and manipulate R objects - the building blocks of R
  • 🛠️ Explore R functions - tools to automate repetitive tasks in your research
  • 📘 Introduction to Quarto documents - seamlessly combine your analysis and narrative

1 RStudio User Interface

1.1 Understanding the RStudio Interface 🖥️

When you open RStudio, you’ll see four main panes:

  1. Console (usually bottom-left): This is where you type R commands and see results. Think of it as a conversation with R - you ask questions, and R answers.

  2. Source (usually top-left): Here you write and edit R scripts. It’s like a digital notebook for your code.

  3. Environment (usually top-right): This shows data and objects in R’s memory. Imagine it as a shelf where R stores all the information you’ve given it.

  4. Files/Plots/Packages/Help (usually bottom-right): This multi-purpose pane is like a Swiss Army knife, offering file management, visualizations, add-ons, and documentation.

1.2 Running R Code 🏃‍♂️

There are several ways to execute R code in RStudio:

  1. Using the Console:
    • Type code directly into the Console and press Enter.
  2. Using the Source Editor:
    • Write code in the Source Editor.
    • Run a single line: Place cursor on the line and press Ctrl+Enter (Cmd+Enter on Mac).
    • Run multiple lines: Highlight lines and press Ctrl+Enter (Cmd+Enter on Mac).
    • Run entire script: Click the “Source” button or press Ctrl+Shift+Enter (Cmd+Shift+Enter on Mac). The “Run” button only runs the current line or selection.

Pro Tip

Writing code in the Source Editor allows you to save, edit, and rerun your entire analysis easily.

1.3 Learning Check 🏁

1.4 Hands-On Coding 💻

Try the following:

  1. Open RStudio and create a new R script named “my_first_script.R”
  2. In the script, write a simple calculation (e.g., 5 * 7)
  3. Run this calculation using different methods:
    • Console
    • Running a single line
    • Running the entire script
  4. Observe where the result appears in the RStudio interface

2 Objects in R 📦

In R, we store data in things called objects. You can think of an object as a container with a label 📦. The label is the name we give to the object, and inside the container is our data.

For example, we can create an object called my_age and store a number in it:

my_age <- 22

This is similar to how you might label a box “Books” before putting books inside it. Now, whenever R sees my_age, it knows to use the value 22.
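
As a small illustration of how R substitutes the stored value, the sketch below reuses my_age in a calculation. The name age_next_year is made up for this example.

```r
my_age <- 22

# R looks up the value stored in my_age and uses it in the calculation
my_age + 1
# [1] 23

# The result can be stored in a new object (a hypothetical name)
age_next_year <- my_age + 1
age_next_year
# [1] 23
```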

Key Concepts

  • Objects: Containers for storing data in R.
  • Assignment Operator (<-): Used to assign values to objects.

Naming Objects

When naming objects in R:

  • Names can’t start with a number
  • Avoid special symbols like ^, !, $, @, +, -, /, or *
  • R is case-sensitive, so Name and name are different objects
  • Avoid using names of existing functions or objects
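
The naming rules above can be tried out with a short sketch. The object names here (die_1, first_roll, Name, name) are only illustrations, and the invalid names are left as comments because R would stop with an error on them.

```r
# Valid names: start with a letter; digits and underscores are fine afterwards
die_1 <- 1:6
first_roll <- 3

# R is case-sensitive, so these are two separate objects
Name <- "Ada"
name <- "Grace"
identical(Name, name)
# [1] FALSE

# These lines would fail if uncommented:
# 1_die <- 1:6     # starts with a number
# my-die <- 1:6    # R reads the hyphen as a minus sign
```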

2.1 Creating a Die as an Object 🎲

Let’s start by creating a single die:

die <- 1:6
die
[1] 1 2 3 4 5 6

Key Concepts

  • Vector: A basic container that can hold multiple items of the same type.

The Colon Operator

The colon operator (:) returns every integer from the first number to the second, including both endpoints. It’s an easy way to create a sequence of whole numbers.
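
A few quick calls show how the colon operator behaves. The specific numbers are arbitrary examples.

```r
# Both endpoints are included in the result
1:6
# [1] 1 2 3 4 5 6

# The sequence counts down when the first number is larger
10:7
# [1] 10  9  8  7

# length() confirms how many integers were generated
length(1:6)
# [1] 6
```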

🙋🏻‍♀️ What if I want to create a vector that’s not a sequence of numbers?

2.1.1 Combining Data: c()

In R, c() stands for “combine” or “concatenate”. It’s used to create vectors, which are sequences of data elements of the same type. For example:

# Creating a vector of numbers
numbers <- c(2, 4, 6, 8, 10)
numbers
[1]  2  4  6  8 10
# Creating a vector of characters (text)
fruits <- c("apple", "banana", "cherry")
fruits
[1] "apple"  "banana" "cherry"
# Creating a vector of logical values
logical_values <- c(TRUE, FALSE, TRUE, TRUE)
logical_values
[1]  TRUE FALSE  TRUE  TRUE
# You can also mix data types, but R will convert them to a single type
mixed <- c(1, "two", TRUE)
mixed # R converts all to characters
[1] "1"    "two"  "TRUE"

The c() function is versatile and essential for creating custom datasets in R. You can use it to build vectors from any kind of data you’re working with in your research.
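
To see which type R settled on, one option is the class() function, which is not introduced above but is part of base R. The sketch below checks the vectors from the examples and shows one more conversion case.

```r
numbers <- c(2, 4, 6, 8, 10)
fruits <- c("apple", "banana", "cherry")
mixed <- c(1, "two", TRUE)

class(numbers)
# [1] "numeric"
class(fruits)
# [1] "character"
class(mixed)   # everything was converted to text
# [1] "character"

# Mixing only numbers and logical values gives numbers:
# TRUE becomes 1 and FALSE becomes 0
c(1, TRUE, FALSE)
# [1] 1 1 0
```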

2.2 Manipulating Objects

You can perform various operations on your die object:

die - 1
[1] 0 1 2 3 4 5
die * 2
[1]  2  4  6  8 10 12
die / 2
[1] 0.5 1.0 1.5 2.0 2.5 3.0

🤨 What do you notice about the results of the above operations?

Vectorized Operations in R

R performs these operations element-wise on the die object because it uses vectorized operations. This means:

  1. Each element in the die vector is individually operated on.
  2. The same operation is applied to every element in one step, with no loop to write.
  3. A new vector of the same length is returned as the result.

This vectorized approach makes R efficient for handling data and allows for concise, readable code when working with vectors or larger datasets.
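
The same element-wise behaviour applies when both sides are vectors. The short sketch below also confirms that the result keeps the original length and that die itself is left unchanged.

```r
die <- 1:6

# Two vectors of the same length are combined position by position
die + die
# [1]  2  4  6  8 10 12
die * die
# [1]  1  4  9 16 25 36

# The result has the same length as the input
length(die * 2)
# [1] 6

# The original object is unchanged unless you reassign it
die
# [1] 1 2 3 4 5 6
```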

2.3 Learning Check 🏁

2.4 Hands-On Coding 💻

Try the following:

  1. Create an object called favorite_number and assign it your favorite number.
  2. Create a vector called lucky_numbers with 3-5 numbers you consider lucky.
  3. Use the length() function to find out how many lucky numbers you have.
  4. Create a new vector called unlucky_numbers by subtracting 1 from each of your lucky_numbers.
  5. Create a vector called all_numbers that combines your lucky_numbers and unlucky_numbers.

3 Functions in R 🛠️

In R programming, functions are like specialized tools that help you perform specific tasks. Just as you might use a hammer to drive a nail or a screwdriver to tighten a screw, you use functions in R to perform particular operations on your data.

3.1 Key Concepts of Functions in R

  1. Purpose: Functions are reusable blocks of code that perform a specific task.

  2. Structure: A function typically has three parts:

    • Name: What you call the function (e.g., sum, mean, plot)
    • Arguments: Input values the function needs to do its job
    • Body: The code that defines what the function does

    Here’s the generic structure of a function in R:

    name_of_function <- function(argument1, argument2, ...) {
      # Body of the function
      # Perform operations using the arguments
    
      return(result)  # Output of the function
    }

    In this structure:

    • name_of_function is where you specify the function’s name
    • The function keyword tells R that you are defining a function
    • (argument1, argument2, ...) are the inputs the function accepts
    • The code between { and } is the body of the function
    • return() specifies what the function outputs

    Now, let’s look at a specific example:

    double_value <- function(x) {
      result <- x * 2
      return(result)
    }

    In this example:

    • The function name is double_value
    • It has one argument: x
    • The body is the code between the curly braces { }
    • The return() statement specifies what the function outputs

    You can use this function like this:

    doubled_value <- double_value(5)
    doubled_value
    [1] 10

  3. Input and Output: Functions often take input (arguments) and return output (results).

  4. Built-in vs. Custom: R has many built-in functions, but you can also create your own. You can also set default values for the arguments so that the function can still run without input.

  5. Syntax: To use a function, type its name followed by parentheses containing any arguments:

    function_name(argument1, argument2, ...)

  6. Documentation: You can learn about a function’s use and arguments with ?function_name in the console.

Remember, functions are powerful tools that help you organize and streamline your code, making your analysis more efficient and readable.
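
The note on default values in item 4 can be sketched as follows. The function multiply_value and its times argument are hypothetical names, chosen only to extend the double_value example.

```r
# 'times' has a default value of 2, so it can be left out when calling
multiply_value <- function(x, times = 2) {
  result <- x * times
  return(result)
}

# Uses the default: 5 * 2
multiply_value(5)
# [1] 10

# Overrides the default: 5 * 3
multiply_value(5, times = 3)
# [1] 15
```

Defaults like this let the most common use stay short while still allowing other values when needed.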

3.2 Using Built-in Functions

R comes with many built-in functions that you can use right away. These are like the basic tools that come in a starter toolkit. Let’s look at a few examples:

# The round() function rounds numbers
round(3.14159)
[1] 3
# The sum() function adds numbers together
sum(1, 2, 3, 4, 5)
[1] 15
# The length() function tells you how many items are in a list or vector
length(die)
[1] 6

Using a function is straightforward. You write the name of the function followed by parentheses. Inside the parentheses, you put the information (called “arguments”) that the function needs to do its job.
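
Many built-in functions accept more than one argument, and arguments can be given by name. The sketch below uses the digits argument of round() and passes the die vector to sum() and mean().

```r
die <- 1:6

# Keep two decimal places instead of rounding to a whole number
round(3.14159, digits = 2)
# [1] 3.14

# A whole vector can be passed as a single argument
sum(die)
# [1] 21
mean(die)
# [1] 3.5
```

Typing ?round in the console lists the arguments that round() accepts.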

Comments in R

Did you notice the # … part in the above code chunk?

In R, the # symbol is used to create comments. Anything after a # on a line is ignored by R when running the code. Comments are useful for:

  1. Explaining what your code does
  2. Temporarily disabling code without deleting it
  3. Organizing your script into sections

3.2.1 Simulating Randomness: sample()

To simulate rolling a die, we need a function that can randomly select a number. The sample() function is perfect for this task. It’s like reaching into a bag and pulling out a random item.

# Roll the die once
sample(x = die, size = 1)
[1] 3
# Roll the die again
sample(x = die, size = 1)
[1] 1

In this case, x = die tells the function what to choose from (our die object), and size = 1 says to pick one number.
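
Because the draws are random, your output will probably differ from the numbers shown here. If you need the same result every time, one common approach (not covered above) is to call set.seed() before sampling. The seed value 123 in this sketch is an arbitrary choice.

```r
die <- 1:6

# Fix the random number generator's starting point (123 is arbitrary)
set.seed(123)
first_try <- sample(x = die, size = 1)

# Resetting the same seed replays the same "random" draw
set.seed(123)
second_try <- sample(x = die, size = 1)

identical(first_try, second_try)
# [1] TRUE
```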

Sampling With Replacement

By default, sample() doesn’t put the number back in the “bag” after selecting it. For dice rolling, we want to allow the same number to be selected multiple times. Use replace = TRUE in the sample() function to achieve this.

# Roll two dice
sample(die, size = 2, replace = TRUE)
[1] 5 6
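
The difference between the two modes can be checked with a short sketch. It uses anyDuplicated() and tryCatch(), two base R functions that are not introduced in this chapter; they appear here only to make the behaviour visible.

```r
die <- 1:6

# Without replacement, each face can be drawn at most once
no_repeats <- sample(die, size = 6)
anyDuplicated(no_repeats)   # 0 means no value appears twice
# [1] 0

# Asking for more draws than there are faces fails without replacement;
# tryCatch() captures the error message instead of stopping the script
too_many <- tryCatch(
  sample(die, size = 7),
  error = function(e) conditionMessage(e)
)
is.character(too_many)      # TRUE: an error message was captured
# [1] TRUE

# With replace = TRUE the same request works, and repeats are allowed
rolls <- sample(die, size = 7, replace = TRUE)
length(rolls)
# [1] 7
```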

3.3 Writing Your Own Functions

Let’s create a function to roll two dice:

roll <- function() {
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

Let’s break this down:

  1. roll <- function() { ... } creates a new function named roll.
  2. Inside the curly braces { } is the “body” of the function - the instructions for what it should do.
  3. dice <- sample(die, size = 2, replace = TRUE) rolls two dice.
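  4. sum(dice) adds up the numbers rolled. Because it is the last expression in the body, R returns its value automatically, so no explicit return() is needed here.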

Now, whenever you want to roll two dice and get the sum, you can simply use:

roll()
[1] 3
roll()
[1] 8
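
To roll many times without retyping roll(), one option is the base R function replicate(), which is not covered above. The sketch below repeats the definitions so it runs on its own; the choice of 10 repetitions is arbitrary.

```r
die <- 1:6

roll <- function() {
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

# replicate() evaluates roll() 10 times and collects the results in a vector
rolls <- replicate(10, roll())
rolls

length(rolls)
# [1] 10

# Smallest and largest sums rolled; these always fall between 2 and 12
range(rolls)
```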

3.4 Learning Check 🏁

3.5 Hands-On Coding 💻

Let’s apply what we’ve learned to simulate flipping a coin. This is similar to rolling a die, but with only two possible outcomes.

  1. Create an object called coin that represents the two sides of a coin (use the numbers 1 and 2).

  2. Use the sample() function to simulate flipping the coin once.

  3. Create a function called flip_coin() that simulates flipping a coin and returns the result.

  4. Modify your flip_coin() function to have an argument named n_flips so that the user can specify the number of times the coin is flipped.

4 Introduction to Quarto 📘

Quarto is a powerful tool that allows you to combine R code, its output, and explanatory text in one document. It is like a special notebook where you can:

  1. Write normal text (like in Microsoft Word)
  2. Add R code (and other programming languages too!) to create charts or analyze data
  3. See the results of your code right next to your text

4.1 Why Use Quarto?

  1. Reproducibility: Quarto documents contain both code and narrative that make it easy to reproduce your analyses.
  2. Professional Presentation: Create polished, professional-looking documents.
  3. Flexibility: Output to various formats like HTML, PDF, or slideshows.

The Importance of Reproducibility

In the humanities, as in all academic fields, being able to reproduce the analytical process is crucial. When you write your R code in scripts, you’re creating a record of every step in your analysis. This is like creating detailed footnotes or citations in a research paper - it allows others (or future you!) to follow your work exactly.

Using scripts also makes it easy to make changes and rerun your entire analysis, which is much more efficient than trying to remember and retype everything in the console.

4.2 Quarto Document

A Quarto document has two main parts:

  • A header at the top (called the YAML header)
  • The main content (text and code)

Below is an example:

---
title: "My First Quarto Document"
author: "Your Name"
format: html
---
This is where you write your content.

You can use Markdown to format your text, or switch to the "visual editing mode" for an experience similar to Microsoft Word.

4.3 Essential Quarto Concepts

4.3.1 Editing Quarto Files: Source vs. Visual Editor

Quarto documents can be edited in two modes: Source and Visual.

  1. Source Editor: This mode shows you the raw Markdown and YAML content of your document. It’s great for:
    • Precise control over your document’s structure
    • Editing code chunks directly
    • Working with advanced Quarto features
  2. Visual Editor: This mode provides a WYSIWYG (What You See Is What You Get) interface, similar to word processors like Microsoft Word. It’s useful for:
    • Formatting text without knowing Markdown syntax
    • Easily inserting tables, images, and other elements
    • Collaborating with team members who may not be familiar with Markdown

Markdown Basics

Markdown is a lightweight markup language used in Quarto documents. Here are some basic syntax examples:

  • # Heading 1, ## Heading 2, ### Heading 3, etc. for headings
  • *italic* or _italic_ for italic text
  • **bold** or __bold__ for bold text
  • - item for unordered lists
  • 1. item for ordered lists
  • [link text](URL) for links
  • ![alt text](image.jpg) for images

For a more comprehensive guide, check out Quarto’s Markdown Basics.

4.3.2 Rendering

Rendering is the process of converting your Quarto document into its final output format (e.g., HTML, PDF, or Word). To render your document:

  1. Click the “Render” button in RStudio, or
  2. Use the keyboard shortcut Ctrl+Shift+K (Cmd+Shift+K on Mac)

When you render, Quarto executes all the code in your document and combines the results with your text to create the final output.

4.3.3 (Optional) Code Chunk Options

Code chunks in Quarto can be customized using options. These options control how the code is executed and displayed. Here are some common options:

```{r}
#| label: chunk-name
#| echo: true
#| eval: true
#| warning: false
#| message: false

# Your R code here
```

  • label: Gives the chunk a unique name
  • echo: Controls whether the code is displayed in the output (true/false)
  • eval: Determines if the code should be executed (true/false)
  • warning: Shows or hides warnings (true/false)
  • message: Shows or hides messages (true/false)

You can set these options globally in the YAML header or for individual chunks.

Pro Tip

Use code chunk options to control what your readers see. For example, you might hide code for complex calculations but show the results.

4.4 Learning Check 🏁

4.5 Hands-On Coding 💻

Let’s practice creating a simple Quarto document about our coin flipping simulation:

  1. Create a new Quarto document in RStudio.
  2. Add a title “Coin Flipping Simulation” and your name as the author.
  3. Write a brief introduction about coin flipping simulations.
  4. Insert a code chunk that defines your coin object and flip_coin() function.
  5. Add another code chunk that uses your flip_coin() function to simulate flipping a coin 10 times.
  6. Add some text explaining the results.
  7. Render the document as an HTML file.

5 Conclusion

Key Takeaways

In this chapter, we’ve covered:

  • The basics of R programming through a virtual dice rolling simulation
  • An introduction to the RStudio interface
  • Creating and manipulating R objects
  • Defining and using functions
  • Creating reproducible documents with Quarto

These foundational skills serve as building blocks for our journey into the world of digital humanities. As we progress, we’ll build upon these concepts to perform more complex data analysis and create compelling visualizations.
