§4 Dealing With Cards 🃏

R Objects and Data Frames

Welcome to your second project with R! We’ll create a virtual deck of poker cards to explore more advanced R concepts.

Learning Objectives
  • 🧱 Understand different R data types (numeric, character, logical, factor)
  • 📊 Create and manipulate data frames in R
  • 🔍 Learn how to access and filter data within data frames
  • 🃏 Apply R concepts to build and analyze a virtual deck of cards

1 R Data Types 🧱

Just as human language has different kinds of words, R has different types of data. Let’s explore some common data types:

Basic Data Types
  • Numeric: For numbers (integers and decimals)
  • Character: For text data
  • Logical: For true/false values
  • Factor: For categorical data

Each type serves a specific purpose in organizing and analyzing information.

1.1 Numeric

We’ve already seen this with our dice. It includes both integers and decimal numbers:

my_integer <- 42
my_deci <- 3.14
class(my_integer)
[1] "numeric"
class(my_deci)
[1] "numeric"
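A small aside on standard base R behavior: a whole number such as 42 is still stored as a decimal-capable "double", which is why class() reports "numeric" for both variables above. R also has a separate integer type, created by adding the suffix L. A minimal sketch (the variable name my_true_integer is just for illustration):

```r
# 42 is stored as a double-precision number by default
typeof(42)      # "double"
class(42)       # "numeric"

# The L suffix creates a true integer
my_true_integer <- 42L
typeof(my_true_integer)   # "integer"
class(my_true_integer)    # "integer"

# Integers still count as numeric and mix freely with decimals
is.numeric(my_true_integer)   # TRUE
my_true_integer + 0.5         # 42.5
```

For most humanities work the difference rarely matters, but it explains why you may sometimes see "integer" instead of "numeric".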

1.2 Character

This is used for text data, like words or sentences:

my_name <- "Shakespeare"
my_quote <- "To be or not to be"
class(my_name)
[1] "character"
class(my_quote)
[1] "character"

1.3 Logical

This represents true/false values:

is_sunny <- TRUE
is_raining <- FALSE
class(is_sunny)
[1] "logical"
class(is_raining)
[1] "logical"
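Logical values are rarely typed by hand. They usually come from comparisons, and this is what makes filtering work later in this chapter. R also treats TRUE as 1 and FALSE as 0 in arithmetic, so sum() counts how many values are TRUE. A short illustration (the variable names are our own):

```r
pub_year <- 1813

# A comparison returns a single logical value
is_modern <- pub_year > 1900
is_modern          # FALSE
class(is_modern)   # "logical"

# Comparing a vector returns one logical value per element
years <- c(1949, 1813, 1925)
years > 1900       # TRUE FALSE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(years > 1900)  # 2
```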

1.4 Factor

This is used for categorical data, like genres in literature:

genres <- factor(c("Poetry", "Prose", "Drama"))
class(genres)
[1] "factor"
What is the Difference Between Factor and Character?

Character and factor may seem similar, but they serve different purposes:

  1. Character
    • This is simply text data.
    • It can be any combination of letters, numbers, or symbols.
    • Examples: "apple", "banana", "cherry123"
  2. Factor
    • This is for categorical data, made up of distinct levels or categories.
    • It’s used when data falls into a limited number of specific groups.
    • Examples: Days of the week, book genres, or rating scales

Key differences:

  • Characters can be any text, while factors consist of predefined levels or categories.
  • Each level in a factor is assigned a unique integer behind the scenes, making factors more efficient for certain analyses.
  • Factors are ideal for data with a fixed set of possible values or categories.
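The differences listed above can be seen directly in the console. This sketch uses only base R functions (levels(), nlevels(), as.integer()); the genre values and variable names are illustrative:

```r
genres_chr <- c("Poetry", "Prose", "Drama", "Poetry")
genres_fct <- factor(genres_chr)

# A factor stores its allowed categories as levels
# (sorted alphabetically by default)
levels(genres_fct)      # "Drama" "Poetry" "Prose"
nlevels(genres_fct)     # 3

# Behind the scenes, each value is an integer code pointing to a level
as.integer(genres_fct)  # 2 3 1 2

# A character vector has no levels; it is just text
levels(genres_chr)      # NULL

# Assigning a value that is not a level produces NA (with a warning)
genres_fct[2] <- "Novel"
genres_fct[2]           # NA
```

The last two lines show why factors suit a fixed set of categories: R refuses values outside the predefined levels, whereas a character vector would accept any text.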

1.5 Learning Check 🏁

1.6 Hands-On Coding 💻

Try the following:

  1. Create a numeric variable called year_published and assign it the year your favorite book was published.
  2. Create a character variable called book_title with the title of your favorite book.
  3. Create a logical variable called is_fiction indicating whether your favorite book is fiction (TRUE) or non-fiction (FALSE).
  4. Create a factor variable called book_genre with a few genres (e.g., “Mystery”, “Romance”, “Science Fiction”).

2 Data Frames 📊

A data frame is like a spreadsheet in R. It’s a collection of vectors of equal length, each representing a column. This structure is perfect for our deck of cards!

Let’s create a simple data frame to represent books:

books <- data.frame(
    title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
    author = c("Orwell", "Austen", "Fitzgerald"),
    year = c(1949, 1813, 1925)
)
print(books)
                title     author year
1                1984     Orwell 1949
2 Pride and Prejudice     Austen 1813
3    The Great Gatsby Fitzgerald 1925
str(books)
'data.frame':   3 obs. of  3 variables:
 $ title : chr  "1984" "Pride and Prejudice" "The Great Gatsby"
 $ author: chr  "Orwell" "Austen" "Fitzgerald"
 $ year  : num  1949 1813 1925

The str() function in R stands for “structure”. It’s a very useful tool for quickly examining the structure of any R object, especially data frames. Here’s what it shows you:

  1. The type of object (e.g., data frame, list, vector)
  2. The number of observations (rows) and variables (columns) for data frames
  3. The name and type of each variable (column)
  4. A preview of the data in each variable

Reading the str(books) output above, we can see that:

  • We have a data frame
  • It has 3 observations (rows) and 3 variables (columns)
  • The variables are:
    • title: character type (text)
    • author: character type (text)
    • year: numeric type (numbers)

Using str() is a quick way to get an overview of your data, which is especially helpful when working with large datasets.

2.1 Accessing Data in a Data Frame

You can access specific parts of a data frame using various methods.

$ and [row_number, column_number]
  1. Columns: Use $ followed by the column name (e.g., books$title).
    • This gives you all the values in that column.
  2. Rows: Use [row_number, ] (e.g., books[2, ]).
    • The comma is important! It means “all columns”.
    • This gives you all the data in that specific row.
  3. Individual Cells: Use [row_number, column_number] (e.g., books[1, 2]).
    • This gives you the value in a specific row and column.
Accessing the books Data Frame

Data frames are essentially spreadsheet-like tables. Our books data frame looks like this:

title                 author       year
1984                  Orwell       1949
Pride and Prejudice   Austen       1813
The Great Gatsby      Fitzgerald   1925
  • books$title would give: 1984, Pride and Prejudice, The Great Gatsby
  • books[2, ] would give: Pride and Prejudice, Austen, 1813
  • books[1, 2] would give: Orwell
# Get a specific column
books$title
[1] "1984"                "Pride and Prejudice" "The Great Gatsby"   
# Get a specific row
books[2, ]
                title author year
2 Pride and Prejudice Austen 1813
# Get a specific cell
books[1, 2]
[1] "Orwell"
# Get a specific cell using row number and column name
books[2, "title"] # Get the second title
[1] "Pride and Prejudice"
# Get the record based on a specific value
books[books$author == "Austen", ] # Get all records by Austen
                title author year
2 Pride and Prejudice Austen 1813
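The bracket notation also works in the other direction: leaving the row position empty selects all rows, so books[, 2] returns the whole second column, just like books$author. A brief sketch using the books data frame from above (rebuilt here so the snippet runs on its own):

```r
books <- data.frame(
    title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
    author = c("Orwell", "Austen", "Fitzgerald"),
    year = c(1949, 1813, 1925)
)

# Empty row position = all rows; this returns the author column
books[, 2]          # "Orwell" "Austen" "Fitzgerald"
books[, "author"]   # same result, selected by column name

# Number of rows and columns in the data frame
nrow(books)         # 3
ncol(books)         # 3
```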
Understanding Nested Functions

The line books[books$author == "Austen", ] might look complex, but let’s break it down:

  1. books$author: This part gets the author column from our books data frame.
  2. books$author == "Austen": This compares each author name to “Austen”.
    • It creates a logical vector of TRUE/FALSE values (TRUE where the author is Austen, FALSE otherwise).
    • For example, with our 3 books, only the second is by Austen, so this gives: FALSE, TRUE, FALSE
  3. books[...]: This is like asking R to look inside the books data frame.
  4. books[books$author == "Austen", ]: This combines steps 2 and 3.
    • It tells R: “From the books data frame, give me all rows where the author is Austen”.
    • The comma at the end means “give me all columns for these rows”.

Think of it like a librarian (R) searching through a catalog (the data frame):

  • You ask: “Can you find all books by Austen?”
  • The librarian checks each book’s author (TRUE/FALSE for Austen).
  • Then returns all information about the books that matched.
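To see each step on its own, it can help to store the TRUE/FALSE vector in a variable before using it. A minimal sketch (the name is_austen is our own choice; the books data frame is rebuilt so the snippet runs independently):

```r
books <- data.frame(
    title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
    author = c("Orwell", "Austen", "Fitzgerald"),
    year = c(1949, 1813, 1925)
)

# Step 1: get the author column
books$author

# Step 2: compare every author to "Austen", giving one TRUE/FALSE per row
is_austen <- books$author == "Austen"
is_austen           # FALSE TRUE FALSE

# Steps 3 and 4: keep the rows where the condition is TRUE, with all columns
books[is_austen, ]
```

The result is identical to books[books$author == "Austen", ]; writing the condition inline simply skips the intermediate variable.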
Filtering Data Based on Conditions

This example demonstrates a powerful technique called “filtering”:

  1. Creating a condition: books$author == "Austen" creates a logical vector with one TRUE/FALSE value per row, TRUE where the author is Austen.
  2. Using the condition: We use this vector inside the square brackets [ ] to select only those rows that meet our condition.
  3. Selecting columns: The comma after the condition (books[condition, ]) means “select all columns for these rows”.

This method allows you to extract specific records based on any condition you specify. For example:

  • Find all books published after 1900: books[books$year > 1900, ]
  • Find all books with “War” in the title: books[grepl("War", books$title), ]

Filtering is a crucial skill in data manipulation and analysis, allowing you to focus on specific subsets of your data.
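Here is how the two example filters behave on our three books. grepl() is base R's pattern-matching function; it returns TRUE/FALSE for each title and is case-sensitive by default. The "The" search and the final title-only selection are our own additions for illustration:

```r
books <- data.frame(
    title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
    author = c("Orwell", "Austen", "Fitzgerald"),
    year = c(1949, 1813, 1925)
)

# Books published after 1900 (rows 1 and 3)
books[books$year > 1900, ]

# No title contains "War", so this returns a data frame with 0 rows
books[grepl("War", books$title), ]

# Only "The Great Gatsby" contains "The"
books[grepl("The", books$title), ]

# A condition can be paired with a column name to return just that column
books[books$year > 1900, "title"]   # "1984" "The Great Gatsby"
```

An empty result is not an error: it just means no row met the condition.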

2.2 Learning Check 🏁

2.3 Hands-On Coding 💻

Let’s apply our data frame skills to a new context: famous paintings. We’ll create a data frame of artworks and practice accessing its data in various ways.

Try the following exercises:

  1. Create a data frame called paintings with columns for title, artist, and year_created.
Example Data

Feel free to use your own favorite paintings for this exercise! If you prefer, you can use the following code:

paintings <- data.frame(
    title = c("Mona Lisa", "The Starry Night", "The Persistence of Memory"),
    artist = c("Leonardo da Vinci", "Vincent van Gogh", "Salvador Dali"),
    year_created = c(1503, 1889, 1931)
)
  2. Access the ‘artist’ column of the data frame.
  3. Get the third row of the data frame.
  4. Find the year the second painting in the data frame was created.
  5. (Optional challenge) Add a new column called ‘style’ to the data frame (e.g., “Renaissance”, “Post-Impressionism”, “Surrealism”).

3 Building Your Deck of Cards 🃏

Now that you have learned about different data types and how to work with data frames, let’s put your skills to the test by creating a virtual deck of cards!

How to Use This Section
  1. Read each task carefully.
  2. Try solving it on your own first.
  3. Use support tools if needed:
    • “Show Hint” 💡: For a gentle nudge
    • “Show Template” 🧩: For a fill-in-the-blank version
    • “Show Solution” ✅: For the complete answer

Remember, everyone starts as a beginner, and mistakes are part of learning. Focus on understanding the concepts rather than perfection.

Happy coding! 🚀

3.1 Create the Deck

Structure of a Deck

Remember, a standard deck of cards consists of:

  • 4 suits: Hearts♥️, Diamonds♦️, Clubs♣️, and Spades♠️
  • 13 ranks in each suit: Ace, 2, 3, …, 10, Jack, Queen, King
  • A total of 52 cards (4 suits × 13 ranks)

Create a data frame called deck with three columns: suit, rank, and value.

  • suit should contain all four suits, repeated 13 times each.
  • rank should contain all 13 ranks, repeated for each suit.
  • value should assign numeric values to the ranks (Ace = 1, Jack = 11, Queen = 12, King = 13, others as their numeric value).
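If you want to check your work, here is one possible approach; it is a sketch, not necessarily the official solution. It writes the suits as plain words and relies on base R's rep(): each = 13 repeats every suit 13 times in a row, while times = 4 repeats the whole list of ranks once per suit, so the suit, rank, and value columns line up row by row. The helper vectors suits and ranks are our own names:

```r
suits <- c("Hearts", "Diamonds", "Clubs", "Spades")
ranks <- c("Ace", as.character(2:10), "Jack", "Queen", "King")

deck <- data.frame(
    # "Hearts" x13, then "Diamonds" x13, and so on
    suit = rep(suits, each = 13),
    # Ace to King, repeated once for each of the 4 suits
    rank = rep(ranks, times = 4),
    # 1 to 13 in the same order as the ranks (Ace = 1, ..., King = 13)
    value = rep(1:13, times = 4)
)

head(deck, 13)   # the first 13 rows: all the Hearts
str(deck)
```

Because value is built from the same 1 to 13 sequence as the ranks, Ace gets 1, the number cards keep their own values, and Jack, Queen, and King get 11, 12, and 13.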

3.2 Accessing Information

Now that we have our deck, let’s practice accessing information from it.

3.2.1 How many cards are in the deck?

3.2.2 What are all the unique suits in the deck?

3.2.3 View high value cards

View only the cards with a value greater than 10.
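Once the deck exists, each of these questions takes about one line. A possible set of answers, assuming a deck with the suit, rank, and value columns described in 3.1 (rebuilt here with rep() so the snippet runs on its own): nrow() counts rows, unique() drops repeated values, and the filtering pattern from section 2.1 selects the high cards.

```r
# Rebuild the deck so this snippet runs on its own
suits <- c("Hearts", "Diamonds", "Clubs", "Spades")
ranks <- c("Ace", as.character(2:10), "Jack", "Queen", "King")
deck <- data.frame(
    suit = rep(suits, each = 13),
    rank = rep(ranks, times = 4),
    value = rep(1:13, times = 4)
)

# 3.2.1 How many cards? Each row is one card
nrow(deck)                 # 52

# 3.2.2 Unique suits
unique(deck$suit)          # "Hearts" "Diamonds" "Clubs" "Spades"

# 3.2.3 High-value cards: the Jack, Queen, and King of every suit
high_cards <- deck[deck$value > 10, ]
nrow(high_cards)           # 12
high_cards
```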

4 Conclusion

Key Takeaways

In this chapter, we’ve covered:

  • The four basic R data types: numeric, character, logical, and factor
  • Creating and working with data frames, including accessing specific columns, rows, and cells
  • Filtering data based on conditions using logical indexing
  • Building a virtual deck of cards as a data frame and performing operations on it

These skills provide a solid foundation for working with structured data in R. Understanding data types and manipulating data frames are crucial skills in digital humanities research. As we progress, we’ll build upon these concepts to perform more complex data analysis and create compelling visualizations of humanities data.
