R Objects and Data Frames
Welcome to your second project with R! We’ll create a virtual deck of poker cards to explore more advanced R concepts.
In R, just as in language, we have different types of information. Let’s explore some common data types:
Numeric: We’ve already seen this with our dice. It includes both integers and decimal numbers:
my_integer <- 42
my_deci <- 3.14
class(my_integer)
[1] "numeric"
class(my_deci)
[1] "numeric"
Character: This is used for text data, like words or sentences.
Logical: This represents true/false values.
Factor: This is used for categorical data, like genres in literature.
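Each of these types can be checked with class(), just like the numeric values. The following is a minimal sketch; the variable names my_text, my_logical, and my_factor are hypothetical and chosen only for illustration.

```r
# Hypothetical variable names, used only for this illustration
my_text <- "Call me Ishmael."
my_logical <- TRUE
my_factor <- factor(c("Mystery", "Romance", "Mystery"))

class(my_text)     # "character"
class(my_logical)  # "logical"
class(my_factor)   # "factor"

# A factor remembers its distinct categories (levels), sorted alphabetically
levels(my_factor)  # "Mystery" "Romance"
```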
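Character and factor may seem similar, but they serve different purposes:
- Character: free-form text where any value is allowed, such as "apple", "banana", or "cherry123".
- Factor: categorical data where each value belongs to a fixed set of categories (called levels).
Key differences: a character vector is stored as plain text, whereas a factor is stored as integer codes that point to its levels, so R knows the complete set of possible categories.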
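One way to see the contrast between the two types is to build the same values both ways and inspect them. This is a sketch only; fruits_chr and fruits_fct are illustrative names that do not appear elsewhere in this chapter.

```r
# Illustrative names; not part of the chapter's own code
fruits_chr <- c("apple", "banana", "cherry123", "apple")
fruits_fct <- factor(fruits_chr)

# A character vector is just text, so it has no levels
levels(fruits_chr)      # NULL

# A factor records the set of distinct categories it can hold
levels(fruits_fct)      # "apple" "banana" "cherry123"

# Internally, a factor stores integer codes that point at its levels
as.integer(fruits_fct)  # 1 2 3 1

# table() counts how often each level occurs
table(fruits_fct)
```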
Try the following:
1. Create a numeric variable called year_published and assign it the year your favorite book was published.
2. Create a character variable called book_title with the title of your favorite book.
3. Create a logical variable called is_fiction indicating whether your favorite book is fiction (TRUE) or non-fiction (FALSE).
4. Create a factor called book_genre with a few genres (e.g., “Mystery”, “Romance”, “Science Fiction”).

A data frame is like a spreadsheet in R. It’s a collection of vectors of equal length, each representing a column. This structure is perfect for our deck of cards!
Let’s create a simple data frame to represent books:
books <- data.frame(
title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
author = c("Orwell", "Austen", "Fitzgerald"),
year = c(1949, 1813, 1925)
)
print(books)
title author year
1 1984 Orwell 1949
2 Pride and Prejudice Austen 1813
3 The Great Gatsby Fitzgerald 1925
str(books)
'data.frame': 3 obs. of 3 variables:
$ title : chr "1984" "Pride and Prejudice" "The Great Gatsby"
$ author: chr "Orwell" "Austen" "Fitzgerald"
$ year : num 1949 1813 1925
The str() function in R stands for “structure”. It’s a very useful tool for quickly examining the structure of any R object, especially data frames. It shows you what kind of object you have, its dimensions, and the name, type, and first few values of each column.
For example, when we use str(books), it shows:
'data.frame': 3 obs. of 3 variables:
$ title : chr "1984" "Pride and Prejudice" "The Great Gatsby"
$ author: chr "Orwell" "Austen" "Fitzgerald"
$ year : num 1949 1813 1925
This tells us:
- We have a data frame
- It has 3 observations (rows) and 3 variables (columns)
- The variables are:
  - title: character type (text)
  - author: character type (text)
  - year: numeric type (numbers)
Using str() is a quick way to get an overview of your data, which is especially helpful when working with large datasets.
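Because str() accepts any R object, it can also be tried on vectors and lists. The sketch below rebuilds the books data frame so that it runs on its own; the small vectors and the list are made-up examples.

```r
# str() on objects other than data frames (made-up example values)
str(c(1.5, 2, 3))              # a numeric vector of length 3
str(c("a", "b"))               # a character vector of length 2
str(list(id = 1L, ok = TRUE))  # a list with two named elements

# The same facts str() reports for a data frame are available piecemeal
books <- data.frame(
  title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
  author = c("Orwell", "Austen", "Fitzgerald"),
  year = c(1949, 1813, 1925)
)
nrow(books)   # 3 observations (rows)
ncol(books)   # 3 variables (columns)
names(books)  # "title" "author" "year"
```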
You can access specific parts of a data frame using various methods, chiefly $ and [row_number, column_number]:
- To access a column, use $ followed by the column name (e.g., books$title).
- To access a row, use [row_number, ] (e.g., books[2, ]).
- To access a specific cell, use [row_number, column_number] (e.g., books[1, 2]).
books Data Frame
Data frames are essentially just spreadsheet tables. Our books data frame looks like this:

| title | author | year |
|---|---|---|
| 1984 | Orwell | 1949 |
| Pride and Prejudice | Austen | 1813 |
| The Great Gatsby | Fitzgerald | 1925 |

- books$title would give: 1984, Pride and Prejudice, The Great Gatsby
- books[2, ] would give: Pride and Prejudice, Austen, 1813
- books[1, 2] would give: Orwell
# Get a specific column
books$title
[1] "1984" "Pride and Prejudice" "The Great Gatsby"
# Get a specific row
books[2, ]
title author year
2 Pride and Prejudice Austen 1813
# Get a specific cell
books[1, 2]
[1] "Orwell"
# Get a specific cell using row number and column name
books[2, "title"] # Get the second title
[1] "Pride and Prejudice"
# Get the record based on a specific value
books[books$author == "Austen", ] # Get all records by Austen
title author year
2 Pride and Prejudice Austen 1813
The line books[books$author == "Austen", ] might look complex, but let’s break it down:
1. books$author: This part gets the author column from our books data frame.
2. books$author == "Austen": This compares each author name to “Austen”.
3. books[...]: This is like asking R to look inside the books data frame.
4. books[books$author == "Austen", ]: This combines steps 2 and 3.
Think of it like a librarian (R) searching through a catalog (the data frame) and pulling out only the entries whose author is Austen.
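The same breakdown can be run one step at a time by storing each intermediate result. This is a sketch; the names authors, is_austen, and austen_books are introduced here only for illustration.

```r
books <- data.frame(
  title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
  author = c("Orwell", "Austen", "Fitzgerald"),
  year = c(1949, 1813, 1925)
)

# Step 1: pull out the author column
authors <- books$author

# Step 2: compare every author with "Austen" (one TRUE/FALSE per row)
is_austen <- authors == "Austen"
is_austen  # FALSE TRUE FALSE

# Steps 3 and 4: keep the rows marked TRUE; the blank after the comma keeps all columns
austen_books <- books[is_austen, ]
austen_books
```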
This example demonstrates a powerful technique called “filtering”:
1. books$author == "Austen" creates a logical vector with one TRUE/FALSE value per row; it is TRUE where the author is Austen.
2. We place this logical vector inside the square brackets [ ] to select only those rows that meet our condition.
3. The comma followed by nothing (books[condition, ]) means “select all columns for these rows”.
This method allows you to extract specific records based on any condition you specify, for example a range of years or a particular title.
Filtering is a crucial skill in data manipulation and analysis, allowing you to focus on specific subsets of your data.
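A few other conditions, applied to the same books data frame, might look like the sketch below. The result names (before_1900 and so on) are hypothetical, and the particular conditions are only examples of what can go inside the brackets.

```r
books <- data.frame(
  title = c("1984", "Pride and Prejudice", "The Great Gatsby"),
  author = c("Orwell", "Austen", "Fitzgerald"),
  year = c(1949, 1813, 1925)
)

# Books published before 1900
before_1900 <- books[books$year < 1900, ]

# Books published in the 20th century: combine two conditions with &
twentieth_century <- books[books$year >= 1901 & books$year <= 2000, ]

# Books by any of several authors: %in% tests membership in a set
selected_authors <- books[books$author %in% c("Orwell", "Fitzgerald"), ]

# Filter rows and keep only some columns at the same time
recent_titles <- books[books$year > 1920, c("title", "year")]

before_1900
twentieth_century
selected_authors
recent_titles
```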
Let’s apply our data frame skills to a new context: famous paintings. We’ll create a data frame of artworks and practice accessing its data in various ways.
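Try the following exercises:
1. Create a data frame called paintings with columns for title, artist, and year_created.
2. Access the artist column.
3. Get the third row.
4. Find the year the second painting was created.
5. Add a new style column.
Feel free to use your own favorite paintings for this exercise! If you prefer, you can use the following code: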
paintings <- data.frame(
title = c("Mona Lisa", "The Starry Night", "The Persistence of Memory"),
artist = c("Leonardo da Vinci", "Vincent van Gogh", "Salvador Dali"),
year_created = c(1503, 1889, 1931)
)
# 1. Create the paintings data frame
paintings <- data.frame(
title = c("Mona Lisa", "The Starry Night", "The Persistence of Memory"),
artist = c("Leonardo da Vinci", "Vincent van Gogh", "Salvador Dali"),
year_created = c(1503, 1889, 1931)
)
# 2. Access the 'artist' column
paintings$artist
[1] "Leonardo da Vinci" "Vincent van Gogh" "Salvador Dali"
# 3. Get the third row
paintings[3, ]
title artist year_created
3 The Persistence of Memory Salvador Dali 1931
# 4. Find the year the second painting was created
paintings[2, "year_created"]
[1] 1889
# 5. Add a new 'style' column
paintings$style <- c("Renaissance", "Post-Impressionism", "Surrealism")
# View the updated data frame
print(paintings)
title artist year_created style
1 Mona Lisa Leonardo da Vinci 1503 Renaissance
2 The Starry Night Vincent van Gogh 1889 Post-Impressionism
3 The Persistence of Memory Salvador Dali 1931 Surrealism
Now that you have learned about different data types and how to work with data frames, let’s put your skills to the test by creating a virtual deck of cards!
Remember, everyone starts as a beginner, and mistakes are part of learning. Focus on understanding the concepts rather than perfection.
Happy coding! 🚀
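Remember, a standard deck of cards consists of:
- 4 suits: Hearts, Diamonds, Clubs, and Spades
- 13 ranks in each suit: Ace, 2 through 10, Jack, Queen, and King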
Create a data frame called deck with three columns: suit, rank, and value.
- suit should contain all four suits, repeated 13 times each.
- rank should contain all 13 ranks, repeated for each suit.
- value should assign numeric values to the ranks (Ace = 1, Jack = 11, Queen = 12, King = 13, others as their numeric value).
Use the rep() function to repeat values. For example:
- rep(c("A", "B"), each = 3) gives "A" "A" "A" "B" "B" "B"
- rep(c("A", "B"), times = 3) gives "A" "B" "A" "B" "A" "B"
For the suit column: rep(c("Hearts", "Diamonds", "Clubs", "Spades"), each = 13)
For the rank column: rep(c("Ace", 2:10, "Jack", "Queen", "King"), times = 4)
For the value column: rep(1:13, times = 4)
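The difference between each and times, and the reason the two are paired for the deck, can be checked directly. This is a sketch; the names ranks, suits, and all_ranks are illustrative helpers rather than part of the exercise.

```r
# each repeats every element in place; times repeats the whole vector
rep(c("A", "B"), each = 3)   # "A" "A" "A" "B" "B" "B"
rep(c("A", "B"), times = 3)  # "A" "B" "A" "B" "A" "B"

# Mixing text and numbers inside c() converts everything to text
ranks <- c("Ace", 2:10, "Jack", "Queen", "King")
class(ranks)   # "character"
length(ranks)  # 13

# Pairing each = 13 with times = 4 lines up every suit with every rank once
suits <- rep(c("Hearts", "Diamonds", "Clubs", "Spades"), each = 13)
all_ranks <- rep(ranks, times = 4)
length(suits)      # 52
length(all_ranks)  # 52
```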
deck <- data.frame(
suit = rep(c("Hearts", "Diamonds", "Clubs", "Spades"), each = 13),
rank = rep(c("Ace", 2:10, "Jack", "Queen", "King"), times = 4),
value = rep(1:13, times = 4)
)
# View the first few rows
head(deck)
suit rank value
1 Hearts Ace 1
2 Hearts 2 2
3 Hearts 3 3
4 Hearts 4 4
5 Hearts 5 5
6 Hearts 6 6
Now that we have our deck, let’s practice accessing information from it.
Use nrow() to count the number of rows in a data frame.
nrow(deck)
[1] 52
Use the unique() function on the suit column of the deck.
unique(deck$suit)
[1] "Hearts" "Diamonds" "Clubs" "Spades"
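As an optional extra check, table() can count how many cards fall into each suit, which confirms the deck was built as intended. This is a sketch that rebuilds deck so it runs on its own; suit_counts is a hypothetical name.

```r
deck <- data.frame(
  suit = rep(c("Hearts", "Diamonds", "Clubs", "Spades"), each = 13),
  rank = rep(c("Ace", 2:10, "Jack", "Queen", "King"), times = 4),
  value = rep(1:13, times = 4)
)

# Count the cards in each suit
suit_counts <- table(deck$suit)
suit_counts

# Every suit should hold 13 cards
all(suit_counts == 13)                         # TRUE

# No suit/rank combination should appear twice (0 means no duplicates)
anyDuplicated(deck[, c("suit", "rank")]) == 0  # TRUE
```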
View only the cards with a value greater than 10.
Use logical indexing to filter the deck. The condition should be deck$value > 10.
high_value_cards <- deck[deck$value > 10, ]
print(high_value_cards)
suit rank value
11 Hearts Jack 11
12 Hearts Queen 12
13 Hearts King 13
24 Diamonds Jack 11
25 Diamonds Queen 12
26 Diamonds King 13
37 Clubs Jack 11
38 Clubs Queen 12
39 Clubs King 13
50 Spades Jack 11
51 Spades Queen 12
52 Spades King 13
In this chapter, we’ve covered:
- R Data Types
- Data Frames
- Data Access
- Data Filtering
These skills provide a solid foundation for working with structured data in R. Understanding data types and manipulating data frames are crucial skills in digital humanities research. As we progress, we’ll build upon these concepts to perform more complex data analysis and create compelling visualizations of humanities data.