Blocking experimental data
You are working with a manufacturing firm that wants to conduct some experiments on worker productivity. Their dataset contains only 100 rows, so it's important that the experimental groups are balanced.
This sounds like a great opportunity to use your knowledge of blocking to assist them. They have provided a productivity_subjects DataFrame. Split the provided dataset into two even groups of 50 entries each.
The libraries numpy and pandas have been imported as np and pd respectively.
This exercise is part of the course
Experimental Design in Python
Exercise instructions
- Randomly select 50 subjects from the productivity_subjects DataFrame into a new DataFrame block_1 without replacement.
- Set a new column, block, to 1 for the block_1 DataFrame.
- Assign the remaining subjects to a DataFrame called block_2 and set the block column to 2 for this DataFrame.
- Concatenate the blocks together into a single DataFrame, and print the count of each value in the block column to confirm the blocking worked.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Randomly assign half
block_1 = productivity_subjects.____(____, random_state=42, ____)
# Set the block column
block_1['block'] = ____
# Create second assignment and label
block_2 = ____
block_2['block'] = ____
# Concatenate and print
productivity_combined = pd.____([block_1, block_2], axis=0)
print(productivity_combined['block'].value_counts())
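
For reference, the steps in the instructions could be sketched roughly as follows. This is one possible approach, not necessarily the course's intended solution. Because the real productivity_subjects data is not shown here, the sketch builds a hypothetical 100-row stand-in with a made-up subject_id column; the calls used are the standard pandas DataFrame.sample, DataFrame.drop, and pd.concat.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for productivity_subjects (assumption: the real
# DataFrame has 100 rows; the subject_id column is invented for illustration).
productivity_subjects = pd.DataFrame({"subject_id": np.arange(1, 101)})

# Draw 50 rows at random, without replacement
block_1 = productivity_subjects.sample(n=50, random_state=42, replace=False)

# Label the first block
block_1["block"] = 1

# The remaining subjects are the rows whose index labels were not drawn
block_2 = productivity_subjects.drop(block_1.index)
block_2["block"] = 2

# Stack the two blocks row-wise and check the group sizes
productivity_combined = pd.concat([block_1, block_2], axis=0)
print(productivity_combined["block"].value_counts())
```

Dropping by index label only works cleanly when the index is unique, which holds for the default RangeIndex assumed in this sketch.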