The 4C Model for Defining User Stories: Context, Card, Conversation, and Confirmation

Writing user stories that effectively support software product development is difficult for students new to the practice. At ICSE’23, during the 5th International Workshop on Software Engineering Education for the Next Generation (SEENG), I presented a position paper addressing this challenge by extending the existing 3C model for defining user stories with an extra C for ‘context’. The format targets interactive software systems and is inspired by a grounded theory research study in which the observed product managers provided context by basing most user stories on concrete and validated design artifacts, such as wireframes.

The 4C Model

In addition to defining user stories using the existing 3C model (Card, Conversation, and Confirmation), I ask my students to start defining a user story by first providing its context (as illustrated in Figure 1).

Fig. 1. The 4 C’s of user stories: (concrete and validated) Context, Card, Conversation, and Confirmation.

The user story context is defined by:

  • The name of the larger feature or epic that encompasses the story.
  • A meaningful name for the story summarizing the covered behavior.
  • A concrete and validated design artifact from which the story can be derived (if applicable). The design artifact visually represents a concrete idea for the solution, like a wireframe. It has been validated by stakeholders (see more on that below). It provides the team with a shared understanding of the story’s context and serves as a starting point to derive the story Card, Conversation, and Confirmation. Note that more than one story could potentially be derived from the same design artifact.

The concrete and validated design artifact also reminds students that a creative process needs to occur BEFORE one starts specifying user stories with acceptance criteria. Concrete visual design artifacts (like sketches or wireframes) are effective at supporting the creative process because they allow us to refine our understanding of the problem and the solution at the same time (which is important because it is impossible to fully understand the problem before moving to the solution). In contrast, user stories with acceptance criteria are abstract textual artifacts that poorly support the creative process and are more helpful at guiding implementation.

Extended INVEST Criteria

In addition to satisfying the well-known INVEST criteria (Independent [when possible], Negotiable, Valuable, Estimable, Small, and Testable), each user story should satisfy the following additional criteria:

  • Contextualized: The story is situated in its broader context via a concrete design artifact. It is straightforward to understand the story relationships with its encompassing feature or epic and surrounding stories.
  • Understandable: The behavior covered by the story is easy to understand by stakeholders, especially by developers in charge of implementation (and by faculty and teaching assistants in charge of evaluating the story).
  • Validated: Stakeholders have validated that the encompassing feature or epic satisfies their needs: The feature or epic is useful, usable, and delightful from a user perspective, competitive from a business perspective, and feasible from a technical perspective. This extends the INVEST ‘Valuable’ criterion. The idea is to identify problems very early on, before writing detailed user stories or code, in order to reduce rework.

Example

Figure 2 presents an example of the application of the 4C model.

Fig. 2. The 4 C’s of user stories: An example

The above story is contextualized (it belongs to the Donate Items epic and is related to the provided wireframe), understandable (the behavior covered is easy to comprehend) and satisfies the stakeholders’ needs (assuming that the wireframe has been properly validated by stakeholders).

Conclusion

After experimenting with the 4C model during four semesters with positive initial results, I posit that the model helps students generate stories that are easier to create and review while supporting the development of innovative solutions that satisfy the stakeholders’ needs. However, this conclusion is based on expert judgment and anecdotal evidence. Further research is necessary to rigorously evaluate the effectiveness of the proposed 4C model.

Reference

This blog is a short summary of the following position paper:

Cécile Péraire, “Learning to Write User Stories with the 4C Model: Context, Card, Conversation, and Confirmation,” IEEE/ACM 5th International Workshop on Software Engineering Education for the Next Generation (SEENG) at ICSE 2023, 2023.


What’s Next in PySheetGrader? New Features, Rubric Types, and Modes

Recently, PySheetGrader has added several new features that enhance its functionality and make it even more useful for spreadsheet grading. These new features include a refactored grading algorithm, two new execution modes (log mode and debug mode) for better troubleshooting, the assertion rubric type to support in-cell Python assertions, the addition of an Expected Score field in the grading report, and the ability to evaluate built-in Excel functions using a Python-based Excel formula library in rubric types that rely on relative grading. In this blogpost, we will introduce you to these new features and explain how they can help you grade spreadsheets more efficiently and effectively.

Oh, and also: PySheetGrader is now open-source, available under the BSD 3-Clause License at: https://github.com/cmusv/pysheetgrader-core. If you wish to contribute, please write to hakane@andrew.cmu.edu.

New grading algorithm

PySheetGrader has undergone a significant refactoring of its grading algorithm that uses the Template Method design pattern to minimize duplicated code and ease the maintenance of existing rubric types, as well as to facilitate the addition of new rubric types in the future. This refactoring has made the codebase more modular and understandable, which should benefit project contributors. Additionally, the new algorithm provides more consistent and accurate grading, which will benefit both users and developers. With the new algorithm, PySheetGrader continues to improve its performance, robustness, functionality, and reliability.
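
The post does not include code, but the idea behind the refactoring can be sketched roughly as follows (class and method names are illustrative, not PySheetGrader’s actual API): a base rubric class fixes the overall grading flow, and each rubric type overrides only its comparison step.

from abc import ABC, abstractmethod

class BaseRubric(ABC):
    """Template method: the grading flow is fixed; subclasses override one step."""

    def __init__(self, score):
        self.score = score

    def grade(self, key_value, submission_value):
        # Shared flow for every rubric type: compare, then assign the score.
        return self.score if self.matches(key_value, submission_value) else 0

    @abstractmethod
    def matches(self, key_value, submission_value):
        """Rubric-type-specific comparison step."""

class ConstantRubric(BaseRubric):
    def matches(self, key_value, submission_value):
        return key_value == submission_value

class FormulaRubric(BaseRubric):
    def matches(self, key_value, submission_value):
        # A real implementation would compare the formulas symbolically.
        return str(key_value).replace(" ", "") == str(submission_value).replace(" ", "")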

Debug mode

Debug mode helps with the testing and troubleshooting of graded spreadsheets by disabling hidden and pre-emptive grading features. It is invoked by adding the -d flag or the --debug flag to the pysheetgrader command. When this mode is activated, hidden and killer cells (specified in the grading instructions in the _CheckOrder tabs) are ignored during grading. In the future, this mode will be improved to also ignore the minimum work requirements and prerequisite cells for better visibility during troubleshooting and testing. Disabling these features provides the instructor with full visibility when debugging the assignment. Here is an example invocation using this mode:

pysheetgrader TestGraderKey.xlsx TestGraderSub.xlsx --debug-mode --html-report-output Report.html

Log mode

Log mode can be invoked by adding the -l or --log-mode flag. This mode provides additional traceback logging, which can be useful in identifying the source of errors or unexpected behaviors in grading specifications and in the grading engine. Here is an example invocation using this mode:

pysheetgrader TestGraderKey.xlsx TestGraderSub.xlsx --log-mode --html-report-output Report.html

Assertions

Assertions are a powerful new relative grading strategy added to PySheetGrader. They extend the existing repertoire of relative grading strategies, which include the check, relative, and relative_f rubric types, with a new rubric type: assertion. The new rubric type allows the instructor to express custom conditions unconstrained by Excel’s limitations. Assertions can verify complex relationships in submission files, and instructors can use them to determine the partial correctness of solutions and perform sanity checks.

Assertions are written using Python syntax that permits bound variables prefixed with $ to represent cell references. Assertions must express a Boolean condition: they must evaluate to either True or False. They typically contain Python logical operators (e.g., and, or, not) and comparison operators (e.g., ==, !=, <=, <, etc.), and may even include conditional expressions (if/else). They are inserted into cells of the key file with an attached rubric note that tells the grader that the cell contains an assertion to be evaluated on the submission file.

As an example, suppose we have the Python boolean expression $B5 < $C7 or $B5 == $B6 in a given key cell with the following rubric note:

rubric:
 score: 1
 type: assertion

The expression includes the bound variables $B5, $C7, and $B6, which refer to the evaluated values of the corresponding cells in the submission file. Suppose that in the submission file, cell B5 equals 2020, C7 equals 204, and B6 equals 2020. The given assertion evaluates to True, so the cell is graded as correct and the grader assigns the given score to it.
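
The post does not show the evaluation mechanics, but conceptually the grader substitutes the submission cells’ evaluated values for the $-prefixed variables and evaluates the resulting Python expression as a Boolean. A minimal sketch of that idea (the function and the substitution approach are illustrative, not PySheetGrader’s actual implementation):

import re

def evaluate_assertion(assertion, cell_values):
    """Substitute submission values for $-prefixed cell references, then evaluate."""
    expression = re.sub(r"\$([A-Z]+[0-9]+)",
                        lambda m: repr(cell_values[m.group(1)]), assertion)
    return bool(eval(expression))   # the expression must be a Boolean condition

# Values taken from the submission file in the example above.
print(evaluate_assertion("$B5 < $C7 or $B5 == $B6",
                         {"B5": 2020, "C7": 204, "B6": 2020}))   # True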

Existing PySheetGrader features that work with the other relative grading strategies, including alternative cells, negative grading, and more, also work as expected with the assertion rubric type.

Expected Score field

The Expected Score feature is a new optional field in the _CheckOrder tab of the key file. It is used when running PySheetGrader in test mode to include an expected score for each graded cell to compare with the actual score. It helps with debugging the scoring: the instructors can quickly identify discrepancies between the expected scores and the scores generated by the grader on a test submission file. If the expected score does not match the score generated by PySheetGrader, the graded cell will FAIL with a failure message in the test-mode grading report, and the corresponding row in the HTML report will be red.

To use this feature, simply add a field with the name Expected-Score to the _CheckOrder tab of the key file you want to test with a submission file. Then, add the expected score as an integer, as shown below. This change is fully backward compatible: if the Expected-Score column is absent or left empty, nothing is marked as failed.

Excel formula library

PySheetGrader now uses a new Python library called formulas that allows instructors to evaluate built-in Excel functions during relative grading. By setting a new rubric note attribute parse_excel to true in a check, relative, or relative_f rubric, instructors can use this Python library to evaluate common Excel built-in functions that PySheetGrader’s grading engine normally does not recognize. These functions include AVG, SUM, LEN, LN, ROUND, CONCATENATE, COUNTIF, and more. An example rubric note that sets the parse_excel attribute to use the new feature is shown below:

The complete list of supported built-in functions as of this writing is given below. Many more functions are expected to work out of the box, but they are only added to the sanctioned list gradually, with each new release, after testing confirms that they work as intended. Any attempt to use a function that is not in the sanctioned list throws an error.

SUPPORTED_FUNCTIONS = [
 '_xlfn.STDEV.S','SQRT', 'OR','IF', 'EXP',
 'LN', 'NOT', 'OR', 'ROUND', 'CONCATENATE',
 'LEN', 'SUM', 'MAX', 'AVERAGE', 'COUNTIF', 'AND',
]

For the most up-to-date list, please check the string array at https://github.com/cmusv/pysheetgrader-core/blob/main/pysheetgrader/formula_parser.py#L19. A new function can be added to the array if needed, provided that it has been tested and is known to work.
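
To picture how the sanctioned list is enforced, here is an illustrative sketch (not PySheetGrader’s actual code) that extracts the function names used in a formula and rejects anything outside the list before the formula is handed to the Excel formula library:

import re

SANCTIONED = {"SUM", "AVERAGE", "ROUND", "LEN", "LN", "EXP", "SQRT",
              "CONCATENATE", "COUNTIF", "IF", "AND", "OR", "NOT", "MAX"}

def check_sanctioned(formula):
    """Raise an error if the formula uses a function outside the sanctioned list."""
    used = set(re.findall(r"([A-Z][A-Z0-9.]*)\s*\(", formula.upper()))
    unsupported = used - SANCTIONED
    if unsupported:
        raise ValueError(f"Unsupported Excel function(s): {', '.join(sorted(unsupported))}")

check_sanctioned("=ROUND(SUM(B2:B10), 2)")   # OK: both functions are sanctioned
check_sanctioned("=VLOOKUP(A1, B1:C2, 2)")   # raises ValueError: VLOOKUP is not sanctioned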

Conclusions

In this blogpost, we’ve discussed several new features that have been added to PySheetGrader in its most recent release to facilitate troubleshooting and testing, improve relative grading, and support sanity checks for partial credit. Simultaneously, the grading algorithm received an overhaul to make the code easier to maintain and extend. The new features make PySheetGrader more powerful and flexible than before, allowing instructors to create grading criteria tailored to their specific needs. Stay tuned for further improvements and fixes, which will be introduced in future blogposts.

PySheetGrader is now open-source

We have made PySheetGrader available to the greater community under the BSD 3-Clause License. To access the codebase and try it out, visit: https://github.com/cmusv/pysheetgrader-core. If you wish to contribute to the pysheetgrader-core project, please write to hakane@andrew.cmu.edu.


Robust Testing of PySheetGrader and Its Assignments

Up until now, there was no concrete way to perform push-button testing of the full behavior of PySheetGrader and PySheetGrader assignments. Manual testing is cumbersome, time-consuming, error-prone, and not repeatable. The latest release fills this gap by adding a test mode to PySheetGrader as a way to specify cell-level unit tests in the CheckOrder tabs, along with a special grading report format to display the test-mode results. These unit tests are only run in test mode and are different from the test cases specified in a cell’s rubric note using the test_cases property. The test_cases property is for grading a student’s assignment based on instructor-specified unit tests that decide on a cell’s correctness, whereas test-mode tests are used for testing the behavior of the grader itself before new releases, as well as for testing new assignments before deploying them to students.

Running PySheetGrader in test mode

The test mode is enabled by invoking PySheetGrader with the --test-mode or -T option from the command line. The key file containing the unit test cases and the submission file containing the data for the test cases are specified in the usual way:

pysheetgrader TestGraderKey.xlsx TestGraderSub.xlsx --test-mode \ 
 --html-report-output Report.html

Here TestGraderKey.xlsx is the test key file containing the test-mode test cases and TestGraderSub.xlsx is the test submission file containing the data. The --html-report-output option asks PySheetGrader to generate a detailed HTML report, which will be stored in the file Report.html.

Specifying test-mode test cases in CheckOrder tab

In the test mode, cell-level test cases are specified in a CheckOrder tab using additional, special columns with the headers test-name, expected-result, and failure-message (this format will be familiar to readers with experience in unit testing in software development). These columns are ignored outside the test mode, during regular grading. The column headers are order-independent and case-insensitive.

Test-mode column purpose | Column heading | Content/Value
Test name | test-name | text
Expected result | expected-result | correct or incorrect
Failure message added to the report in case of test failure | failure-message | text

A CheckOrder tab of a test key file containing test-mode unit tests may look like this, with the addition of the extra columns used in test mode:

Note that column headers are case-insensitive. Therefore, for example, Expected-Result is valid as the column header for the expected result column.

The expected result column indicates whether the test expects the evaluation of that cell to be correct or incorrect. The PASS/FAIL verdict in the produced report takes this expected result into account, which allows both correct and incorrect cell solutions in the test submission file to be tested. If no value is specified in this column, PySheetGrader assumes the default value correct. For example, the test named T4 expects the grader to produce a zero score (indicating an incorrect solution) with the given test submission file if the cell A5 is positively graded, or a negative score if the cell A5 is negatively graded.
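
In other words, a test passes when the grader’s verdict on the cell matches the declared expected result. An illustrative sketch of that comparison (not PySheetGrader’s actual code):

def test_verdict(expected_result, cell_score, negative_grading=False):
    """expected_result is 'correct' or 'incorrect' (defaults to 'correct')."""
    if negative_grading:
        graded_correct = cell_score == 0      # no deduction means correct
    else:
        graded_correct = cell_score > 0       # a positive score means correct
    expected_correct = (expected_result or "correct").lower() == "correct"
    return "PASS" if graded_correct == expected_correct else "FAIL"

print(test_verdict("incorrect", 0))   # PASS: the grader found the cell wrong, as expected
print(test_verdict("correct", 0))     # FAIL: the cell was expected to be correct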

The test-mode grading report

The grading report generated using the test mode has extra columns that indicate the status of each test-mode test case. Failures produce red rows with a FAIL verdict under the Status column. A failure occurs when the test submission file contains a solution that produces an unexpected result for the tested cell. The failure message, if specified, is appended to the status. Passing tests generate green rows with a PASS verdict under the Status column (and no failure message).


We hope that the test mode will allow PySheetGrader itself, as well as individual PySheetGrader assignments, to be tested more thoroughly and in a repeatable manner, increasing the confidence of PySheetGrader’s contributors who maintain and extend it as well as of instructors who design assignments.


Acknowledgment

Aakash Javia implemented the test mode during the Spring 2022 semester.

Improvements to PySheetGrader

The new release of PySheetGrader includes bug fixes, minor improvements, and several new grading features. In this post, we discuss one of the improvements and all of the new features. The improvement makes the format of the grading instructions in the SheetGradingOrder and CheckOrder tabs more flexible by allowing custom column orders, while retaining backward compatibility. The new grading features include prerequisite cells, killer cells, support for negative (deduction-based) grading, and a variation of relative grading, called the check rubric type, that evaluates a formula from the key file on the submission file and compares the result to a cell value stored in the key file.

The check rubric type

The check rubric type is a variation of the relative rubric type. The existing relative rubric type evaluates a formula stored in a cell C of the key file using the submission file’s corresponding cell references and compares the result to the evaluated value of the cell C in the submission file. The check rubric type also employs a similar relative grading strategy, but compares the result of the formula’s evaluation on the submission file to a given cell’s evaluated value in the key file rather than in the submission file.

As an example, suppose we want to check that the value of the submission cell B1 is greater than the value of the submission cell B2. In the key file, we enter the formula =B1 > B2 in a cell A1 and pick another cell, say A10, to hold the expected result of this formula: we enter TRUE in A10, which stipulates that B1 > B2 must hold in the submission file. The rubric note on A1 could look like this (in this case, with an alternative cell defined as well):

rubric:
 score: 2
 type: check
 result: A10
alt_cells:
 - F10

Suppose in the submission file, the cell B1 evaluates to 4 and B2 evaluates to 5. Then the score for cell A1 will be 0 for this submission because the formula B1 > B2 evaluates to FALSE, but should be TRUE. Note that, unlike in the relative rubric type, the cell A1 of the submission file is not used in this grading strategy.

The expected result can also be stored directly in the cell A1 itself of the key file. We could do this if the formula B1 > B2 in fact evaluates to TRUE in the key file itself. Then the result property can be omitted in the rubric note, like this:

rubric:
 score: 2
 type: check
alt_cells:
 - F10

Here is another example using a numeric comparison. This time the key cell A1 contains the formula =B1 + B2 and A10 contains the expected result 8.95. (The result cell cannot contain a formula for now: it needs to be a constant, and the grader will not evaluate a formula entered in the result cell. We will change this later so that the result cell can also contain a formula.) The rubric note on A1 would also admit the delta property because the result is numeric. It could look like this:

rubric:
 score: 2
 type: check
 delta: .1
 result: A10

A1’s score will be 2 in this variation since the key cell’s formula evaluates to 9 on the submission file and is within a margin of 0.1 of the expected result 8.95.
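
A small sketch of the check grading strategy applied to this numeric example (the key formula is written here as a Python function for illustration; the real engine parses the Excel formula and resolves its cell references against the submission workbook):

def grade_check(key_formula, submission_values, expected_result, score, delta=0.0):
    """Evaluate the key formula on the submission's values and compare the
    result to the expected value stored in the key file."""
    result = key_formula(submission_values)
    return score if abs(result - expected_result) <= delta else 0

# Key cell A1 holds =B1 + B2; key cell A10 holds the expected result 8.95.
submission = {"B1": 4, "B2": 5}
print(grade_check(lambda cells: cells["B1"] + cells["B2"],
                  submission, expected_result=8.95, score=2, delta=0.1))  # 2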

Negative grading

Negative grading is a common strategy that applies a deduction for each submission mistake. It is used in cases where the student is expected to submit a correct solution for a graded component, and instead of receiving a positive score for a correct solution, the student receives a penalty (a negative score) for an incorrect one. In PySheetGrader, negative grading is implemented via a new grading property whose value is set to negative. This allows a negative score to be assigned to a graded cell. If the submission cell passes the negative rubric specification, it is assigned a zero score (no deduction). If not, the submission cell is deemed incorrect, and the specified negative cell score is applied as a deduction to the total score. In case of a deduction, a feedback message is added to the grading row of that cell in the report. The maximum score for a negatively graded cell is zero, whereas in positive grading, the maximum score is the score property’s value.

To use negative grading, the rubric note must include the grading property:

rubric:
 grading: negative
 ...

If the grading property is omitted, its default value is positive, and the grader uses the normal positive grading behavior. The following rubric note explicitly specifies a normal, positively graded cell (in which case, the grading property can safely be omitted):

rubric:
 grading: positive
 ...

As an example, suppose the key cell A1 contains the formula =A2 + B1 with this rubric note:

rubric:
 grading: negative
 type: formula
 score: -5

If the submission cell A1 contains the formula =A3 + B1, the submission cell will be graded as incorrect (its formula doesn’t match the key cell’s formula), the corresponding grading row in the HTML report will be red (representing a failure, with a corresponding feedback message), and a deduction of -5 will be applied to the submission’s total score. However, if the submission cell A1 contains the formula =B1 + A2, the cell is correct (the key and submission formulas match), and its score will equal zero (the corresponding grading row in the HTML report will be green).
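
The effect on the total score can be pictured with a few illustrative lines (not PySheetGrader’s actual code):

def cell_contribution(passed, rubric_score, grading="positive"):
    """Contribution of one graded cell to the submission's total score."""
    if grading == "negative":
        return 0 if passed else rubric_score   # rubric_score is negative, e.g. -5
    return rubric_score if passed else 0

total = 10                                     # points earned from the other cells
total += cell_contribution(passed=False, rubric_score=-5, grading="negative")
print(total)  # 5: a 5-point deduction was applied for the incorrect cell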

Negative grading can be applied to a hidden cell, and works with alternative cells (the alt_cells property).

Prerequisite cells

The new release of PySheetGrader allows the instructor to specify certain cells that must be correct for a particular cell to be graded. These prerequisite cells are specified in the rubric note using a new property prereq, whose format is similar to that of alt_cells. Here is an example:
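
The original screenshot of the rubric note is not reproduced here; based on the description above (a prereq list on cell B4, formatted like alt_cells), it would look roughly like this, with the score and type chosen purely for illustration:

rubric:
 score: 2
 type: formula
prereq:
 - B1
 - B2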

The above rubric note stipulates that cells B1 and B2 be graded as correct before cell B4 can be graded. If either B1 or B2 fails, the HTML report will look like this (in this example, B1 is correct, but B2 is incorrect):

If the instructor desires, the feedback field can be used to inform the student of the prerequisites when they make a mistake in one of the prerequisite cells. In the above example, the instructor explicitly states which prerequisites apply to cell B4 in the feedback given when the cell is graded as incorrect. Recall that when things go well, no feedback is given to the student.

Killer cells

PySheetGrader now allows instructors to specify must-be-correct cells, which, if incorrect, revert the score of the whole tab (sheet) in which the graded cell is located back to zero. These cells are called killer cells. Killer cells can be specified as hidden if revealing them gives too much information to the student.

An example of a killer cell is shown below in a CheckOrder tab of a key file. Note the inclusion of the special column. (Column headings are now significant: see the section on custom column naming and ordering below.)

Here, cells B5 and B6 are both killer cells, indicated by the letter k under the special column for that cell. If one of these cells is incorrect in the submission file, the HTML report reads as below, and the whole tab receives a zero score. Whenever a killer cell is wrong, the grader stops grading the tab in which the cell is located, and truncates the tab’s report section at that point.

The grader can also handle the case where the instructor does not want to reveal which killer cell is wrong, in which case the special column entry for that cell is specified as hk (or kh). In this case the section of the HTML report for the tab looks like this:

Custom column naming and ordering

The earlier release of PySheetGrader required that the columns of the SheetGradingOrder and CheckOrder tabs be in a specific order. Recall that these special tabs specify the grading instructions for the whole assignment and for its individual parts (graded tabs), respectively. The information specified includes the grading order, cell descriptions, optional feedback messages, hidden cells, and now, the killer cells. Since the columns of the grading instructions were required to be in a fixed order, the column headings did not matter in the previous release: they were ignored. This design decision made adding new grading information and column-based features difficult and cumbersome.

The new release makes the columns specifying grading instructions order-independent; however, column headers (the first rows of the SheetGradingOrder tab and the CheckOrder tabs) are now significant and fixed. Columns with headers other than the ones that PySheetGrader recognizes as grading instructions are ignored, and thus can be used for any other purpose, for example, for the instructor’s comments or annotations. Here is an example of what a CheckOrder tab may look like using the new format:

Note that column I is blank and can be used for whatever purpose the instructor deems necessary. The intent is to provide flexibility.

The fixed column headers and their contents are as follows. All column headings are case-insensitive.

In the SheetGradingOrder tab:

Column purpose | Column heading | Content
Name of the tab (sheet) to be graded | name, sheet, or tab | text
Score required for minimum work on the tab | min-work | numeric, zero or positive
Feedback message if the minimum work requirement for the tab is not reached | feedback | text

In a CheckOrder tab:

Column purpose | Column heading | Content
Instructor’s custom identifier for a graded cell | number, cell-id, or id | text or number
Cell coordinate (reference) | cell | a correct alphanumeric Excel cell reference
Description of the graded cell | description | text
Grading instructions for special cells (hidden and killer cells) | special | any combination of the characters h or H (for a hidden cell) and k or K (for a killer cell), without spaces and using the same capitalization: k, K, h, H, kh, hk, KH, or HK
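
To picture how order-independent, case-insensitive headers can be resolved when a CheckOrder tab is read, here is an illustrative sketch using openpyxl (names and structure are not PySheetGrader’s actual code; further headings such as feedback would be handled the same way):

from openpyxl import load_workbook

# Map recognized headings (lower-cased) to internal field names.
RECOGNIZED = {
    "number": "id", "cell-id": "id", "id": "id",
    "cell": "cell", "description": "description", "special": "special",
}

def read_check_order(key_path, tab_name):
    """Yield one dict per graded cell, regardless of the column order in the tab."""
    sheet = load_workbook(key_path)[f"{tab_name}_CheckOrder"]
    rows = sheet.iter_rows(values_only=True)
    headers = [str(h).strip().lower() if h else "" for h in next(rows)]
    # Unrecognized headings are simply ignored, so extra columns are allowed.
    columns = {RECOGNIZED[h]: i for i, h in enumerate(headers) if h in RECOGNIZED}
    for row in rows:
        if row[columns["cell"]]:          # skip blank rows
            yield {field: row[i] for field, i in columns.items()}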

What comes next?

Robust and repeatable testing is ultimately essential for any auto-grader. The next blog will present the newly released test mode that allows PySheetGrader contributors to test the grading engine before deploying fixes and new features, and instructors to test their assignments before deploying them to students, all in a push-button style. We are also working on the ability to evaluate formulas within Excel to improve the grader’s performance and be able to use any Excel built-in function in formula-based rubrics.


Acknowledgements

Sandeep Keshavgowda implemented the new check rubric type as well as the killer-cell and prerequisite-cells features. Ksheeraja Raghavan implemented negative grading. Aakash Javia implemented the custom column naming and ordering feature.

SEENG 2021 Software Engineering Education Workshop: Report

by Cécile Péraire and Stephan Krusche

The Third International Workshop on Software Engineering Education for the Next Generation (SEENG 2021) was held remotely on May 24, 2021. The workshop was an integral component of the Joint Track on Software Engineering Education and Training (JSEET) at the 43rd IEEE/ACM International Conference on Software Engineering (ICSE). It specifically supported the JSEET’s theme of Educating the Next Generation of Software Engineers:

“Millennials and Generation Z have been dominating higher education programs for some time. In a decade, our classes will be hosting Generation Alpha. We have all noticed that these cohorts have unique needs and different learning styles, social behaviors, and skills. With their increasing mobility, our classes will become more and more diverse. Our students will become increasingly collaborative, socially aware, tech-savvy, multi-disciplinary, and interested in emerging technologies. How shall the software industry capitalize on the mix of qualities that the new breed of software engineers will possess? How shall software engineering educators effectively and equitably cater to the students’, society’s, and industry’s evolving and diverse interests while ensuring that the next generation of software engineers learn the timeless principles, have a firm grounding in both software engineering theory and practice, behave responsibly, and make a positive contribution to the society?” 

Following the First International Workshop on Software Engineering Curricula for Millennials (SECM 2017) and Second International Workshop on Software Engineering Education for Millennials (SEEM 2018), our goal in this third edition was to continue to bring together main stakeholders to discuss the unique needs and challenges of software engineering education for the next generation, including software engineering educators and representatives of STEM education. Building on its predecessors, the workshop used a highly interactive format, structured around short presentations to generate discussion topics, an activity to select the most interesting topics, and structured breakout sessions to allow participants to address those topics.

We solicited position papers addressing a variety of related topics that include: software engineering education for new and emerging technologies; novel approaches to designing software engineering curricula; skills and continuing education for software engineering educators; classroom formats that cater to diverse learning styles; teaching approaches that leverage technology-enhanced education in software engineering courses; balancing teaching of soft and hard skills; rigor and practicality in software engineering education; and experience in educating students in software engineering programs.

The diverse international program committee consisted of 21 software engineering educators and researchers from academia and industry across 12 different countries. The submission process was competitive. Each paper was evaluated by at least 3 reviewers. Eight position papers out of 20 submissions were accepted (40% acceptance rate). The contributions span a variety of topics, including project-based courses, teaching design by contract, software testing approaches, student feedback and assessment practices, and more.

A broader goal of this third edition of SEENG was to continue to grow a community excited about and dedicated to serving the next generation of students. 26 participants registered for the workshop and 18 of them joined via Zoom to share their thoughts, insights, and experiences, seek and offer advice on pedagogical methods that work, and explore collaboration opportunities on new software engineering education initiatives.

The workshop agenda was based on the structure and practices proposed in Designing Interactive Workshops for Software Engineering Educators with some adaptations for a remote setting, including the use of an online collaborative whiteboard platform called Miro.

Workshop collaboration was supported by the Miro online collaborative whiteboard platform

The workshop started with a brief introduction of all the participants using visual slides.

Visual slides used by the workshop participants to briefly introduce themselves

It continued with a series of 5-minute presentations of the accepted papers to remind everyone of the problems we face and to learn how others have been solving them. To identify common interests, participants were asked to capture any interesting insights from the presentations on virtual post-it notes in Miro.

Based on the insights gathered during the presentations, we then performed an affinity mapping activity to identify and select discussion topics. The topics that emerged as of most interest to the participants were: Team-based Learning, Automatic Assessment, and Specification and Verification.

Affinity map created by participants based on what they learned during the presentations (created in Miro)

Participants were then divided into breakout groups to discuss the selected topics further. For each topic (Team-based Learning, Automatic Assessment, and Specification and Verification), the breakout group participants were asked to define a Big Hairy Audacious Goal (BHAG), or very ambitious goal. For their BHAG, group participants created a Mind Map to brainstorm WHAT success looks like and HOW to achieve success. Finally, each breakout group presented its mind map to all the workshop participants. Here are the resulting mind maps.

Mind Map for Team-based Learning (created in Miro)
Mind Map for Automatic Assessment (created in Miro)
Mind Map for Specification and Verification (created in Miro)

We closed the workshop with a retrospective and identification of future action items.

Outcome of the workshop retrospective (done in Miro) 

The feedback was overwhelmingly positive, with participants appreciating the interactive workshop format, the use of the Miro collaboration tool, the short presentations, and the opportunity to discuss with peers despite the COVID-19 pandemic. The main improvement suggestion was that the discussion period could have been longer. Some of the action items identified included using the workshop’s blog to present the results of the breakout sessions and to summarize and promote the SEENG papers.

Acknowledgment: We would like to thank the program committee members for their work and selection of high-quality papers. We would also like to thank all participants for attending the third edition of ICSE’s software engineering education workshop. The workshop was a success thanks to your enthusiasm, active participation, insights, and experiences.

Automatic Grading of Spreadsheet Assignments

Spreadsheets are ubiquitous and practical. Many academic and professional courses use them as a pedagogical aid. Applications abound in finance, accounting, economics, business, and even science and computing. I am myself a software engineering professor. Although my other courses at Carnegie Mellon University rely heavily on coding skills, I also teach an interdisciplinary graduate course, Decision Analysis and Engineering Economics for Software Engineers, which takes advantage of spreadsheets extensively.

Many of my assignments and labs in this course are deployed in the form of a spreadsheet. The students receive a problem description and an Excel template that serves as a starter spreadsheet. They complete the template according to the given problem description, and submit it as their solution. The submitted Excel file is then graded according to specified rubrics, and the students get a score and some feedback about the correctness and quality of their solution. To give you an idea, below is a typical assignment description in this course. The assignment presents a case study concerning a software development project, which needs to be analyzed and valued using the information and data provided. Students complete the specified tasks to solve the case study.

The gist

Together with the problem description, the students are provided with a starter Excel template, which includes further details and some data. They work on the template to solve the problem given to them by completing the yellow highlighted cells.

Typical Spreadsheet Assignment: Excel Solution Template

At the end of this process, here is what a student may submit as the completed solution:

Typical Spreadsheet Assignment: Completed Solution

Infinitely many correct solutions and cascading cell dependencies get in the way of grading spreadsheets

Grading the submitted solution seems straightforward at first: one could simply compare the evaluated values of the highlighted cells in the student submission with those of a reference solution contained in a key spreadsheet file. However, the process is not that simple in practice: in a spreadsheet, many cells contain formulas that refer to other cells, which may further refer to other cells. If a student makes a mistake in a cell at the bottom of a dependency chain, the mistake propagates to all the other cells that depend on it. With this naive approach, the student’s solution ends up being penalized multiple times for the same early mistake. To prevent multiple penalties, the grader may need to compare a formula in the student’s solution to the correct formula in the key file. If the two formulas are mathematically equivalent, the student should not be penalized for an earlier mistake and should be given full credit for that cell. Therefore, when grading the solution, graders must be careful and flexible. Because there are infinitely many ways a correct solution can be expressed as a formula, and because cell dependencies tend to go deep, grading spreadsheet assignments manually can be quite tedious, laborious, and error-prone, even for simple assignments. The effort required for the mechanistic parts of this process is best allocated to more meaningful and pedagogically more value-adding activities. Fortunately, automation comes to the rescue.

Automated grading with PySheetGrader

I have been a fan of automated grading for many years. Automated grading of programming-based assignments is now widespread in computer science and software engineering education. Several platforms and tools exist. Low-stakes assignments and labs in another course I teach on software verification and testing—a programming- and modeling-intensive course—are nearly fully automated, which saves my teaching assistants and me precious time that we can allocate to more open-ended and more cognitively demanding aspects of the course. I wanted to reap similar benefits in my decision analysis course that uses spreadsheet assignments, but I could not find any tool or system on which I could build to automate the grading process. Building on previous successes, my students and I finally created PySheetGrader, a tool for fast and flexible grading of spreadsheet assignments.

PySheetGrader supports many types of cell-based grading strategies for maximum flexibility and expressiveness. We have integrated PySheetGrader into Vocareum, a cloud-based lab platform for computing education, and used it with great success in my course. Going further, push-button integration of Vocareum with my university’s LMS, Canvas, allowed a more seamless experience for my students. In the rest of the blog, I will explain the philosophy, architecture, and main features of PySheetGrader.

PySheetGrader has a simple architecture, shown below. The central concept is the grading specification, which is embedded directly into a key spreadsheet file representing a reference solution. The key file is just an ordinary Excel file (with .xlsx extension): it contains a fully-developed solution for the assignment, together with meta-information that specifies how the assignment will be graded. This information includes grading instructions, cell-based rubrics, sanity checks, and cell-based test cases. When a student submits a solution to the submission platform (Vocareum in our case), PySheetGrader’s grading engine processes the student submission using the key file to produce a grade report. The report contains granular and structured feedback to the student, and tallies up the rubric components to assign a total score to the student’s submission. The student can then revise the solution based on the feedback, and resubmit if necessary.

PySheetGrader Architecture

Next, I will discuss the core features of PySheetGrader with several examples. The core features provide us with a wide range of ways we can create auto-graded assignments, but they have limitations. PySheetGrader’s expressiveness can be improved substantially with advanced grading strategies and concepts, such as sanity checks and assertions. I will save the discussion of these advanced strategies for a future blog.

A simple example with basic cell-based rubrics

To illustrate how PySheetGrader works, we’ll start with a simple example involving two basic rubric types. The assignment is about calculating the total amount for a grocery store transaction from unit prices of items, quantities purchased, and a sales tax rate. The students are given this Excel starter template to complete the assignment:

The highlighted cells need to be filled out in the submitted solution. In a key file that mirrors the template the students receive, the instructor works out the solution and specifies rubrics in a domain-specific language for grading. Rubrics are attached as Excel notes to selected highlighted cells. Suppose the instructor decides that cells D6, D8, D9, D10, and D11 will be graded. Note that the highlighted cells D6 to D8 are computed in a similar manner, so the instructor decides to sample only two of these three similarly computed cells. If this sample is completed correctly by the student, we will assume that the other cells in the same column, which should use the same formula, are correct as well. (In general, we don’t need to grade every single row of a computed column in a data table if the table’s rows are computed using the same formula.) The Excel key file for this assignment may look like this:

For convenience, column E shows the formulas entered in each row of column D for the highlighted cells. This is the correct solution, but it is just one correct solution. The rubric for each graded cell is specified as a cell note, added by right-clicking on the cell and selecting New Note from the cell dropdown menu: cells with these rubric notes are the ones with the little red earmarks at the top-right corner. If we hover on one of these cells, Excel will display the attached rubric note:

So cell D6’s rubric note contains this text:

rubric:
 score: 1
 type: constant
 delta: 0.01

This text is a rubric specified in PySheetGrader’s rubric language. The savvy reader will identify it as a YAML expression. YAML is a simple data serialization language in which specifications are structured using simple indentation. This specification tells PySheetGrader that cell D6 is graded as a straightforward value check (type: constant). To grade this cell, the grading engine first evaluates both the key cell and the corresponding submission cell to determine their computed values. The resulting values are then compared to set the cell’s score: if the values match, the cell gets the score indicated in the rubric (score: 1); otherwise, the cell’s score is set to 0. For numeric cells, a precision can be specified as well using the delta property, in which case the comparison is performed using the given precision.
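
The value check can be pictured with a few lines of Python (an illustrative sketch; actual cell evaluation is performed by the grading engine):

def grade_constant(key_value, submission_value, score, delta=None):
    """Naive value check: compare evaluated values, with an optional numeric tolerance."""
    if delta is not None:
        correct = abs(key_value - submission_value) <= delta
    else:
        correct = key_value == submission_value
    return score if correct else 0

print(grade_constant(31.90, 31.90, score=1, delta=0.01))  # 1
print(grade_constant(31.90, 30.00, score=1, delta=0.01))  # 0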

This naive strategy works fine in some cases, but the real power of a spreadsheet comes from a cell’s ability to define a formula that refers to other cells. In the above example, cell D10’s rubric type is a formula as opposed to a constant (type: formula). The rubric note for this cell looks like this:

rubric:
 score: 3
 type: formula

This rubric note tells the grading engine that the cell is to be graded more smartly: as a formula. PySheetGrader compares the key cell’s formula to the corresponding submission cell’s formula to determine whether one formula can be mathematically reduced to the other by treating the cell references appearing in the formulas as free variables. If this is the case, the two formulas are mathematically equivalent, and the graded cell receives the score indicated in the rubric (score: 3); otherwise the cell’s score is set to 0. In this example, the key file specifies the formula B2*D9 for cell D10. Suppose the submission file’s D10 cell contains the formula D9*B2. The two formulas are mathematically equivalent, so the cell receives the score 3. There are infinitely many different ways to rewrite the formula B2*D9: PySheetGrader uses the symbolic math library SymPy to perform the comparison, and is smart enough to infer, for example, that the more complicated-looking formula 1 + SQRT(B2*B2)/(1/D9) - 1 is in fact equivalent to B2*D9.
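
For instance, a SymPy-based equivalence check along the following lines, treating cell references as positive symbols for the sake of illustration (a simplified sketch, not PySheetGrader’s actual code), recognizes that the two formulas above are the same:

from sympy import simplify, sqrt, symbols

B2, D9 = symbols("B2 D9", positive=True)   # cell references become free variables

key = B2 * D9
submission = 1 + sqrt(B2 * B2) / (1 / D9) - 1

# The two formulas are equivalent when their difference simplifies to zero.
print(simplify(key - submission) == 0)  # True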

The formula rubric type is important since it prevents a propagated mistake from being penalized more than once. To see why, suppose the student made a mistake in cell D6 and receives a score of 0 for that cell. D10 depends on D9, which in turn depends on D6. If the rubrics for D10 and D9 were of type constant, both of these cells would also receive a score of 0 even if their formulas were correct, causing the submission to be penalized three times for the same early mistake.

PySheetGrader has several other rubric types and grading strategies that give the instructor further flexibility. We will get to them soon enough. But first let’s look at how the key file lets the grading engine know which cells need to be graded and in what order.

Grading instructions: which cells to grade and in what order

We need a way to tell the grading engine how the grading should proceed, so that PySheetGrader does not have to check every cell in a key file to figure out which ones have rubric notes attached and thus need to be graded. We also do not want the cells to be graded in a random order or in an order contrary to the assignment’s expected workflow. This meta-information constitutes the grading instructions. It is provided outside the cell-based rubrics, in special spreadsheet tabs that must be added to the key file.

Since a spreadsheet can contain multiple tabs (sheets), first we need a way to tell the grading engine which tabs need to be processed and in which order. This information is included in a special tab named SheetGradingOrder in the key file. For an assignment with a single graded tab named Tab1, SheetGradingOrder simply looks like this:

This tab tells the grading engine that the submission file should have a tab named Tab1, which needs to be graded first. That’s it! No other tab in the submission file needs grading in this example, so we have just one row for Tab1. PySheetGrader actually ignores SheetGradingOrder‘s first row, which contains the column headings Sequence and Tab (they can be named any way the instructor likes). It also ignores the first column with the heading Sequence (the instructor may attach sequence numbers or internal identifiers to the tabs to improve readability, but doesn’t have to). The tabs specified in SheetGradingOrder‘s second column are processed in the order given, starting with the tab named in the second row. Multiple tabs, if needed, can be handled by adding extra rows to SheetGradingOrder.

Next we need a way to specify, for each graded tab, in what order the cells in that tab need to be graded. This instruction is given by adding a new tab to the key file for each tab in the submission file that needs grading. In this example, we have a single tab, Tab1, so we add a new special tab called Tab1_CheckOrder to the key file:

The prefix of the tab-specific tab’s name must match the tab for which it is specifying the grading instructions: TabToBeGraded_CheckOrder, where TabToBeGraded is replaced by Tab1 in this example.

As in SheetGradingOrder, the first row containing the column headings is ignored by the grading engine: this row is there just for informational purposes. The most important column is column B, titled Cell above: this column specifies which cells in the associated submission tab need to be graded and in which order (from top to bottom). Each of these cells must have an attached rubric note in the corresponding key file tab. Column A (Id) specifies a unique identifier for each graded cell: these identifiers are used in the grade report. The remaining columns are optional and specify extra information and behavior that the instructor may want to add to the grading strategy of a cell:

  • Column C: If a description is provided, it is printed in the grade report.
  • Column E: If a feedback text is provided, it is added to the grade report when the cell receives a score of 0. The feedback text may refer to the submission file’s cell results using the $ prefix before a cell reference to further customize the feedback (a small sketch of this substitution follows this list). In the above example, if the submission cell D6 of Tab1 receives a 0 score and the values of submission cells A6 and D6 equal Whole Chicken and 30.0, respectively, the feedback text for that cell in the grade report will read “The Price of Whole Chicken is not 30.0“.
  • Column D: This is the hidden toggle, used for secretly graded cells. If the student should not be able to tell from the grade report which cell is being graded, the hidden toggle is set to H. If the cell receives a score of 0, the feedback is added to the grade report, but the cell reference and description are omitted. In the above example, none of the graded cells are hidden, so column D is empty. The hidden toggle is convenient for performing sanity checks: graded cells that students do not have to complete, but that hold intermediate calculations or verify correctness conditions based on cells that the students have to complete. For example, a sanity check may stipulate that the computed probabilities in a table column must add up to 1.
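
As mentioned in the description of column E above, cell references prefixed with $ in the feedback text are replaced with the submission’s evaluated values. A small illustrative sketch of that substitution (not PySheetGrader’s actual code):

import re

def render_feedback(template, submission_values):
    """Replace $-prefixed cell references with values from the submission."""
    return re.sub(r"\$([A-Z]+[0-9]+)",
                  lambda m: str(submission_values[m.group(1)]), template)

values = {"A6": "Whole Chicken", "D6": 30.0}
print(render_feedback("The Price of $A6 is not $D6", values))
# The Price of Whole Chicken is not 30.0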

The grade report

To put all together, let’s have a look at the grade report for the above example. Suppose the student’s submission looks like this:

Column E is added to the submission file to show what the student entered in each highlighted cell. Note that the highlighted cells are the ones that will be graded. Notably, the student entered formulas in some cells, and hardcoded values in others. Depending on whether a cell is graded using the constant or formula rubric type, the student’s entry for that cell may be marked correct or incorrect.

PySheetGrader produces two reports in different formats: plain text and HTML. The plain text report is hard to read and is meant for archival and post-processing purposes, so we will only show the HTML version. The HTML grade report for this submission looks like this:

If you noticed that this report’s format looks somewhat familiar, with its green (passing) and pink (failing) rows, the resemblance is not coincidental: it was inspired by the xUnit class of software testing tools. Here the student made two mistakes: in cells D6 and D11. In cell D6, the student entered a hard-coded value, which is OK since the cell is graded using the constant rubric type (specified as Value check in the grade report); however, the value was incorrect according to the key file (it should have been 31.9, but it was 30.0). In cell D11, the student entered another hardcoded value, 40.20. While the value is technically correct according to the key file, the student should have entered a formula and computed this value from cells D10 and D9 instead. The student gets a score of 0 for both cells.

Let’s look at the grading of cell D9, which calculates a subtotal for the range D6:D8. The cell is to be graded as a formula. The formula specified in the key file is SUM(D6:D8), using Excel’s built-in function SUM. But the student entered the formula D6 + D7 + D8 in the submission file for this cell. The grading engine can recognize that the two expressions are equivalent, therefore the student receives the full score for this cell.

The formula rubric type can handle most Excel built-in functions if the arguments are plain cell references as opposed to formulas themselves, but it cannot handle arbitrary expressions involving arbitrary Excel built-in functions. Fortunately, the grading engine knows how to symbolically reason about some commonly used built-in functions such as SUM, MIN, and MAX even when these functions’ arguments are formulas themselves. We are working on making other commonly used built-in functions fully work with the formula rubric type.

Other rubric types

PySheetGrader currently provides a few other rubric types for increased flexibility. Here are some examples:

Soft Formula

The soft formula rubric type represents a compromise grading strategy between the formula and constant rubric types. It forces the student to enter a formula in a cell rather than a hard-coded value, but the grading engine performs a naive value comparison rather than a formula comparison. This rubric type is useful when a cell requires complex Excel built-in functions that the grading engine cannot interpret and the student is not allowed to enter a hard-coded value into the cell.

rubric:
 score: 1
 type: soft_formula
 delta: 0.01

The grading strategy for the soft_formula rubric type is as follows:

  1. If the cell does not contain a formula, the score is 0.
  2. If the cell contains a formula, the cell is graded as a constant rubric type by comparing the submission cell’s evaluated result to the key cell’s evaluated result, ignoring any formulas.

The delta property is optional for specifying a precision.

Relative Grading

Relative grading allows a submission cell to be graded by evaluating the key cell’s formula and checking the result against the submission cell’s value. This strategy ensures the student’s solution is internally consistent, in that it computes the result correctly using the right cell references, but it doesn’t care how the student achieves this effect. Relative grading is useful when the same computation can be expressed in many different ways using different cell references. For example, suppose cell A1 depends on cell B1 and cell B1 depends on cells C1 and C2. Thus A1 = f(B1), and B1 = g(C1, C2). Therefore, in fact, A1 = f(g(C1, C2)), and we can also write A1 as a function of C1 and C2 rather than a function of B1. The formula rubric type will not work well in this example.

Relative grading has two variations, provided by the rubric types relative and relative_f. For example:

rubric:
 score: 4
 type: relative
 delta: 0.01

The relative rubric type grades the submission cell by first evaluating the key cell’s formula using the submission file’s values referenced by the key cell’s formula. The result is then compared to the evaluated result of the submission cell. If they match, the cell receives the score assigned in the rubric; otherwise the cell receives a score of 0.

For example, suppose the key cell A1 contains the formula IF(A2 = "ok", B2, C2), which needs to be graded relatively. In the submission file, suppose A1 evaluates to the value 13, A2 contains the string not_ok, B2 evaluates to the value 13, and C2 evaluates to the value 14. The instructor’s formula is evaluated with the student submission’s cells, which yields the result 14. This result is then compared with the value of the submission’s cell A1, which equals 13. In this case, the comparison fails since A1 equals 13, but its expected value is 14.
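
Translating this example into a small sketch (the key formula is written here as a Python function for illustration; the real engine evaluates the Excel formula itself):

def grade_relative(key_formula, submission_values, submission_result, score, delta=0.0):
    """Evaluate the key cell's formula on the submission's referenced values
    and compare it to the submission cell's own evaluated result."""
    expected = key_formula(submission_values)
    if isinstance(expected, (int, float)):
        correct = abs(expected - submission_result) <= delta
    else:
        correct = expected == submission_result
    return score if correct else 0

# Key cell A1: =IF(A2 = "ok", B2, C2); submission: A2="not_ok", B2=13, C2=14, A1=13.
submission = {"A2": "not_ok", "B2": 13, "C2": 14}
key = lambda c: c["B2"] if c["A2"] == "ok" else c["C2"]
print(grade_relative(key, submission, submission_result=13, score=4, delta=0.01))  # 0: A1 should be 14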

The relative_f rubric type is simply a stricter version of relative. The grading strategy is identical to that of the relative rubric type, but additionally the submission cell is required to contain a formula (it cannot be a constant). If the submission cell holds a hardcoded constant, it does not get a score.

Both relative and relative_f optionally admit a precision (via the delta property) when the result of the computation is numeric.

An example of the relative_f rubric type is shown below in a key file:

Here, cell H6 is computed from cells G6, G14, and F14 using the formula (G6 - G14)*100/(F14 - G14). This formula is evaluated on the submission file for the same cell: instead of the key file’s cells, the submission file’s cells G6, G14, and F14 are used in the evaluation. Even if the result does not match the value of cell H6 in the key file, as long as it matches (with a precision of 0.5) the value of cell H6 in the submission file itself, and the submission cell computes this value using a formula of its choice, cell H6 is awarded the specified score of 0.5. The submission cell’s formula is not required to be equivalent to the key cell’s formula in this grading strategy.

Specifying alternative solutions for a cell

PySheetGrader allows multiple ways of satisfying a rubric type using alternative cells in the key file. These cells don’t have a rubric note of their own, but instead specify alternative ways in which the original submission cell can meet the rubric criteria. Alternative cells are added to a rubric note using the alt_cells property:

rubric:
 score: 2
 type: formula
alt_cells: 
 - A12
 - A13
 - A14

Below is an excerpt from a key file with rubrics that take advantage of the alternative cells grading strategy. Merged cell V27 (merged with range V28:V31) has a formula rubric type (the formula is shown in column W). The rubric note defines cell V32 as an alternative cell, which contains an alternative formula for cell V27 (again, the formula is shown in column W). The alternative formula uses different cell references but yields the same result. The grading engine first tries to grade the corresponding submission cell V27 using the original key formula, comparing the original key cell’s formula to the corresponding submission cell’s formula. If this strategy fails, the formula in the alternative cell is used and compared to the formula in the submission cell V27. If this second comparison succeeds, cell V27 receives the specified score. If none of the original or alternative comparisons succeed, the cell receives a score of 0. Note that alternative cells in the key file are part of the grading specification: they should not be included in the starter template file deployed to the students, who do not need to, and should not, know about them.

The rubric types formula, constant, relative, relative_f, and soft_formula all admit the alt_cells property.
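
The overall control flow can be sketched as follows; this is an illustration only, and grade_against is a hypothetical stand-in for whichever rubric-type comparison the graded cell uses.

def grade_with_alternatives(sub_cell, key_cell, alt_cells, score, *, grade_against):
    """Try the original key cell first, then each alternative key cell; award
    the score on the first successful comparison, otherwise 0."""
    for candidate in [key_cell, *alt_cells]:
        if grade_against(sub_cell, candidate):  # comparison for the cell's rubric type
            return score
    return 0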

Specifying test cases for a cell

For numeric cells, another cell grading strategy available in PySheetGrader is inspired by unit testing in software development. This strategy treats a graded cell containing a formula as a unit to be tested. The unit behaves as a function, which can be tested by specifying an expected result when given a set of input values for the cells referenced by the formula. The input values are hypothetical values of the cells on which the submission cell depends in the submission file. The actual result is the (evaluated) value of the submission cell. The expected result is the (evaluated) value of the corresponding cell in the key file. The input values and the expected output value constitute a test case, which can be defined in a new rubric type called test.

To illustrate, consider the simple case of a graded cell that converts a distance in meters to a distance in kilometers, rounding up the result to a specified number of digits after the decimal point. The solution (submission) uses Excel’s built-in function ROUNDUP(number, decimal_digits). In the submission file, the formulas entered in the highlighted cells are shown under column D. Note the deliberate mistake in cell C5.

Suppose the key file for this problem looks like the one below. Cell C4 is graded as a formula, specified as ROUNDUP(B4/1000, $C$1). Unfortunately, the grading engine cannot reduce this formula to the submission cell’s formula even though they are mathematically equivalent: it does not attempt to interpret a built-in function’s arguments if the arguments themselves are formulas (rather than constants or plain cell references). As a result, PySheetGrader will fail to grade cell C4 as correct.

To get around this problem, an alternative strategy is to use the test rubric type, as shown in key cell C5’s rubric note below. Note that the key cell C5 has exactly the same formula as the key cell C4. Fortunately, PySheetGrader knows how to evaluate the built-in ROUNDUP function because this function is explicitly implemented by the grading engine along with some other commonly used Excel built-in functions.
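
As an aside, an Excel-style ROUNDUP (round away from zero to a given number of decimal places) can be sketched in a few lines of Python; this is not necessarily how PySheetGrader implements it.

import math

def roundup(number, digits):
    """Excel-style ROUNDUP: round away from zero to `digits` decimal places."""
    factor = 10 ** digits
    return math.copysign(math.ceil(abs(number) * factor) / factor, number)

# For the third test case below: ROUNDUP(20189 / 1000, 2) -> 20.19
print(roundup(20189 / 1000, 2))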

Let’s repeat C5’s rubric note before dissecting it:

rubric:
 score: 2
 type: test
test_cases:
 exact:
  output: 2
  input:
   B5: 2000
   C1: 0
  fail: "When B5 is 2000 and C1 is 0, the value of this cell should be exactly $expected!" 
 round_up1:
  output: 2.0
  delta: 0.01
  input:
   B5: 2013
   C1: 1
 round_up2:
  output: 20.19
  delta: 0.01
  input: 
   B5: 20189
   C1: 2
 

The above rubric note specifies three test cases, called exact, round_up1, and round_up2, for cell C5. This is done using the test_cases property. Each test case must be named uniquely. The first test case, exact, states that when the submission cells B5 and C1 (listed under property input) evaluate to values 2000 and 0, respectively, the submission cell C5 (property output) should evaluate to a value of 2. Here, the given output value of 2 is the expected result of the test case. The grading engine evaluates the formula of the submission cell C5 with the given inputs instead of the actual cell contents entered in the submission file. If the result matches the expected result, the test case passes; otherwise, it fails.

Each test case may also have these optional properties:

  • delta: for specifying a precision for numeric cells, and
  • fail: a failure message (defined as a double-quoted string) to be included in the grade report when the test case fails.

The test cases round_up1 and round_up2 in the above rubric use the delta property. The test case exact defines a failure message using the fail property.

The failure message can reference the submission file’s cells using $ variables, just as in the _CheckOrder tab; e.g., in a failure message string, $B5 would refer to the actual (evaluated) value of the submission cell B5 on the graded tab. In addition, PySheetGrader defines two reserved variables, $expected and $actual, that can be used in failure messages. These variables evaluate to the expected and actual values of the graded cell, respectively.
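
As an illustration of how such substitution could work (a sketch under my own assumptions, not the actual implementation), the snippet below replaces $ variables in a failure message template with concrete values:

import re

def render_failure_message(template, expected, actual, cell_values):
    """Replace $expected, $actual, and $<cell> placeholders with concrete values."""
    values = {"expected": expected, "actual": actual, **cell_values}
    return re.sub(r"\$(\w+)",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)

# Example: the `exact` test case above, failing with an actual value of 3.
print(render_failure_message(
    "When B5 is 2000 and C1 is 0, the value of this cell should be exactly $expected!",
    expected=2, actual=3, cell_values={"B5": 2000, "C1": 0}))
# -> When B5 is 2000 and C1 is 0, the value of this cell should be exactly 2!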

The test rubric type uses an all-or-none strategy for scoring the cell. A graded cell receives the score assigned to it only if all of the test cases in the test rubric note pass. Otherwise, the cell receives a score of 0. In the above example, cell C5 will receive a score of 0 because two out of its three test cases fail. (In the future, we are planning to add a partial-credit version of this rubric type that pro-rates the score according to the proportion of passing test cases.)
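
A minimal sketch of this all-or-none scoring is shown below, assuming a hypothetical helper evaluate_with_inputs that evaluates the submission cell’s formula with the given input values substituted for the referenced cells.

def grade_test_rubric(sub_cell, test_cases, score, *, evaluate_with_inputs):
    """All-or-none: every test case must pass for the cell to earn `score`."""
    for name, case in test_cases.items():
        actual = evaluate_with_inputs(sub_cell, case["input"])
        expected = case["output"]
        delta = case.get("delta", 0.0)
        if abs(actual - expected) > delta:  # the test rubric type is numeric-only
            return 0  # a single failing test case forfeits the whole score
    return score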

Here is a variation of the test rubric type for cell C6 in which all test cases pass for the original submission file.

The grading report for the associated tab looks like the following for cells C4, C5, and C6.

C4 and C5 each receive a score of 0. For cell C4, the formula comparison fails because PySheetGrader cannot cope with formulas within the arguments of an Excel built-in function, even when the grading engine knows how to evaluate the built-in function itself; the formula rubric type is therefore unsuitable for cell C4. C5 gets a score of 0 since at least one test case fails. The test case exact fails because of a (deliberate) mistake in the corresponding submission cell formula. The test case round_up1 fails because the test case is incorrectly specified (the expected value is wrong in the output property). Finally, cell C6 gets a full score of 2 because both of its test cases pass.

A word of caution: indentation is significant in a YAML specification, and since rubric notes are YAML specifications, one must pay attention to correct indentation in complex rubrics. The test rubric type is especially prone to indentation mistakes.

Two additional points on the test rubric type: it cannot have an alt_cells property, and it is currently available only for numeric cells.

Multi-tab spreadsheets and specifying minimum work requirements

Complex spreadsheet assignments often require multiple tabs (sheets). When this happens, each tab is graded separately and in a self-contained manner. To be graded by PySheetGrader, a cell on a graded tab must refer only to cells on the same tab. The SheetGradingOrder tab specifies the order in which the different tabs should be graded, and each graded tab must in turn have its own _CheckOrder tab that provides specific grading instructions for that tab.

SheetGradingOrder can optionally specify extra settings related to a graded tab. One of these settings is the minimum work requirement, which tells the grading engine to suppress all feedback associated with that tab in the grading report if the tab receives a total score below a specified minimum for that tab. The minimum work requirement prevents students from relying on excessive experimentation. If the minimum work requirement is not met, the grading report can include a custom message if a feedback message is specified for that tab. An example is shown below.
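
Separately from that example, here is a rough sketch of the suppression logic under my own naming assumptions; it is not the engine’s actual code.

def report_for_tab(tab_name, cell_feedback, total_score, minimum=None, message=None):
    """Suppress per-cell feedback when the tab's total score is below the minimum."""
    if minimum is not None and total_score < minimum:
        custom = message or f"Minimum work requirement not met for tab '{tab_name}'."
        return {"tab": tab_name, "score": total_score, "feedback": [custom]}
    return {"tab": tab_name, "score": total_score, "feedback": cell_feedback}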

Vocareum

PySheetGrader can run standalone on a local machine and batch-process collected submissions using simple scripts. However, its real power comes through when integrated into a cloud-based submission management platform. My choice is Vocareum, which I have been using successfully since 2015 to automate unit testing and model checking assignments. PySheetGrader is easy to set up on Vocareum, which can further integrate with popular LMS systems. Vocareum setup instructions and scripts are available upon request.
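
For standalone use, the batch scripts can be as simple as a loop over the collected submission files. The sketch below is hypothetical: grade_submission stands in for whatever entry point the tool exposes, and the .xlsx glob pattern is an assumption.

from pathlib import Path

def grade_all(key_file, submissions_dir, grade_submission):
    """Grade every .xlsx submission in a directory and collect the reports."""
    reports = {}
    for submission in sorted(Path(submissions_dir).glob("*.xlsx")):
        reports[submission.stem] = grade_submission(key_file, submission)
    return reports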

How does auto-grading affect student performance?

I deployed PySheetGrader on Vocareum for the first time in the Spring 2021 offering of my course. The data underlying my spreadsheet assignments and labs change from semester to semester. There are also variability points whose contents I switch for differentiation, and components that I turn on and off to create multiple versions. But the assignments and labs essentially remain structurally the same, allowing valid comparison between different offerings.

I compared the performance of the 2021 cohort to that of the 2020, 2019, and 2018 cohorts for five assignments and three labs. Assignments and labs were manually graded from 2018 through 2020. The main differences between an assignment and a lab are that (1) labs are performed live in class (remotely during a live session or in-person inside the classroom) whereas assignments are take-home; and (2) labs are guided by the instructor with synchronized steps and sharing of intermediate results whereas assignments are not, due to their offline nature. The 2020 and 2021 offerings of the class were completely remote, whereas the 2018 and 2019 offerings were in person. Other than the delivery mode, there were no notable differences among the different offerings. The table below compares student performance in the 2021 offering to student performance in the previous offerings.

                                                        2021        Pre-2021
Total no. of students                                     35              64
Total no. of assignments graded                          163             313
Total no. of labs graded                                  93             189
Average grade for assignments                         96.80%          92.74%
Average grade for labs                                98.81%          99.07%
% Improvement for assignments (Pre-2021 to 2021)       4.38%
% Improvement for labs (Pre-2021 to 2021)             -0.27%
p-value (significance) for assignment improvement  < 0.00000001
p-value (significance) for lab improvement             0.499

Student performance with auto-grading (2021) and without auto-grading (Pre-2021)

The nature of the labs ensures that most students get full marks: the tasks are guided, and students complete the steps in a lock-step manner and check their results. Not surprisingly, there was no significant difference in the lab grades. Students’ performance in take-home assignments, however, improved on average by 4.38%, which was highly statistically significant. I attribute this improvement to the ability to give fast and granular feedback to students, with multiple chances to correct their mistakes, which is not feasible or scalable without automated grading. While the 4.38% improvement seems small, it is large enough to push a student to the next letter grade in their final assessment.
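
For readers curious how such a p-value can be computed, a two-sample Welch’s t-test on per-submission grades is one option (an assumption on my part, since the exact test is not specified above). The sketch below uses placeholder grade lists rather than the actual course data.

from scipy import stats

# Placeholder per-submission grades; not the actual course data.
grades_2021 = [96.0, 98.5, 95.0, 99.0]
grades_pre_2021 = [91.0, 93.5, 92.0, 94.5]

t_stat, p_value = stats.ttest_ind(grades_2021, grades_pre_2021, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.6f}")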

Future plans

PySheetGrader is written in Python. It is still under development: we are constantly improving it, addressing glitches, and adding new capabilities. Our philosophy is to maintain backward compatibility when we introduce new features. Some of the planned capabilities include:

  • new and more advanced rubric types for maximum flexibility,
  • negative grading rubrics,
  • killer cells to specify fatal mistakes,
  • ability to embed Python code (assertions) into rubrics for Python-savvy instructors and advanced assignments,
  • better feedback mechanisms for students,
  • a professional-grade testing process for making changes and adding new features,
  • better error reporting for instructors, and
  • plagiarism deterrence capabilities.

If you are interested…

If you are interested in giving PySheetGrader a try as a beta user, please ping me.


Acknowledgements

Many thanks to several CMU students who have been contributing to the development of PySheetGrader. Ricardo Suranta helped me devise the original concepts, created a proper software architecture for the codebase, which has served us very well, implemented the base capabilities of PySheetGrader in a fully functional prototype, and built its initial release to be used with Vocareum as part of his graduate research project. Prerna Singh and Fang Han, my former TAs, managed PySheetGrader’s first deployment in a live course while implementing the necessary patches and planned improvements on the fly. The improvements included a JUnit-style HTML report, refactoring of the Vocareum scripts, the relative and soft formula rubric types, customized feedback messages for test cases in the test rubric type, the minimum work requirement setting, and the hidden grading setting. They performed all of this in an incremental manner, deploying new features and patches with every assignment. Ksheeraja Raghavan implemented the long-awaited negative grading strategy and fixed some obscure bugs. Sandeep Keshavegowda worked on a new rubric type for performing straightforward sanity checks as an alternative to the test rubric type, and on implementing grading instructions to specify showstopper mistakes and cell-grading prerequisites.

Announcing the SEENG 2021 Workshop

We are pleased to announce the Third International Workshop on Software Engineering Education for the Next Generation!

SEENG 2021 will be held in Madrid, Spain. It will be an integral component of the Joint Track on Software Engineering Education and Training (JSEET) at the 43rd IEEE/ACM International Conference on Software Engineering (ICSE).

The Call for Papers is available on the JSEET website, where we invite you to submit a position paper of at most 4 pages, containing a position statement and your profile, plus a single page of references.

Note that we would love to see authors participate in both the main JSEET track and the SEENG workshop. Hence we encourage authors to submit both a “long” paper (research, experience, idea, or tool paper) to the main JSEET track AND a position paper to the SEENG workshop.

We look forward to seeing you in Madrid!

Dual-Track Agile in Software Engineering Education

In the previous post, related to my ICSE’19 paper on Dual-Track Agile in Software Engineering Education [1], I focussed on the question: Why do we need to teach Interaction Design to Software Engineering students?

This post addresses the next question: How could we teach Interaction Design while staying away from “Big-Design-Up-Front”?

The proposed answer is situated in the context of a course aimed at enabling students (who are doing a Master of Science in Software Engineering at Carnegie Mellon University in Silicon Valley) to design and implement software systems that are at once useful, usable, and enjoyable to use, while making a unique contribution to society. This is done by introducing Interaction Design (IxD) and Software Product Management (PM) in the context of dual-track agile.

Note that I am referring to Software Product Management (PM) rather than Requirements Engineering (RE) simply because nowadays product managers are often the ones who write and prioritize requirements (like user stories), with a goal of organizing the work to support development.

Introducing New Practices

Teaching IxD requires introducing new practices, including, for instance, Contextual Inquiry, Affinity Mapping, Persona Modeling, Storyboarding, Design Walkthrough, Brainstorming, Ideation, How-Might-We, Sketching, Prototyping, Think-Aloud Usability Testing, Usability Testing, Heuristic Evaluation, Style Guide, and so on.

For students, like mine, who have some experience in software development but no experience with IxD, learning those practices can be challenging and time-consuming (especially when they involve collaborating with stakeholders). So learning to apply the practices effectively is going to take most of the semester, with limited time left for implementation and iterations. Given that situation, the challenge is: How do we stay away from teaching “Big-Design-Up-Front”?

Staying Away from “Big-Design-Up-Front”

When I first joined Carnegie Mellon University in 2012, I taught an existing course on Requirements Engineering that introduced some IxD practices with no implementation. As a result, and even though no specific software development process was mentioned, the students’ perception was that we were talking about “Big-Design-Up-Front”.

To address the problem, when asked to create my own new course in 2015, I decided to add the implementation of a Minimum Viable Product (MVP) and to frame the work in the context of an iterative process. I called this process the Double-Wheel, as it was my adaptation of the Wheel from the UX Book [2], with IxD practices covered in the first wheel and the implementation of the MVP in the second wheel. However, because the new IxD practices and MVP implementation were done sequentially during the course project, the perception was still “Big-Design-Up-Front” for some students.

Finally, last semester, following an empirical study at a company called Pivotal Software [3], I replaced the Double-Wheel with Dual-Track Agile, to better reflect what we observed in industry, and I introduced some concurrency between the IxD practices and the MVP implementation. Even though the concurrency was minimal, these changes eventually managed to shift the perception away from “Big-Design-Up-Front”.

Introducing Dual-Track Agile

To introduce dual-track agile in the classroom, I came up with a model that simplifies the messy reality for students, as illustrated in Figure 1.

Fig.1: A simple model of Dual-Track Agile for students

The model includes two tracks. The first track is about discovering what functionality to build. The second track is about delivering that functionality. The two tracks run continuously and in parallel. This is about continuous product discovery and delivery.

The red activities are led by interaction designers, the green ones by product managers, and the blue ones by developers. The colors do not represent silos but one team with multi-disciplinary skills. Any team member could be involved in any activity.

Dual-track agile is introduced during a semester-long project. The students propose their own projects by identifying product opportunities with a unique contribution to society (the domains covered by students so far include education, food waste, homelessness, medicine, natural disaster relief, recycling, traffic congestion, and the elderly). Students form their own teams of 4 to 5 based on project interest. They recruit their own stakeholders.

Starting with their own opportunity, and working closely with stakeholders, each team performs five tasks during the semester. The tasks are presented below.

Needs Elicitation

First, students do some research and analysis to understand the stakeholder’s needs (as illustrated in Figure 2).

Fig.2: Needs Elicitation

Conceptual Design

Second, students start to apply design thinking to generate solution concepts and validate the concepts with stakeholders (as illustrated in Figure 3).

Fig.3: Conceptual Design

Solution Envisioning

Third, based on the best concepts, students define a long-term vision for the product and an MVP for implementation. They also validate their vision with stakeholders (as illustrated in Figure 4).

Fig.4: Solution Envisioning

Prototyping & Backlog Preparation

Next, students create a prototype for their MVP, validate the prototype with stakeholders, and prepare a backlog by creating and prioritizing stories (as illustrated in Figure 5).

Fig.5: Prototyping & Backlog Preparation

Continuous Product Discovery and Delivery

Finally, students not only build the MVP, which is reviewed by stakeholders, but also go back to Discovery to identify the next valuable product increment, prototype the increment, and create the corresponding stories (as illustrated in Figure 6).

Fig.6: Continuous Product Discovery and Delivery

That way, the team knows what valuable product increment should be implemented next and gets a taste of the continuous and concurrent nature of product discovery and delivery.

Limitations

One of the limitations of the course is its reliance on a pre-requisite course, called Foundations of Software Engineering (FSE) [4]. FSE focuses on “traditional” agile development and the related development practices and technology.

This pre-requisite allows me to teach dual-track agile without having to worry about continuous delivery, while allocating enough time to teach the new practices related to product discovery.

References

  • [1] Cécile Péraire. “Dual-Track Agile in Software Engineering Education”, 41st International Conference on Software Engineering: Software Engineering and Education Track, ICSE-SEET (2019). Available online.
  • [2] Rex Hartson and Pardha Pyla. “The UX Book: Process and Guidelines for Ensuring a Quality User Experience”, 1st ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. (2012).
  • [3] Todd Sedano, Paul Ralph and Cécile Péraire. “The Product Backlog”, 41st International Conference on Software Engineering, ICSE (2019). Available online.
  • [4] Hakan Erdogmus and Cécile Péraire. “Flipping a Graduate-Level Software Engineering Foundations Course”, 39th International Conference on Software Engineering: Software Engineering and Education Track, ICSE-SEET (2017). Available online.

Why Teaching Interaction Design

In the context of ICSE’19 Software Engineering and Education Track (SEET), I presented a paper on Dual-Track Agile in Software Engineering Education [1]. The presentation focussed on two questions:
(1) Why do we need to teach Interaction Design (IxD) to Software Engineering (SE) students?
(2) How could we teach IxD while staying away from “Big-Design-Up-Front”?

This post addresses the first question, while another post addresses the second question.

As a short answer to the first question (Why do we need to teach IxD to SE students?), let me borrow a quote by Bill Buxton from Microsoft Research: If we don’t teach interaction design, our students “might get a sub-optimal design right, but will almost never get the right design” [2].

Exploring Alternatives

One underlying reason given by Buxton [2] is the fact that interaction designers explore various equally viable alternatives before fully developing one idea for the solution (as illustrated in Figure 1).

Fig. 1: Explore various equally viable alternatives [2]

Exploring alternatives is not typically done in SE, where we tend to fixate on the first idea that comes to mind.

Note about Agile: The fact that the Agile Manifesto [3] values working software and delivering software early only reinforces the “idea fixation” behavior.

Co-evolution

Another underlying reason is the fact that interaction designers refine their understanding of the problem (or opportunity) and their understanding of the solution at the same time (as illustrated in Figure 2). For instance, they use sketches, which have been shown to be a very effective way of reflecting on both the problem and the solution [4].

Fig. 2: Co-evolve problem and solution spaces

Again, co-evolution of problem and solution understanding is not typically done in SE, where we tend to bypass the creative process by going directly from the problem to writing requirements (like user stories).

Note about Agile: Agile’s short iterations support co-evolution by providing repeated opportunities to review the solution. However, if we think about what typically happens during a Sprint Review, for instance, the discussions are often in the solution space and about accepting the requirements (like user stories), with few opportunities to refine the understanding of the problem.

A direct implication is that we should generally hold off writing detailed requirements until after the creative process has generated a concrete idea for the solution (i.e. a solution concept). Only then should requirements (like user stories with acceptance criteria) be created to decompose the solution concept into small chunks for implementation. As a positive side effect, having a concrete idea for the solution makes writing those user stories fairly easy compared to writing stories in the abstract, based only on a potentially sub-optimal understanding of the problem.

Conclusion

Interaction design supports creativity. It encourages students to explore early solution concepts so they can deeply understand the problem (or opportunity), select the best alternative, and hence create the right solution.

The outcome of the creative process is an innovative solution concept. Detailed requirements, like user stories with acceptance criteria, should generally be created only after the generation of the solution concept. They should be created as a means to organize the implementation work rather than as a way to imagine a solution.

References

  • [1] Cécile Péraire. “Dual-Track Agile in Software Engineering Education”, 41st International Conference on Software Engineering: Software Engineering and Education Track, ICSE-SEET (2019). Available online.
  • [2] Bill Buxton. “Sketching and experience design”, Stanford University Human-Computer Interaction Seminar (2007). Available online.
  • [3] Manifesto for Agile Software Development (2001). Available online.
  • [4] Bill Buxton. “Sketching user experiences: getting the design right and the right design”, Morgan Kaufmann (2010).

Teaching SE Concepts by Engaging Millennials in Discussion of Real-World Scenarios

The traditional teaching of SE concepts is now gone! Several teaching tools, methodologies, and techniques have been used and experimented with to bring SE concepts into reality for Millennials. We have been involved in teaching SE courses for Millennials and found that teaching Requirements Engineering and Software Testing concepts with lecture-based learning is typically difficult and poses many challenges (due to the multiple learning dimensions). The visualization and effectiveness of the tools and techniques defined in the textbooks are trivial for Millennials. Hence, we thought of developing cases (i.e., real-world examples that are open-ended) that can be used to complement lecture-based learning. SEABED (Software Engineering – Case-Based Learning Database) is an initiative toward developing cases meant to boost the teaching/learning of different phases of a typical SE process. This database, in its early stage, is publicly available, where anyone can contribute new cases and reuse existing cases for their SE courses.

For experimenting with Case-Based Learning (CBL) in SE education, we developed cases for software requirements, design, and testing, and also evaluated the effectiveness of CBL in classroom sessions. At SEEM 2018, we got the opportunity to present our work on “Teaching Requirements Engineering Concepts using CBL” to researchers and academics in SE education. In this work, we reported the results of the study conducted to assess the effectiveness of the Case-Based Learning (CBL) methodology in facilitating the learning of several RE concepts. The study was conducted with students in the 20-22 years age group. The outcome of the CBL exercise was positive, and we found that our learning objectives were achieved. Based on our experiences with CBL sessions, we also stated various related challenges, recommendations, and lessons learned. We particularly want to dedicate this work to our friend, collaborator, mentor, Co-Founder of SEABED, and lovely human being, the late Dr Ashish Sureka (Author 4 of our paper).

We would like to thank Hakan and Cécile for the opportunity, and also the SEEM program committee for their rigorous comments/suggestions to improve our presented paper, as well as their insights to move our research forward in this area (related to CBL, empirical study design, and interpretation of findings). We would also like to highlight that the way presentations and activity sessions were managed during the workshop was very interesting. Specifically, the brainstorming and collaborative activities carried out during the half-day session were a memorable experience and a new way of discussing SE education challenges for me.