| Scope | Column | Type | Example | Description |
|---|---|---|---|---|
| design | condition | text or numeric | A | Required. Unique identifier for the condition (e.g., “A” / “B” or 1 / 2). The app creates the same number of conditions as unique values and randomly assigns participants to them. |
| design | sequence | numeric | 1 | Optional. Defines the order in which posts are displayed (ascending). If missing, random integers are assigned per participant, leading to between-subject randomization. |
| design | commented_post | boolean | 0 | Optional (Legacy for Twitter only). If 1, this post is displayed at the top of the feed and all other posts appear as comments beneath it. Defaults to 0 if missing. |
| post | doc_id | numeric | 1 | Required. Unique numeric identifier for each post. Must be an integer (e.g., 1, 2, 3). |
| post | datetime | text | 01.03.22 06:00 | Required. The time a post was published, formatted as dd.mm.yyyy hh:mm:ss. |
| post | text | text | Just experienced the most incredible sunrise… #Yosemite #NatureLovers | Required. Content of the post. Can contain hashtags, emojis, and URLs. |
| post | media | text | https://images.unsplash.com/photo-1472396961693… | Required for Instagram and Stories feeds, where posts without a media URL are not displayed; optional for Twitter and LinkedIn. Direct (raw) URL to an image file (.jpg, .png, .gif, or .webp). |
| post | alt_text | text | Sunset illuminates Half Dome… | Optional. Accessibility description of the media element. |
| post | likes | numeric | 15 | Required. Number of likes the post has received. Defaults to 0 if missing. |
| post | reposts | numeric | 6 | Required. Number of reposts the post has received. Defaults to 0 if missing. |
| post | replies | numeric | 2 | Required. Number of replies the post has received. Defaults to 0 if missing. |
| post | sponsored | boolean | 0 | Optional. If 1, the post is displayed as a sponsored/promoted post. Defaults to 0 if missing. |
| post | target | text | https://example.com | Conditionally required. URL of the landing page for sponsored posts. Must be provided if sponsored = 1. |
| user | username | text | NatureFanatic | Required. The display name of the post’s author. |
| user | handle | text | NatureFanatic88 | Required. The handle of the post’s author, automatically prefixed by an @ symbol. |
| user | user_description | text | Lover of all things nature… | Optional. A brief bio of the user, shown in a tooltip when hovering over the profile picture. |
| user | user_image | text | https://images.unsplash.com/photo-1522506209496… | Required. Direct (raw) URL to the user’s profile picture. For Twitter, Instagram, and LinkedIn a colored avatar with initials is used as fallback, but providing an image is strongly recommended. |
| user | user_followers | numeric | 4523 | Required. Number of followers the user has. |
1 Generate Stimuli
The DICE app was designed to fit into a consumer researcher’s typical workflow, in which participants are recruited (e.g., via Prolific) before they are exposed to stimuli and survey items (e.g., in Qualtrics). The key procedural difference in using the DICE app (compared to software such as Adobe Photoshop or Microsoft PowerPoint) is that the stimuli are configured tabularly rather than graphically: the DICE app requires researchers to configure a csv file that provides information on each post, such as the actual content, engagement metrics, and the corresponding username. The app then loops through that file (while filtering for conditions) to display each row as a separate post embedded in an interactive feed. The advantage of this approach is that, while it requires the same amount of information as the graphical configuration, it is less time-consuming and less error-prone because the software handles the graphical representation consistently. In addition, the tabular configuration is more accessible, as researchers are trained to work with csv files rather than Photoshop files.
1.1 Overview
Here, we provide a configuration csv file that serves as a template for researchers who configure their first set of stimuli. In Table tbl-template, we display and describe an exemplary row of this template, which contains social media posts about Yosemite National Park in California. The first column in Table tbl-template describes a configuration column’s scope, that is, whether it defines how, when, and to whom a post is displayed (design), contains a post’s actual content or engagement metrics such as the number of likes the post has received (post), or describes a post’s author (user). The second column lists all of the input csv file’s required configuration columns.1 The remaining columns give each configuration column’s data type, an exemplary value, and a more detailed description.
All 18 columns listed in Table tbl-template must exist as column headers in your CSV, even if some cells are left empty. DICE will report an error if any column is missing. Columns marked as optional may contain empty values; columns marked as required must have a value in every row.
The media column is required for Instagram and Stories feeds: posts without a media URL are silently skipped and not shown to participants. For Twitter and LinkedIn, media is optional — text-only posts are displayed normally.
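The header and value rules above can be checked before uploading a file. The following is a minimal sketch using only the Python standard library, not DICE’s own validation. The function name `check_config`, the `media_required` flag, and the choice of which columns must be non-empty in every row (`REQUIRED_VALUES`, limited here to required columns for which the table states no default or fallback) are assumptions made for illustration.

```python
import csv
import io

# The 18 column names from the template table.
ALL_COLUMNS = [
    "condition", "sequence", "commented_post",
    "doc_id", "datetime", "text", "media", "alt_text",
    "likes", "reposts", "replies", "sponsored", "target",
    "username", "handle", "user_description", "user_image", "user_followers",
]

# Assumption: required columns without a documented default or fallback.
REQUIRED_VALUES = [
    "condition", "doc_id", "datetime", "text",
    "username", "handle", "user_followers",
]


def check_config(csv_text, media_required=False):
    """Return a list of human-readable problems found in a configuration csv."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in ALL_COLUMNS if c not in header]
    if problems:
        return problems  # row checks need the complete header

    # media only counts as required for Instagram and Stories feeds.
    required = REQUIRED_VALUES + (["media"] if media_required else [])
    for row_no, row in enumerate(reader, start=1):
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"row {row_no}: empty required value in {col}")
        # target must be provided if sponsored = 1.
        if (row["sponsored"] or "").strip() == "1" and not (row["target"] or "").strip():
            problems.append(f"row {row_no}: sponsored post without target")
    return problems
```

Calling `check_config(open("stimuli.csv", encoding="utf-8").read(), media_required=True)` on a hypothetical file `stimuli.csv` would list, for example, rows that an Instagram feed would silently skip because they lack a media URL.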
1.2 Design Columns
Before Video fig-tutorial-1 describes the columns in more detail, we focus on the design columns and describe how researchers can configure them to implement their experimental designs.
1.2.1 Conditions
Researchers can leverage the condition configuration column to set up between-subjects designs by assigning N different values to the respective rows (e.g., “treatment 1”, “treatment 2”, …, “treatment N”). The DICE app will then count the number of unique values and create N different treatment groups. When a study is launched, participants are assigned to these groups randomly and uniformly such that the group sizes do not differ in expectation. In Case Study 2, we show how we leveraged this variable to create two conditions that contain two different sets of nineteen organic posts but share the same sponsored post. As each row can only be assigned to one condition, this required us to enter the sponsored post twice within the configuration file: both versions contained the same post and user configurations and differed only with respect to their condition. Similarly, if researchers want to display the same set of organic posts in N conditions, they have to enter N copies of that set of posts and adjust the condition column accordingly.
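Entering the same posts once per condition is easy to script. The sketch below copies a set of shared rows into each condition and changes only the condition value; the function name `expand_across_conditions` and the representation of rows as dictionaries are assumptions for illustration. The copies keep their doc_id, mirroring the description above in which the duplicated rows differ only in their condition.

```python
def expand_across_conditions(shared_posts, conditions):
    """Copy each shared post once per condition, changing only `condition`."""
    return [
        {**post, "condition": cond}  # new dict, so the input rows stay untouched
        for cond in conditions
        for post in shared_posts
    ]


# Example: one sponsored post that should appear in conditions "A" and "B".
sponsored_post = {"doc_id": 20, "text": "Ad copy", "sponsored": 1, "condition": ""}
rows = expand_across_conditions([sponsored_post], ["A", "B"])
```

The resulting rows can be appended to the condition-specific organic posts before the file is written out as csv.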
1.2.2 Sequences
Another important configuration column is sequence. It gives researchers control over the order in which posts are displayed. Explicit sequences can be useful to study the ordering and ranking of social media posts. Researchers can, for instance, use the user interactions measured in a previous study to rank the posts by engagement and thereby approximate a platform’s recommender system. Importantly, this column is special because the DICE app replaces missing values with random numbers for each participant individually. We leveraged this feature in both of our case studies: in Case Study 2, we defined the sequence only for the sponsored post such that it was always displayed in fifth position. We left the sequence column empty for all organic posts. Hence, each participant experienced a different sequence of organic posts. In Case Study 1, we left the sequence column empty for every post to randomize the order of both sponsored and organic posts. This resulted in a diverse set of sequences that we exploited to study primacy effects in ad recall.
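To make the per-participant randomization concrete, the following sketch orders a list of posts for one participant. How DICE draws its random numbers is not specified here, so this is an assumption: posts without a sequence receive a random position among the slots not claimed by an explicit sequence, which keeps a fixed post (such as a sponsored post with sequence 5) in its slot. The function name `order_for_participant` is hypothetical, and sequence values are assumed to be integers or integer strings.

```python
import random


def order_for_participant(posts, rng):
    """Return the posts in display order (ascending sequence) for one participant."""
    def has_sequence(post):
        return post.get("sequence") not in (None, "")

    # Positions claimed explicitly in the configuration file.
    fixed = {int(p["sequence"]) for p in posts if has_sequence(p)}
    # Remaining positions, shuffled anew for every participant.
    free = [pos for pos in range(1, len(posts) + 1) if pos not in fixed]
    rng.shuffle(free)
    free_positions = iter(free)

    positioned = []
    for post in posts:
        pos = int(post["sequence"]) if has_sequence(post) else next(free_positions)
        positioned.append((pos, post))
    positioned.sort(key=lambda pair: pair[0])
    return [post for _, post in positioned]


# Nineteen organic posts without a sequence and one sponsored post fixed at position 5.
posts = [{"doc_id": i, "sequence": ""} for i in range(1, 20)]
posts.append({"doc_id": 20, "sequence": "5"})
feed = order_for_participant(posts, random.Random())
```

Passing a separate `random.Random` instance per participant keeps the draws independent, which is what produces between-subject variation in the order of the organic posts.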
1.2.3 Threads
Finally, the commented_post configuration column is of interest to researchers who want to investigate discussions, as it slightly changes the appearance of the social media feed. If a post has the value 1 in this column, it serves as the “parent post”, and all other posts are displayed as comments on that parent post.
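The thread layout can be pictured as a split of the rows into one parent and its comments. The sketch below illustrates that logic only; it is not DICE’s rendering code. The function name `split_thread` is hypothetical, and the handling of several rows marked with 1 (the first one wins) is an assumption, since the behavior in that case is not described here.

```python
def split_thread(posts):
    """Return (parent, comments); parent is None if no row has commented_post = 1."""
    def is_parent(post):
        # Missing or empty values count as 0, matching the documented default.
        return str(post.get("commented_post") or "0").strip() == "1"

    parents = [p for p in posts if is_parent(p)]
    if not parents:
        return None, list(posts)  # regular feed, no thread layout
    parent = parents[0]  # assumption: the first marked row becomes the parent
    comments = [p for p in posts if p is not parent]
    return parent, comments


thread = [
    {"doc_id": 1, "commented_post": "1"},
    {"doc_id": 2, "commented_post": "0"},
    {"doc_id": 3, "commented_post": ""},
]
parent, comments = split_thread(thread)
```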
1.3 Image Requirements
The DICE app does not host images directly. Instead, you must provide direct links (raw URLs) to your images hosted elsewhere. These URLs should point directly to the image file itself, not to a webpage containing the image.
What is a Raw Image URL? A raw image URL points directly to the image file and typically ends with a file extension like .jpg, .png, .gif, or preferably .webp. These URLs provide direct access to the image file without any surrounding webpage elements. For instance, a raw GitHub image URL might look like https://raw.githubusercontent.com/username/repository/main/images/example.jpg, while an Imgur URL might be https://i.imgur.com/abcd123.png, and a Giphy URL could be https://media.giphy.com/media/abc123/giphy.gif.
1.3.1 Hosting Your Own Images
Several platforms can host your images for use with DICE. GitHub offers a straightforward approach: upload images to a public repository and use the raw URL that starts with raw.githubusercontent.com. Imgur provides another popular option as an image hosting service that readily provides direct image links. Cloud storage services like AWS S3 or Google Cloud Storage can also work well, though you’ll need to ensure public access is enabled.
1.3.2 Getting Raw Image URLs
The process of obtaining a raw URL varies by platform. On GitHub, navigate to your uploaded image and click the “Raw” button; the resulting URL in your browser will start with raw.githubusercontent.com. When using services like Imgur, you can usually right-click on the uploaded image and select “Copy image address” or a similar option. The key is to ensure that your URL ends with an image extension (.jpg, .png, etc.).
1.3.3 Verifying Your URLs
You can easily test whether your image URL is correct by pasting it directly into a browser’s address bar. If the browser shows only the image itself, without any surrounding webpage elements, the URL is suitable for use with DICE. This simple verification step can save time troubleshooting later.
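When a configuration file contains many URLs, a quick offline pre-check can flag obvious problems before the manual browser test. The sketch below only inspects the URL string for an http(s) scheme and one of the extensions listed above; the function name `looks_like_raw_image_url` is hypothetical. It is a heuristic: a webpage URL that happens to end in an image extension passes, and image URLs without an extension are flagged even though they may work, so the browser check remains the authoritative test.

```python
from urllib.parse import urlparse

# Extensions named in this section.
IMAGE_EXTENSIONS = (".jpg", ".png", ".gif", ".webp")


def looks_like_raw_image_url(url):
    """Heuristic check: http(s) URL whose path ends with an image extension."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Only the path is inspected, so query strings do not interfere.
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)


candidates = [
    "https://raw.githubusercontent.com/username/repository/main/images/example.jpg",
    "https://i.imgur.com/abcd123.png",
]
flagged = [u for u in candidates if not looks_like_raw_image_url(u)]
```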
1.4 Video Tutorial
Here, we describe the configuration of the stimuli we used in our brand safety case study (see sec-brand-safety-case) in detail.
1.5 Best Practices
Image optimization plays a crucial role in your study’s performance. Compressing your images before hosting them ensures faster loading times for participants. Web-optimized formats like .webp often provide the best balance of quality and performance. Many online converters, and even Python modules, are available for this purpose.2
It’s crucial to understand that directly linking to images from social media platforms like X (formerly Twitter) may lead to problems: if a user updates their profile picture or deletes a post, the image URL will break and your study’s stimuli will be incomplete. Instead, download any images you want to use from social media and host them on a platform you control, such as GitHub. This ensures your stimuli remain stable throughout your study’s duration.
Remember to maintain a backup of all images and verify that your image URLs remain publicly accessible throughout your research period. This approach provides the most reliable way to ensure your study’s integrity over time.
You will likely create your own images. To include them in your feed, you need to host them publicly; we usually use GitHub for this purpose. In our experience, compressing the images makes your stimuli less sensitive to slow internet connections on your participants’ side. It also helps to use an image format that is optimized for the web; we have had good experiences with the .webp format.
1.6 Archiving for Review
For academic documentation and review purposes, it’s valuable to create permanent archives of your experimental feeds. Services like perma.cc or the Internet Archive’s Wayback Machine can capture and preserve your feeds exactly as they appeared during the study. These archives serve as reliable references for reviewers and future researchers, ensuring that your experimental stimuli remain accessible even if the original hosting platforms change or links break over time. When writing up your research, you can include these permanent archive links in your methodology section or supplementary materials.
1.7 Public Stimuli Resources & Databases
We created a set of synthetic users, where we matched actual usernames and handles (see McKelvey et al. 2017) with stock images (from unsplash.com) and some synthetic, LLM-generated information. Also see generated.photos and their academic data for user images and behindthename.com for user names.
In addition, the data science competition platform Kaggle hosts a variety of annotated social-media-related datasets (e.g., the Social Media Sentiments Analysis Dataset, the Cyberbullying Dataset, Social Media Influencers in 2022, or Political Social Media Posts) that researchers can use to create their stimuli. The same applies to Hugging Face, which hosts a Reddit confessions dataset, for instance.
Finally, you can use Facebook’s Ad Library, a publicly accessible database that archives advertisements run by advertisers across Meta’s platforms, to copy (or draw inspiration for) the ad copy, the creative, and the landing page of sponsored posts.
Researchers can change the order of these columns as they wish. In addition, they can add further columns for internal purposes. We recommend documenting a post’s source (i.e., a URL) if the post was scraped or copied from social media or an ad library.
See our repository (url provided after review) for more details on the technical implementation.