1 Stimuli

The DICE app was designed to fit into a consumer researcher’s typical workflow, in which participants are recruited (e.g., via Prolific) before they are exposed to stimuli and survey items (e.g., in Qualtrics). The key procedural difference in using the DICE app (compared to software such as Adobe Photoshop or Microsoft PowerPoint) is that the stimuli are not configured graphically but tabularly: the DICE app requires researchers to configure a csv file that provides information on each post, such as the actual content, engagement metrics, and the corresponding username. The app then loops through that file (while filtering for conditions) to display each row as a separate post embedded in an interactive feed. The advantage of this approach is that, while it requires the same amount of information as the graphical configuration, it is less time-consuming and less error-prone because the software handles the graphical representation consistently. In addition, the tabular configuration is more accessible, as researchers are trained to work with csv files rather than Photoshop files.

1.1 Overview

Here, we provide a configuration csv file that serves as a template for researchers who configure their first set of stimuli. In Table 1.1, we display and describe an exemplary row of this template, which contains social media posts about Yosemite National Park in California. The first column in Table 1.1 describes a configuration column’s scope, that is, whether it defines how, when, and to whom a post is displayed (design), contains a post’s actual content or engagement metrics such as the number of likes the post has received (post), or describes a post’s author (user). The second column lists all of the input csv file’s required configuration columns.¹ The remaining columns state each configuration column’s data type, provide an exemplary value, and describe it in more detail.

Table 1.1: Description of Input csv File
Scope	Column	Type	Example	Description
design	condition	text	A	Unique identifier for the condition (e.g., group A or B). The app will create one condition per unique value and randomly assign participants to them.
design	sequence	numeric	1	Defines the order in which posts are displayed (ascending). If missing, random integers will be assigned, leading to between-subject randomization.
design	commented_post	boolean	0	If 1, this post will be displayed at the top of the feed while all other posts are displayed as comments of this post.
post	doc_id	numeric	1	Unique identifier for each post in each condition.
post	datetime	text	01.03.22 06:00	The time a post was published, formatted as dd.mm.yyyy hh:mm:ss.
post	text	text	Just experienced the most incredible sunrise… #Yosemite #NatureLovers	Content of the post. Can contain hashtags, emojis, and URLs.
post	media	text	https://images.unsplash.com/photo-1472396961693…	URL to a publicly available image or GIF. We recommend using compressed formats like .webp or hosting images on platforms such as Unsplash or GitHub.
post	alt_text	text	Sunset illuminates Half Dome…	Optional description of the media element for accessibility purposes.
post	likes	numeric	15	Number of likes the post has received.
post	reposts	numeric	6	Number of reposts the post has received.
post	replies	numeric	2	Number of replies the post has received.
post	sponsored	boolean	0	If 1, the post will be displayed as a sponsored post.
post	target	text	https://example.com	URL of the landing page for sponsored posts. If sponsored = 1, this URL will be displayed as the target for the advertisement.
user	username	text	NatureFanatic	The username of the post’s author.
user	handle	text	NatureFanatic88	The handle of the post’s author, automatically prefixed by an @ symbol.
user	user_description	text	Lover of all things nature…	A brief description or bio of the user, displayed when hovering over the user’s profile.
user	user_image	text	https://images.unsplash.com/photo-1522506209496…	URL to the user’s profile picture. Similar to media, we recommend using stock images or hosting them on platforms such as GitHub, preferably in compressed formats.
user	user_followers	numeric	4523	Number of followers the user has.
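
For researchers who prefer to generate the configuration file programmatically, the following minimal Python sketch writes a single row using the column names from Table 1.1. It is an illustration only: the helper `to_csv` is hypothetical, and the two image URLs are placeholders rather than the template's actual links.

```python
import csv
import io

# Column names taken from Table 1.1 (their order is free).
COLUMNS = [
    "condition", "sequence", "commented_post",
    "doc_id", "datetime", "text", "media", "alt_text",
    "likes", "reposts", "replies", "sponsored", "target",
    "username", "handle", "user_description", "user_image",
    "user_followers",
]

# One example post. The image URLs are placeholders (assumption),
# not the links used in the template.
example_row = {
    "condition": "A",
    "sequence": 1,
    "commented_post": 0,
    "doc_id": 1,
    "datetime": "01.03.22 06:00",
    "text": "Just experienced the most incredible sunrise… #Yosemite #NatureLovers",
    "media": "https://example.com/images/post.webp",
    "alt_text": "Sunset illuminates Half Dome…",
    "likes": 15,
    "reposts": 6,
    "replies": 2,
    "sponsored": 0,
    "target": "https://example.com",
    "username": "NatureFanatic",
    "handle": "NatureFanatic88",
    "user_description": "Lover of all things nature…",
    "user_image": "https://example.com/images/user.webp",
    "user_followers": 4523,
}


def to_csv(rows):
    """Serialize a list of row dictionaries to csv text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv([example_row])
```

To store the result on disk, the text can be written with `open("feed_config.csv", "w", newline="", encoding="utf-8")`, where the file name is again only an example.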

1.2 Design Columns

Before Video 1.1 describes the columns in more detail, we focus on the design columns and describe how researchers can configure them to implement their experimental designs.

1.2.1 Conditions

Researchers can use the condition configuration column to set up between-subjects designs by assigning N different values to the respective rows (e.g., “treatment 1”, “treatment 2”, …, “treatment N”). The DICE app then counts the number of unique values and creates N different treatment groups. When a study is launched, participants are assigned to these groups randomly and uniformly such that the group sizes do not differ in expectation. In Case Study 2, we show how we used this column to create two conditions that contain two different sets of nineteen organic posts but share the same sponsored post. As each row can only be assigned to one condition, this required us to enter the sponsored post twice within the configuration file: both versions contained the same post and user configurations and only differed with respect to their condition. Similarly, if researchers want to display the same set of organic posts in N conditions, they have to enter N copies of that set of posts and adjust the condition column accordingly.
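
Copying shared posts by hand is error-prone, so it can help to generate the duplicates. The sketch below assumes that the rows are held as Python dictionaries keyed by the column names of Table 1.1; the function name `replicate_across_conditions` and the example values are hypothetical.

```python
def replicate_across_conditions(shared_posts, conditions):
    """Return one copy of every shared post for each condition.

    The input dictionaries are not modified; each copy only differs
    in its "condition" value.
    """
    return [
        {**post, "condition": condition}
        for condition in conditions
        for post in shared_posts
    ]


# A sponsored post that should appear in both conditions (placeholder values).
sponsored = {"doc_id": 20, "text": "Placeholder ad copy", "sponsored": 1}

rows = replicate_across_conditions([sponsored], ["A", "B"])

# Counting unique values mirrors how the number of treatment groups
# follows from the condition column.
n_conditions = len({row["condition"] for row in rows})
```

The resulting rows can then be appended to the condition-specific organic posts before the file is written.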

1.2.2 Sequences

Another important configuration column is sequence. It gives researchers control over the order in which posts are displayed. Explicit sequences can be useful for studying the ordering and ranking of social media posts. Researchers can, for instance, use the user interactions measured in a previous study to rank the posts by engagement and thereby approximate a platform’s recommender system. Importantly, this column is special because the DICE app replaces missing values with random numbers for each participant individually. We leveraged this feature in both of our case studies: in Case Study 2, we only defined the sequence of the sponsored post such that it was always displayed in fifth position. No sequence was assigned to any of the organic posts. Hence, each participant experienced a different sequence of organic posts. In Case Study 1, we left the sequence column empty for every post to randomize the order of both sponsored and organic posts. This resulted in a diverse set of sequences that we exploited to study primacy effects in ad recall.
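
To make the interplay of fixed and missing sequence values concrete, the following Python sketch reproduces the Case Study 2 setup for a single participant. It is not the DICE app's implementation: we assume here that missing values are filled with a random permutation of the positions that are still free, and the function name `order_feed` is hypothetical.

```python
import random


def order_feed(posts, rng):
    """Return the posts in display order for one participant.

    Posts with a sequence value keep it; posts whose sequence is None
    receive a randomly drawn free position (assumed behavior).
    """
    n = len(posts)
    fixed = {p["sequence"] for p in posts if p["sequence"] is not None}
    free_slots = [s for s in range(1, n + 1) if s not in fixed]
    rng.shuffle(free_slots)
    slots = iter(free_slots)

    resolved = []
    for post in posts:
        sequence = post["sequence"]
        if sequence is None:
            sequence = next(slots)
        resolved.append({**post, "sequence": sequence})
    return sorted(resolved, key=lambda p: p["sequence"])


# Nineteen organic posts without a sequence, one sponsored post fixed at 5.
posts = [{"doc_id": i, "sequence": None} for i in range(1, 20)]
posts.append({"doc_id": 20, "sequence": 5})

# A separate random generator per participant yields individual orders.
feed = order_feed(posts, random.Random(42))
```

In this sketch, the sponsored post always ends up in fifth position, whereas the order of the organic posts depends on the participant's random generator.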

1.2.3 Threads

Finally, the commented_post configuration column is interesting for researchers who want to investigate discussions, as it changes the social media feed’s appearance slightly. If one post is assigned a 1 in this column, this post will serve as a “parent post”, whereas all other posts will be displayed as comments on that parent post.
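
The thread logic can be sketched in a few lines of Python. This is an illustration of the rule described above rather than the app's code; the function name `split_thread` is hypothetical, and we assume that exactly one post is flagged as the parent.

```python
def split_thread(posts):
    """Split posts into the parent post and its comments.

    A post counts as the parent if its commented_post value is 1,
    regardless of whether the value was read as a number or as text.
    """
    def is_parent(post):
        return str(post.get("commented_post", "0")).strip() == "1"

    parents = [p for p in posts if is_parent(p)]
    if len(parents) != 1:
        raise ValueError("expected exactly one post with commented_post = 1")
    parent = parents[0]
    comments = [p for p in posts if p is not parent]
    return parent, comments


thread_posts = [
    {"doc_id": 1, "commented_post": 1},
    {"doc_id": 2, "commented_post": 0},
    {"doc_id": 3, "commented_post": "0"},
]
parent, comments = split_thread(thread_posts)
```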

1.3 Image Requirements

The DICE app does not host images directly. Instead, you must provide direct links (raw URLs) to your images hosted elsewhere. These URLs should point directly to the image file itself, not to a webpage containing the image.

What is a Raw Image URL? A raw image URL points directly to the image file and typically ends with a file extension like .jpg, .png, .gif, or preferably .webp. These URLs provide direct access to the image file without any surrounding webpage elements. For instance, a raw GitHub image URL might look like https://raw.githubusercontent.com/username/repository/main/images/example.jpg, while an Imgur URL might be https://i.imgur.com/abcd123.png, and a Giphy URL could be https://media.giphy.com/media/abc123/giphy.gif.

1.3.1 Hosting Your Own Images

Several platforms can host your images for use with DICE. GitHub offers a straightforward approach: upload images to a public repository and use the raw URL that starts with raw.githubusercontent.com. Imgur provides another popular option as an image hosting service that readily provides direct image links. Cloud storage services like AWS S3 or Google Cloud Storage can also work well, though you’ll need to ensure public access is enabled.

1.3.2 Getting Raw Image URLs

The process of obtaining a raw URL varies by platform. On GitHub, navigate to your uploaded image and click the “Raw” button; the resulting URL in your browser will start with raw.githubusercontent.com. When using services like Imgur, you can usually right-click on the uploaded image and select “Copy image address” or a similar option. The key is to ensure that your URL ends with an image extension (.jpg, .png, etc.).

1.3.3 Verifying Your URLs

You can easily test whether your image URL is correct by pasting it directly into a browser’s address bar. If the browser shows only the image itself, without any surrounding webpage elements, the URL is suitable for use with DICE. This simple verification step can save time troubleshooting later.
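
When a configuration file contains many image links, a rough automated pre-check can complement the manual browser test. The Python sketch below only inspects the file extension of each URL and does not contact the server; the function name `looks_like_raw_image_url` and the set of accepted extensions are our own assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions treated as images in this sketch (assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def looks_like_raw_image_url(url):
    """Heuristic: is this an http(s) URL whose path ends in an image extension?"""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return suffix in IMAGE_EXTENSIONS


candidates = [
    "https://raw.githubusercontent.com/username/repository/main/images/example.jpg",
    "https://i.imgur.com/abcd123.png",
    "https://imgur.com/abcd123",
]
flags = [looks_like_raw_image_url(url) for url in candidates]
```

This check is neither necessary nor sufficient: some image services deliver images from URLs without a file extension, and a webpage address can also end in an image extension. The browser test therefore remains the decisive check.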

1.4 Video Tutorial

Here, we describe the configuration of the stimuli we used in our brand safety case study (see Section 4.2) in detail.

Figure 1.1: DICE: csv file configuration

1.5 Best Practices

Image optimization plays a crucial role in your study’s performance. Compressing your images before hosting them ensures faster loading times for participants. Web-optimized formats like .webp often provide the best balance of quality and performance. Many online converters and Python modules are available for this purpose.²

It’s crucial to understand that directly linking to images from social media platforms like X (formerly Twitter) may lead to problems: if a user updates their profile picture or deletes a post, the image URL will break and your study’s stimuli will be incomplete. Instead, download any images you want to use from social media and host them on a platform you control, such as GitHub. This ensures your stimuli remain stable throughout your study’s duration.

Remember to maintain a backup of all images and verify that your image URLs remain publicly accessible throughout your research period. This approach provides the most reliable way to ensure your study’s integrity over time.

If you create your own images, you need to host them publicly to include them in your feed. We usually use GitHub for this purpose. In our experience, compressing the images makes your stimuli less sensitive to slow internet connections on your participants’ side. In addition, it helps to use an image format that is optimized for web usage. We have had good experiences with the .webp format.

1.6 Archiving for Review

For academic documentation and review purposes, it’s valuable to create permanent archives of your experimental feeds. Services like perma.cc or the Internet Archive’s Wayback Machine can capture and preserve your feeds exactly as they appeared during the study. These archives serve as reliable references for reviewers and future researchers, ensuring that your experimental stimuli remain accessible even if the original hosting platforms change or links break over time. When writing up your research, you can include these permanent archive links in your methodology section or supplementary materials.

1.7 Public Stimuli Resources & Databases

We created a set of synthetic users, where we matched actual usernames and handles (see McKelvey et al. 2017) with stock images (from unsplash.com) and some synthetic, LLM-generated information. Also see generated.photos and their academic data for user images and behindthename.com for user names.

In addition, the data science competition platform Kaggle hosts a variety of annotated social media datasets (e.g., the Social Media Sentiments Analysis Dataset, the Cyberbullying Dataset, Social Media Influencers in 2022, or Political Social Media Posts), which researchers can use to create their stimuli. The same applies to Hugging Face, which hosts a Reddit confessions dataset, for instance.

Finally, you can use Facebook’s Ad Library, a publicly accessible database that archives advertisements run by advertisers across Meta’s platforms, to copy (or draw inspiration for) the ad copy, creative, and landing page of sponsored posts.

¹ Researchers can change the order of these columns as they want. In addition, they can add further columns for internal purposes. We recommend documenting a post’s source (i.e., a URL) if the post was scraped or copied from social media or an ad library.
² See our repository (url provided after review) for more details on the technical implementation.