Educational Excellence
Learning Institute
by Michael O’Connell
When I started this dive into Artificial Intelligence (AI), training models, and generating AI art, I had a clear goal in mind and a plan for how to achieve it. I had the hardware planned out, tutorials queued up, and a high-level mission: see what is involved in training AI models, discover the nuances and complexities that come with it, and find out whether an AI like Stable Diffusion can "help define style in a body of work". So, I rolled up my sleeves, popped open my web browser, and went down the rabbit hole of online tutorials and GitHub notes.
Building a computer is right in my wheelhouse. While I've never spent this much money building a computer before, it's old hat at this point. So, the first "challenge" really wasn't a challenge at all - it was quite fun!
Finding tutorials was a bit of a challenge though, or I should say, finding accurate and helpful tutorials was difficult. While there is a wealth of information online about training AI models, the technology is evolving so rapidly that much of it becomes obsolete only months after it's posted. I was able to find a great walkthrough on setting up Stable Diffusion 1.5, and a tutorial on Automatic1111, a web-based user interface for running and training Stable Diffusion.
So great, now I have a powerful AI-capable computer and a working AI art generator. Let's test it out! I had used some AI art generators before (Midjourney and DALL-E 2) and experimented a little with prompts, so I knew roughly what to expect: give a detailed prompt (the more specific, the better) and get a cool-looking image.
Midjourney Prompt: Scrappy Doo playing Dungeons and Dragons
My Prompt: golden retriever dog in a red hood playing table top games with dice in a dark library, medieval
[Generated images: Midjourney's results compared with my Stable Diffusion results]
Well… that was not the result I was expecting. How come the Midjourney results are so much better than mine? What even is "better" in this context? Did I download the wrong thing? Is there a step I missed? There are tons of settings when generating images. Did I get those settings wrong? What even is "wrong" in this context? I climbed back up the rabbit hole to catch my breath. Looks like I have some reading to do…
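For the curious, here's roughly what one of those generations looks like under the hood. This sketch uses the Hugging Face diffusers library rather than the Automatic1111 interface I was actually clicking through, and the model ID, seed, and settings are only illustrative, but it shows the kinds of knobs (steps, guidance scale, seed) I was second-guessing.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load Stable Diffusion 1.5 (the model ID below is the commonly used
# Hugging Face repo; substitute whatever checkpoint you actually downloaded).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("golden retriever dog in a red hood playing table top games "
          "with dice in a dark library, medieval")

# The same kinds of settings the Automatic1111 web UI exposes as sliders:
generator = torch.Generator("cuda").manual_seed(555)  # fixed seed for repeatable results
image = pipe(
    prompt,
    num_inference_steps=30,   # how many denoising steps to run
    guidance_scale=7.5,       # how strictly to follow the prompt
    generator=generator,
).images[0]
image.save("test_output.png")
```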
------
After more research, I discovered that although Stable Diffusion and Midjourney are both "AI art generators", they are worlds apart. This is both a good thing and a bad thing. Midjourney is a collection of neural networks and models that parse your text prompt and give you an output. It's built to take "lazier" prompts and generate polished, stylized, often photorealistic images. Stable Diffusion is similar in principle, but its models shine when they are trained on specific imagery to guide specific outputs. That became a benefit for what I was trying to accomplish: I just needed to train a model on a collection of images, and then I'd be able to generate new, similar images - which, I realized, would help me define the "style" of those images. Then it was time to train a model. Luckily, I found some tutorials on how to train Stable Diffusion 1.5 models, so I hit the ground running.
I learned of a training platform called EveryDream and got it set up fairly easily with the help of tutorials. The basic idea behind EveryDream and training AI models is that you take a collection of images (say, a collection of pictures of my dog), tell the software what these images contain ("pictures of my dog Thomas"), tell the computer how these images are similar by assigning a keyword ("ThomasTheDog"), and associate that keyword with a common token word ("dog"). The idea is that the base model already has pictures associated with "dog", but it doesn't know specifically what a "ThomasTheDog" is. By providing different images of "ThomasTheDog", the trainer teaches the model what that keyword represents.
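To make that concrete, here's a small sketch of the data-prep side. Many training tools read a caption from a .txt file sitting next to each image; the folder name, caption wording, and layout below are just an illustration of that idea, not necessarily EveryDream's exact format.

```python
from pathlib import Path

# Hypothetical folder of training photos of my dog.
DATA_DIR = Path("training_data/ThomasTheDog")

# Caption tying the unique keyword ("ThomasTheDog") to the class word
# the base model already understands ("dog").
CAPTION = "ThomasTheDog, a golden retriever dog"

for image_path in sorted(DATA_DIR.glob("*.jpg")):
    # Write a matching caption file next to each image, e.g. photo01.jpg -> photo01.txt
    caption_path = image_path.with_suffix(".txt")
    caption_path.write_text(CAPTION, encoding="utf-8")
    print(f"wrote caption for {image_path.name}")
```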
The model gets "trained" by applying noise on top of each image I provide (like the grain visible in a low-quality photo), and then learning to remove that noise a little at a time, over many steps, until it recovers a result that matches the original image. By repeatedly converting these noise patterns back into the source images, the model "learns" what a "ThomasTheDog" is. When creating a new image of a "ThomasTheDog", Stable Diffusion feeds a random noise pattern into the model and tells it to "turn this into a ThomasTheDog".
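In code, one step of that process looks roughly like the sketch below. It's heavily simplified: a real Stable Diffusion trainer works on compressed "latent" images, conditions on the caption text and the timestep, and uses a much larger U-Net. The tiny network and fake image batch here are stand-ins, just enough to show the core idea of "add noise, predict the noise, adjust the weights".

```python
import torch
import torch.nn.functional as F

# Toy stand-in for the denoising network (real models use a large U-Net
# that also sees the timestep and the text prompt).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A simple linear noise schedule: how much noise is mixed in at each timestep.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def training_step(images):
    """One simplified diffusion step: noise the images, ask the model to predict the noise."""
    batch = images.shape[0]
    t = torch.randint(0, T, (batch,))                    # random timestep per image
    noise = torch.randn_like(images)                     # the noise we will add
    a = alphas_cumprod[t].view(batch, 1, 1, 1)
    noisy = a.sqrt() * images + (1 - a).sqrt() * noise   # mix image and noise
    predicted = model(noisy)                             # model guesses the noise
    loss = F.mse_loss(predicted, noise)                  # how wrong was the guess?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Fake batch of 4 "photos" (3-channel, 64x64) standing in for pictures of my dog.
print(training_step(torch.randn(4, 3, 64, 64)))
```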
Sounds simple, right?
Training a model requires configuration though, and those settings can drastically affect how accurate the final model will be. How fast should the model "learn" what I'm trying to teach it? How strongly should it associate "ThomasTheDog" with other "dogs"? Should I use a consistent noise pattern or a random one? Who knew that teaching a computer would be just as complicated as teaching a student?
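All of those questions turn into settings somewhere. The dictionary below is a hypothetical example of the kinds of knobs involved; the names are mine, not EveryDream's actual configuration keys, and the values are only plausible starting points.

```python
# Hypothetical training settings; the option names here are illustrative,
# not EveryDream's real configuration keys.
training_config = {
    "learning_rate": 1.5e-6,  # how fast the model "learns" (too fast and it forgets what it knew)
    "max_epochs": 100,        # how many passes over the training images
    "resolution": 512,        # image size used during training
    "batch_size": 4,          # how many images are shown per training step
    "seed": 555,              # fixed seed = consistent noise; -1 = random noise every run
    "class_word": "dog",      # the generic concept "ThomasTheDog" should stay associated with
}
print(training_config)
```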
So, it looks like my next step will be to research more about the nuances of training models. Who knew I'd be creating lesson plans to teach my computer what my dog looks like? Looks like I'm going back down that rabbit hole…
ABOUT THE AUTHOR
Michael O’Connell - Project Manager, Academic Operations
Michael O’Connell has been working in a variety of roles within the IT department at Appleby College for the last 15+ years. A natural problem solver, his focus has been on using technology to improve processes, increase efficiency, and reduce tedium in our daily lives so we can focus on the important stuff – using our creativity to unlock the potential of ourselves and others. His wide range of interests and hobbies includes all things in tech, and he is constantly on the lookout for the next great innovation in this space.