Background and aim of the study

The fieldwork for this study took place in two classrooms with children aged 7 and 8 in a mid-sized town in Sweden. The teacher, the same for both classes, introduced the children to computers for school work as she was interested in the method “Reading through writing” (In Swedish: “Att skriva sig till läsning”) (Trageton, 2003). This method for teaching literacy encourages writing via keyboard and screen before writing via pencil and paper. The method has gained attention among teachers in Sweden and is quite widely used, although in different ways (Hultin and Westman, 2014). The reason for studying the activities in these two classrooms was an interest in young children’s screen-based text-making in general, not an interest in the particular method “Reading through writing”. In the studied classrooms the computers were mostly used for writing by means of the word processing software Word or the visual narration software Photostory. The computers were also used for different pedagogical games and for downloading and saving images. The two most common genres of text created in the classrooms were personal commentaries and different varieties of subject texts. Images from the internet, i.e. Google images, or from the computer-inherent software ClipArt were copied into texts of both genres. The copied and recontextualized images are here called “prefabricated”, defined as images that are not produced or photographed by the actual text-makers but instead selected from an image bank. Image selection was a time-consuming activity during the studied lessons in the two classrooms, indicating that the activity was complex but most likely also interesting for the children. The selection of images was handled by the children themselves and not addressed in teaching situations.

The focus of the study presented in this article grew out of observations of the particular challenges of creating cohesion between “personal content” in writing and images that are copied from the internet. Personal content should here be understood as referring to the children’s experiences, memories, artefacts, animals or family and friends that figure in the children’s lives. The aim is to describe cohesion in multimodal texts with personal content created by the children, with specific attention paid to their selection of prefabricated images. The results are anticipated to inform our understanding of one way of creating multimodal texts in young children’s classrooms, namely the activity of selecting and recontextualizing images. The ability to obtain cohesion across modes can be regarded as a defining feature of success in working transmodally (Wyatt-Smith and Kimber, 2009), and is therefore worth discussing with regard to children’s multimodal literacy abilities, in both creating and interpreting contemporary texts.

Young children, multimodal text-making and online activities

The use of digital technologies makes different representational resources available in the classroom via downloading, copying and pasting. Adami and Kress (2010, p. 187) describe contemporary text creation as characterized by “representation-as-selection”, which means that semiotic and meaning-making activities to a large extent concern navigation and selection among options. Notions of semiotic agency become a matter of selection from already existing semiotic material and “a sense of (multimodal) text as bricolage” (Pachler, Bachmair and Cook, 2010, p. 191). “Representation-as-creation” would be a semiotic activity where an element of content generation is more easily recognized, as for example when a text-maker writes about an experience or event and draws an image or photographs with a camera. However, following Kress (2003) and the theory of social semiotics, meaning-making is always creative as it involves making use of available material and resources in new ways. The processes of selecting and recontextualizing textual elements are therefore also creative in that existing material, such as a prefabricated image, is combined with other modes in new contexts.

In the contemporary textual landscape the abilities to create and interpret a wide variety of communicative resources in multimodal compositions are essential (Bearne, 2009b, p. 31). This may be specifically important to stress in relation to younger children as meaning-making via language and writing tends to be foregrounded in their teaching (Kendrick and McKay, 2004). Children themselves are however often more familiar with new literacies, where images and visual resources are central, than with traditional literacies, where resources such as letters and words dominate (Yamada-Rice, 2010). Many researchers call for a concept of literacy that problematizes the image and the visual as semiotic and communicative resources in their own right as well as in combination with other modes such as writing and audio (Bearne, 2009a, b; Kendrick, McKay and Mutonyi, 2009; Kimber and Wyatt-Smith, 2010; Marsh, 2010; Pahl and Rowsell, 2010, 2012; Wyatt-Smith and Kimber, 2009). Such an approach could also, besides preparing the children for full participation in multimodal communication, serve as a bridge from one symbol system to another, making the newer symbol system of the word more accessible to young children (Dyson, 1992).

Managing several modes in text-making involves considering the meaning potentials of the resources used individually and in combination (Jewitt, 2005). The concepts of cohesion and unity are therefore important for the design and assessment of a multimodal textual product. Kimber and Wyatt-Smith (2010, p. 618) propose an assessment framework for the creation and sharing of knowledge online, and the “ability to assemble, compose or design an aesthetic, creative combination/transformation or treatment of existing sources and materials into new, cohesive representations or text (e.g., colours, fonts, spatial layout)” is one of the learning priorities in the category called “e-design”. The level of cohesion achieved by the text-maker can convey parts of the person’s cognitive and organisational abilities, technological capacity and aesthetic awareness (Wyatt-Smith and Kimber, 2009, p. 78). Bearne’s (2009a, p. 161) suggested framework for analysing children’s multimodal texts pays attention to image in terms of content, size, colour, tone, line and placing/use of space, and to language in terms of syntax and lexis. The framework is also developed for sound, gaze and movement. It is stressed that modes interrelate to make meaning. The focus of Bearne’s (2009a) suggested framework is therefore on children’s different ways of creating coherent texts by interweaving different modes in a specific form of text.

This article focuses specifically on children’s use of prefabricated images in text-making at school, but teachers and textbook authors are also embracing images. Jewitt (2013, p. 143) points to the fact that images have gained importance for the subject of English and are frequently used by teachers as starting points for introducing new themes. In classrooms where teachers recognize and encourage meaning-making with digital technology, the prerequisites for learning about a wide variety of resources for communication are favourable. However, there may still be a need for pedagogical interventions and time for reflection around multimodal text-making in order for children to obtain a deeper understanding of the activities carried out. Yamada-Rice (2010, p. 344) states that “I do not believe that learning to ‘read’ images by osmosis, such as through television and picture books, is the same as having been taught skills to produce, criticise and evaluate visual meaning-making”. Even though most young children enter school with experiences of visual meaning-making, the different meaning potentials of images and multimodal text-making may need to be explicitly addressed in the teaching of literacy.

Semiotic potentials of images and writing and the changing meaning potential of photographs

The prefabricated images chosen for text-making by the children in this study are typically photographs. Photographs are usually seen as a form of representation with high claims for resembling naturalistic reality (Björkvall, 2009, p. 114) and they traditionally function as witnesses and documentation (Machin, 2004; Machin and Jaworski, 2006). Photographs have “naturalistic modality” (Jewitt and Oyama, 2001; Kress and van Leeuwen, 2006) when there is a congruence between how you see an object in an image and how you see it in reality. The modality of an image is lower and not naturalistic when the image seems manipulated, for example concerning colours, which may be more extreme and intense than you experience them in real life. Van Dijck (2008, p. 57) suggests that the digital age has emphasised the communicative and identity-shaping aspects of photography at the expense of the documentary aspects. The ease with which people can communicate via images and the possibilities of manipulation are significant here. The internet has also made personal photographs “vulnerable to unauthorized distribution” (van Dijck, 2008, p. 59), as personal photographs available on, for example, Google images may turn up in unexpected contexts. When images are recontextualized, a photograph of a person, place or an event may not correspond to the identity of the person, place or event referred to in writing in a new text. Instead, types of meaning potential other than identity must be drawn upon. In the UK’s Daily Mirror in 2014, an article about so-called food banks in England was illustrated by an image of a crying child taken in San Francisco, USA, in 2009 (Figure 1). The newspaper article dealt with poverty in England and the many food banks supplying families with food parcels, but the image from the USA had nothing to do with the situation in England, and the child’s tears were not due to hunger. On the Flickr page from which the image can be downloaded, it is said that the child is crying because an earthworm that she intended to take home to the garden crawled away and disappeared.1 In the article the lack of authentic relation between image and writing is kept implicit. Some viewers may therefore conclude that the crying child is hungry and that her family is in need of food parcels, in line with the content of writing in the article.

Figure 1 

Image of a crying girl photographed in the USA and writing about poor families in England. (Daily Mirror, April 14, 2014).

In the case of the image in the Daily Mirror, the meaning potential of the crying girl could be something like “children’s suffering”, which is in tune with the overall message of the article about poor families and hungry children. However, when the origin of the image becomes known the authenticity may be questioned, as there is no relation between the child in the image and the children described in the newspaper article. This photograph does not document the information in the written article.2 Machin (2004) discusses a move away from the traditional use of photographs as denoting and documenting, and suggests that photographs today often function as typical examples or generic symbols. Genericity as meaning potential is enhanced when the motif is decontextualized, when the environment is ambiguous, or when specific props and attributes such as a white coat, glasses and a computer are used to signify something like “science” (Machin, 2004, pp. 320–322). Photographs with generic characteristics can be bought or downloaded to fit different textual products, and this means that when photographs in newspapers on paper or screens are downloaded from the internet or bought from image banks, the idea of the photograph as a witness or documentation must be problematized. In the case of the image in Figure 1 the tears can be considered an attribute that signifies “pain” or “suffering”. The environment is almost eliminated and out of focus. These generic meaning potentials link the hungry English children described in writing with the image of the sad American girl.

The theoretical foundations and previous research for this study derive from social semiotics in relation to young children’s multimodal communication in online activities, multimodal discourse analysis and media studies. The choice of background might in the case of media studies seem distant from children’s text-making and literacy activities at school. However, as the internet provides for an infinite number of images and photographs available for text-making, the situation when children use Google images or ClipArt seems comparable to when news editors search for and select apt images for news articles from image banks. Prefabricated images are used in different contexts today, certainly in the news, social media and advertisements (Machin, 2004) and the young children’s selection and recontextualization of images at school are thereby part of a broader textual context as the use of prefabricated images is a consequence of the online access to images in society at large.

Methodology and data

The examples used in the present study are part of a larger material in a project focusing on young children’s screen-based text-making at school and in the home (Björkvall, 2012, 2014; Engblom, 2013a, 2013b). The material relevant for this study consists of recordings, observations and textual products of activities involving computers in two classes with 7- and 8-year-olds at the same school. The specific material used in the study consists of text-making processes and textual products from 6 children in the two different classes.3 The texts result from two teacher-initiated activities and one child-initiated activity. The teacher-initiated activities involve selecting favourite animals and writing one sentence about each animal (Figure 2, child 1) and writing about a summer memory (Figure 3, child 2, Figure 4, child 3 and Figure 5, child 4). The child-initiated activity occurs during a lesson when the children are asked to choose freely among different activities like reading or going to the library. Two children chose to write a text on the computer (Figure 6, child 5 called Alvin, and 6 called Joakim).

Figure 2 

Child 1: Cohesion between image and writing in content via the image of a dog and the caption “This is what my great grandmother’s dog looks like”.

Figure 3 

Child 2: Cohesion via the image of candy floss and the mentioning of candy floss in writing and via decontextualisation of the motif together with personal writing about a trip to Spain.

Figure 4 

Child 3: Cohesion via distant perspective of a boat together with writing about persons, places and the boat trip.

Figure 5 

Child 4: Textual cohesion via the resource of colour, but incongruence between the content of the image and the content of writing.

Figure 6 

Children 5 (Alvin) and 6 (Joakim): Cohesion via general opinions on motocross bicycles together with a non-naturalistic image of a motocross rider.

The material was collected with consideration to both ethnographical and social semiotic perspectives (Björkvall, 2012; Björkvall and Engblom, 2010), combining a semiotic, multimodal analysis of texts with ethnographic observations of situated processes. The aim is to avoid de-contextualised semiotic analysis as well as atheoretical ethnographic description. The methodology enables analysis not only of the finished product but also of the process of text-making and the conditions and prerequisites surrounding the activity. For example, the amount of time dedicated to writing in comparison to image selection may be estimated, the children’s considerations concerning design may be observed, and potential technical problems that affect the design of the text and the finished product may be identified.

As the study focuses on cohesion in texts produced with prefabricated images and writing about the text-makers’ personal experiences, memories, artefacts, animals or family and friends, the chosen texts and/or processes all include such a combination. The analysis of the selected images and the writing follows Bearne’s (2009a) framework for analysing multimodal texts regarding image and language. Specifically, content, size and colour are used in the analysis of the images, and lexis and syntax in the analysis of language. The interrelations across image and writing are also analysed by investigating the characteristics of the image in terms of decontextualization of setting and modality, i.e. reality value (Kress and van Leeuwen, 2006; Machin, 2004).

Cohesion in personal texts using prefabricated images as resources for meaning-making

The text in Figure 2 consists of a photograph of a dog and the writing “This is what my great grandmother’s dog looks like” (In Swedish: “Såhär ser min gamel (misspelled for ‘gammel’) mormors hund ut”). It was created during the teacher-initiated task of selecting favourite animals and writing one sentence about them. The image shows a light brown and white collie lying on the floor inside a house or an apartment. The dog takes up a large part of the image, leaving little space for the environment around it. The image is a naturalistic photograph that does not look professionally produced or manipulated. The image is large in proportion to the text as a whole. The size of the image is not a choice made by the child but is determined by the software Photostory. The image is copied from Google images and therefore does not show the actual dog of his great grandmother. It may be very similar to the real dog or not. The incongruity between the actual, intended dog and the photograph of the collie taken from Google images seems to be dealt with using the formulation “looks like”, which claims likeness and similarity between the two dogs, and at the same time leaves open the possibility that the image shows a dog other than the child’s great grandmother’s. Cohesion is obtained via the image of a dog and the sentence about a dog, but the formulation “looks like” disclaims a direct link between the dog in the image and the dog in the world.

The possibilities for representation in writing are flexible compared to the more inflexible possibilities for representation in image. Images always show specific places, events and persons, but can increase their generic meanings via certain characteristics, while language may refer to generic circumstances via formulations like “all people” or “someone”. The child seems to be drawing on the flexible possibilities of language in order to convey the incongruity between image and intended meaning. The child here interprets the teacher’s task as referring to a specific animal (like the great grandmother’s dog), whereas a more common interpretation in other children’s presentations of the same task is a favourite species of animal.

In Figure 3, a child writes about a trip to Spain as a summer memory. The text is created using the software Word. The child finishes her writing before she adds the image. The trip is described in past tense with naming of persons and places and descriptions of events. Her family visits a market, rides on carousels, buys candy floss and swims. The writing and the image take up about the same amount of space in the text. When the child is searching for an image on the internet she scrolls and looks at different images for around 10 minutes before selecting one.4 The image she chooses has no identifiable human beings as motif, only a hand holding a stick of candy floss. The chosen image differs from many of the others that appear for the Swedish search word “sockervadd” (In English: “candy floss”), as most of them include identifiable human beings. In the selected image there is a sign behind the candy floss with the Swedish word “sockervadd” on it. The image is therefore inauthentic for the reader/viewer looking at the text as a documentation of the trip to Spain. On the other hand, it is possible to imagine the girl writing the text, or perhaps her sister, who is referred to by name in the original writing, holding the stick of candy floss. This is possible as the face is not part of the image. The close-up perspective decontextualizes the candy floss and the specificity of the motif is thereby reduced. The flexibility of representation of the image is increased due to decontextualization, and the image becomes apt for a personal commentary in a way that a motif of a recognizable but unknown person would not be. The naturalistic photograph also increases coherence with the personal writing about her family’s trip to Spain.

In Figure 4 the text includes a photograph that shows a boat, either on a lake or the sea. The boat is moving at high speed, as shown by the ripples alongside and behind it. The boat is photographed from above, which is an unusual angle for the average photographer standing on the ground. Except for the bird’s eye perspective the photograph can be described as naturalistic. The title of the text is “Ett sommarminne” (In English: “A summer memory”), and the text consists of personal writing including a place, “stugan” (In English: “the cottage”), to which the text-maker, his father and another named person go by boat. The last sentence evaluates the memory, “Det var jättekul” (In English: “It was so much fun”). The distance from the viewer’s position to the boat makes it impossible to discern people or personal objects in the image. The faraway distance in the image, together with the small amount of context around the boat (there is just water), promotes flexibility in meaning potential concerning people and places. Comparing Figures 3 and 4, it seems that distant perspectives can function in the same decontextualizing way as close-up perspectives. Both perspectives can create cohesion between image and writing as they allow for flexible interpretations of the motif.

In Figure 5 the text reflects a personal experience of attending a flea market together with family members, making money and being happy. The chosen image is of cupcakes, which are not explicitly referred to in the content of writing. Cohesion between image and language is thus lacking concerning content. However, the colours in the image and the colours of the letters in the title are the same, and cohesion is obtained through the visual impression of image and language building on colour. The primary function of the image comes across as aesthetic and important for the layout, but not for showing the personal experience.

In one of the observed lessons two children, here called Alvin and Joakim, find the opportunity to use a personal photograph instead of a prefabricated one. They search the internet for the website of a motocross club of which one of them is a member. The motocross club photographs riders during training and uploads the images to the club’s website. While interacting in front of the screen, the children discuss which image to select and where it was photographed. They decide on an image of Alvin. Joakim says: “because it’s only you in this one” (In Swedish: “för den är du själv på”), and then asks “what are you driving there” (In Swedish: “vad kör du där för nåt”). Alvin answers “the big one in the curve” (In Swedish: “den stora i kurvan”), perhaps meaning the curve of a big race track. They are unable to copy the selected image into their Word document and instead look for prefabricated motocross images on ClipArt. The photograph that ends up in the text has low naturalistic modality, as its colours are intense and the clouds in the sky seem to have been manipulated to form a certain pattern. The final writing is: “joakim and alvin think that crosses are cool crosses move fast crosses can drive in sand” (In Swedish: “crossar är häftiga tycker joakim och alvin crossar går snappt (‘snappt’ is misspelled for snabbt) crossar kan köra i sand”). The writing contains general information about the children’s opinion on crosses, namely that they are cool, followed by two characteristic features of crosses, that they can move fast and drive in sand. When the children interact about the personal image from the website, more specific information is addressed, namely the location and Alvin as the only participant. In this case, the image is selected before Alvin and Joakim start writing. The finished text is personal as it includes Joakim’s and Alvin’s names and opinions of motocross bicycles, but not personal to the same degree as their interaction during the visit to the website of the motocross club. Possibly, this adjustment in content from their verbal interaction when visiting the website to their writing after selecting the ClipArt image could be interpreted as a strategy to obtain cohesion between the prefabricated image and the content of their writing.

Results and discussion

In this article, young children’s screen-based text-making at school is analysed and discussed from the perspective of cohesion between prefabricated images and writing about personal experiences, memories, artefacts, and family and friends. The material relevant for this study consists of recordings, observations and textual products of activities involving text-making via computers from 6 children in two different classes. During lessons and text-making activities the children had unlimited access to images on the internet and these prefabricated images, usually in the form of photographs, were copied into texts and combined with writing. The selection of images in terms of time and commitment was an important part of creating multimodal texts in these two classrooms.

The combination of prefabricated images and writing about personal circumstances has in this study been shown to involve certain challenges concerning cohesion, as the persons, objects, places or events shown in the image are unrelated to the content of the personal writing. In the studied classrooms authentic or personal photographs were generally not available, while the content of the writing in the children’s texts often built upon personal experiences or interests. One result is therefore that meaning-making via writing and image diverges in these classrooms; writing allows for representation-as-creation while image allows for representation-as-selection (cf. Adami and Kress, 2010). Having to select a prefabricated image from a more or less infinite collection of images of course promotes certain metasemiotic abilities concerning the combination of modes, but not others. Being able to consider the meaning potential of genericity, via for example decontextualization, instead of specificity, via for example identifiable human beings, is an example of one such promoted ability.

The computers were introduced in the classrooms by the teacher with the purpose of enhancing the children’s writing and reading skills, but the online access to images also opened up for a more varied meaning-making. One of the teacher-initiated tasks was oriented toward both writing and image by means of the software Photostory, while the other was oriented toward writing using Word. Yet images were used by the children in all texts, including the child-initiated activity, which signals the children’s interests and perceptions concerning text-making. Images are either the largest textual element in the texts or balanced in size with writing, suggesting the importance of the mode of image in text-making for these 7- and 8-year-old children.

The specific challenge of creating cohesion in personal texts when the available visual resources are prefabricated was met in a few different ways in the exemplified texts. In five of the six texts cohesion in writing and image concerned content in that words and motif shared denotation. However, one strategy to diminish such cohesion was to use formulations in writing that disclaim a direct relation between the motif of the animal in the image and the real animal in the world represented in writing (Figure 2). Language is used to manage the dissonance between the intended animal and the animal in the image that was available to the child. Other strategies to obtain cohesion included making use of a close-up perspective that decontextualizes surroundings (Figure 3), and of distant perspectives that make identification of people, environment and objects difficult (Figure 4). In Figure 3 a problem with cohesion remained in spite of the decontextualized motif, as the image contained a sign in Swedish whereas the writing described an event taking place in Spain. Naturalistic modality in the photographs (as in Figures 2 and 3, and only partly in Figure 4 due to the bird’s eye perspective) adds to cohesion in personal texts in a way that non-naturalistic photographs would not. Another strategy to accomplish cohesion was, paradoxically, to select an image that was unrelated to the writing concerning content (Figure 5). Cohesion was still obtained via the resource of colour as the image and the title used the same colours. In the motocross example (Figure 6), the text-making activity starts with a search on a website where one of the children can be seen in different photographs during motocross training. Children 5 and 6 discuss where the different photographs were taken and whether to include a photo with or without riders other than child 5. Their finished textual product combined a non-naturalistic prefabricated image with their general opinions about motocross bicycles. The change in content from their interaction about the personal image to the writing about the prefabricated image seems apt in order to increase cohesion.

All the texts in the study showed cohesion between image and writing in some way. Another result is therefore that the children possess cognitive and organisational abilities, technological capacity and aesthetic awareness (Wyatt-Smith and Kimber, 2009, p. 78). However, there were also examples of weakness in cohesion. In Figure 2 the problems of referring to a specific animal without having access to a photograph of that animal called for formulations in writing that created a distance between motif in the image and the animal in the world. In Figure 3 the events described in writing take place in Spain while the image includes a sign in Swedish, and in Figure 5 cohesion between image and writing relied solely on the resource of colour.

Previous research has shown that literacy teaching that utilizes a variety of forms for representation to a large extent has remained unexplored even though visual resources are integral to the early years of schooling (Bearne, 2009a, b; Kendrick, McKay and Mutonyi, 2009; Kimber and Wyatt-Smith, 2010; Marsh, 2010; Pahl and Rowsell, 2010, 2012; Wyatt-Smith and Kimber, 2009). In the text-making activities presented in this article the children are working multimodally, selecting images from Google or ClipArt and combining modes in different meaning-making activities. When children as well as other age groups go online, prefabricated images that do not document or bear witness in the same way as authentic/personal photographs need to be understood and managed. Naturally, children’s acquired understandings and experiences could be used as a starting point in learning situations that address cohesion of image and writing in a varied sense. Such learning situations in the classroom could also benefit from critical perspectives on images and on the ethics of digital text-making involving recontextualizations of textual elements.