Crowdsourcing-Based Framework for Teaching Quality Evaluation and Feedback Using Linguistic 2-Tuple

2018-10-23 08:05

Computers, Materials & Continua, 2018, Issue 10

Tiejun Wang, Tao Wu, Amir Homayoon Ashrafzadeh and Jia He

Abstract: Crowdsourcing is widely used in various fields to collect goods and services from a large number of participants. Evaluating teaching quality by collecting feedback from experts or students after class is both delayed and inaccurate. In this paper, we present a crowdsourcing-based framework that evaluates teaching quality in the classroom, using a weighted average operator to aggregate information from students’ questionnaires expressed in linguistic 2-tuple terms. We then define a crowd grade based on similarity degree to distinguish the contributions of different students and to minimize the impact of abnormal students on the evaluation. The crowd grade is updated at the end of each feedback round, which keeps the evaluation accurate. A simulated case illustrates how to apply the framework to assess teaching quality in the classroom. Finally, we developed a prototype and carried out experiments on a series of real questionnaires and two sets of modified data. The results show that teachers can locate the weak points of their teaching and identify abnormal students in order to improve teaching quality. Our approach also shows a strong tolerance for abnormal students, which makes the evaluation more accurate.

Keywords: Teaching quality evaluation, crowdsourcing, linguistic 2-tuple, group decision making.

1 Introduction

Teaching quality is a key factor influencing student achievement, and classroom teaching quality is especially important in universities. The graduate attributes for professional engineering defined in the Washington Accord [International Engineering Alliance (2014)] consist of 12 parts, most of which are acquired through study in a classroom, since class-based teaching is the main channel for imparting knowledge and developing the capabilities needed to solve complex engineering problems. It is therefore important to improve classroom teaching quality. Both the teaching management departments of universities and the teachers themselves would like to know how well individual knowledge points have been achieved. Several evaluation mechanisms [Dong and Dai (2009); Chen, Hsieh and Do (2015); Chang and Wang (2016); Zhang, Wang, Zhang et al. (2017)] have been established that use systematic and scientific methods to evaluate course quality and help teachers improve their teaching methods. Teaching quality evaluation is normally regarded as a multi-attribute group decision making (MAGDM) problem, and similar problems have been discussed in the literature. In this setting, different kinds of questionnaires are designed according to the teaching process, including teaching preparation, teaching organization, teaching content, teaching methods, teaching effectiveness, etc., and the assessment results are obtained from the completed questionnaires. Teachers then receive feedback on those key indicators from other people, who may be students or experts. An evaluation system that collects feedback from experts can show how well a teacher or course develops students’ skills in a certain field and which parts of the course have problems. However, it cannot tell teachers in detail which part of each lesson has problems, because the experts generally have backgrounds different from the courses taught by the evaluated teachers. The other kind of evaluation system collects feedback from students at the end of the semester; it also cannot provide a specific remedy, because the feedback lags far behind the teaching.

Crowdsourcing is a sourcing paradigm in which individuals or organizations obtain goods and services from a large and relatively open group of internet users. It divides a complex problem among participants to achieve a cumulative result, which may be ideas or finances. Some research [Luca and Michael (2014); Xu, Liu, Yen et al. (2017); Fernandes, Nati, Loumis et al. (2015)] has been done on collecting information from a large crowd to help improve service quality. If teachers could get feedback about a course directly from the many students sitting in the classroom, that feedback would arrive without delay and would be more accurate.

In this paper, we present a framework that aggregates heterogeneous feedback collected directly from students sitting in the classroom, who answer some simple questions about the courseware through an app or a web browser, to produce a more accurate evaluation. After class, teachers can obtain the evaluation results for the knowledge points they designated before class. Because the assessments target courseware for specific knowledge points that teachers care about, educators can use the evaluation results to refine the teaching content and methods at low-scoring points and thereby raise the quality of the course. To make the results more accurate, teachers can assign tasks only to students whose properties satisfy the instructor’s requirements. The main issue is then how to give students an incentive to rate accurately. Furthermore, teachers can break the results down along different dimensions according to the students’ properties.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents a crowdsourcing-based framework for evaluating teaching quality using the linguistic 2-tuple model. Section 4 presents an example that uses the proposed framework, and Section 5 reports experiments. The last section gives conclusions and future work.

2 Related works

Crowdsourcing has been widely used in various fields to gather information and aid decision-making. Carrasco et al. [Carrasco, Sánchez-Fernández, Muñoz-Leiva et al. (2017)] presented a linguistic multi-criteria decision-making model for aggregating heterogeneous questionnaires with opinions about the quality of the e-service offered by hotels. The authors gathered information from several websites and found that all the scales were evaluated slightly better. However, it is difficult to evaluate e-service by scales. Jatoth et al. [Jatoth, Gangadharan, Fiore et al. (2018)] proposed a hybrid multi-criteria decision-making model for selecting cloud services among the available alternatives by rank. The methodology assigns ranks to cloud services based on quantified quality-of-service parameters, using a novel extended grey technique for order of preference by similarity to ideal solution, integrated with the analytic hierarchy process. In addition, CrowdGrader [Luca and Michael (2014); Xu, Liu, Yen et al. (2017)] is a system that allows students to submit homework and to review and grade it collaboratively. In CrowdGrader, students benefit from being able to examine the solutions offered by other students, and all submissions are distributed to different students for review. The system has been used in seven classes and uses grades to evaluate submissions. However, the detailed information behind the ranks or grades can be lost in the evaluation.

Herrera et al. [Herrera and Martinez (2000)] noted that the linguistic domain can be treated as continuous, whereas the symbolic model treats it as discrete. They proposed a linguistic computational model based on linguistic 2-tuples that carries out computing with words easily and without loss of information. Since then, much research [Wang and Hao (2006); Merigo, Casanovas and Martinez (2010); Wei and Zhao (2012); Liu, Lin and Wu (2014); Wu, Wu, Zhou et al. (2015); Li, Dong, Herrera et al. (2017); Wei, Alsaadi, Hayat et al. (2018)] has been done on various aspects of the model.

3 Proposed framework for teaching quality evaluation

Generally, digital courseware, such as PowerPoint and Keynote, is extensively used as the primary way to communicate with students in university classrooms. Digital courseware is the basis for evaluating teaching quality, since it carries almost all of the responsibility for passing knowledge on to students. We therefore present an approach, shown in Fig. 1, in which teachers share courseware with students and mark questions on specific slides to generate questionnaires at the same time. Students respond through applications on their smartphones or browsers on their computers while the contents (knowledge points) of the marked slides are being taught in the classroom. Because a question is displayed only when the related knowledge point has been presented, the feedback coming directly from students sitting in the classroom or in front of computers is more accurate. In this framework, the evaluation of teaching quality is launched by teachers, the feedback comes from students, and the results are aggregated from that feedback and can be applied by teachers to refine the target content of the course. Thus, the problem reduces to aggregating the students’ evaluations of the target knowledge points.

Figure 1: Flowchart of the proposed framework for teaching quality evaluation

3.1 Step 1 initiation

Usually, the syllabus of a course consists of knowledge points that may be reflected in different types of material, and teachers organize them into digital courseware according to their teaching methods. In the course design phase, a teacher may be concerned with the achievement of a knowledge point covered by some slides in the digital courseware. After the slides associated with the knowledge point are finished, the teacher may put some questions to the students $X=\{x_1, x_2, \ldots, x_H\}$, where $H$ is the number of students, in the hope of getting feedback to assess this knowledge point.

Let $K$ be a knowledge point that should be evaluated by students in the course. We assume that there are $N$ individual questions $q_k$ ($k=1, 2, \ldots, N$) associated with knowledge point $K$. Usually, the questions represent different aspects of $K$, and the teacher can learn the degree to which $K$ has been acquired by aggregating information from those questions. Thus, we define the knowledge point $K=\{q_1, q_2, \ldots, q_N\}$ as a question set in which the linguistic value of a question is expressed with a linguistic 2-tuple $(s_k,\alpha_k)$ as defined in Herrera et al. [Herrera and Martinez (2001)]. All options that students can choose for each question $q_k$ are given by $S=\{s_0, s_1, \ldots, s_M\}$, where $s_i$ ($i=0, 1, \ldots, M$) stands for the $i$th linguistic term (option) in $S$ and $M+1$ is the granularity of $S$.

For example, a set of seven linguistic terms $S$ could be

$$S=\{s_0=\mathrm{N},\ s_1=\mathrm{VL},\ s_2=\mathrm{L},\ s_3=\mathrm{M},\ s_4=\mathrm{H},\ s_5=\mathrm{VH},\ s_6=\mathrm{P}\} \tag{1}$$

where $s_0$ is a linguistic term meaning “Nothing” for the attribute being assessed, and the other terms have their own linguistic meanings. Each term is assigned a numerical value, and the numerical values of the seven linguistic terms are set equal to their index values. We can express linguistic information with the linguistic 2-tuple $(s,\alpha)$, where $s$ is a linguistic term and $\alpha$ is a numerical value representing the symbolic translation. We can then translate a discrete symbolic term into a numerical value by the symbolic translation defined in Herrera et al. [Herrera and Martinez (2000)].

The students taking the course act as the experts who evaluate the teaching quality of a knowledge point by choosing an appropriate term from the linguistic term set $S$. At the same time, we distinguish students from each other by the crowd grade $\omega_i\in\Omega=\{\omega_1, \omega_2, \ldots, \omega_H\}$, a comprehensive score computed from multiple grade values that expresses the credibility of a student in the evaluation. In the initiation phase, all students are assigned the same crowd grade value 1. At the end of each evaluation process, the crowd grade is changed according to the student’s choices.

3.2 Step 2 questionnaire preparation

After defining questions based on real requirements, a teacher attaches them behind the relevant slides in the digital courseware and then shares the courseware with students. Generally, the questions ask how well students believe they have learned a knowledge point in the syllabus. At the same time, teachers can decide what kind of students will answer each question. Respondents could be every student sitting in the classroom, those who scored more than 80 points in mathematics, all the girls, or even ten students selected randomly. For the sake of simplicity, the respondents in this paper are all of the students.
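The respondent choices listed above can be sketched as a small filter. This is only an illustration: the `Student` fields, the `select_respondents` helper and the sample data are hypothetical and do not come from the prototype described in the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Student:
    # All fields are hypothetical properties used only for illustration.
    name: str
    gender: str
    math_score: int


def select_respondents(students, predicate=None, sample_size=None, rng=None):
    """Pick the students who will receive a question."""
    # Keep every student when no predicate is given.
    chosen = [s for s in students if predicate is None or predicate(s)]
    if sample_size is not None:
        rng = rng or random.Random()
        # Never ask for more students than are available.
        chosen = rng.sample(chosen, min(sample_size, len(chosen)))
    return chosen


students = [
    Student("x1", "F", 85),
    Student("x2", "M", 72),
    Student("x3", "F", 91),
    Student("x4", "M", 88),
]

everyone = select_respondents(students)
strong_in_math = select_respondents(students, lambda s: s.math_score > 80)
random_two = select_respondents(students, sample_size=2, rng=random.Random(0))
```

Passing the selection rule as a predicate keeps the default case (all students) and the filtered cases in one code path.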

Optionally, teachers can add filter questions, such as “Have you revised the content of the last lesson?”, to mark students who answer “No”. The questionnaires of those students for the related knowledge points should then be excluded or given a lower weight. In the classroom, the teacher plays the digital courseware slide by slide while teaching. When the last slide of a knowledge point is finished, the questions are shown on the screens of the chosen students, who are allowed to answer within a specified time. Limiting the answer time also helps to check whether some students are absent or doing something else in the classroom. After that, all completed questionnaires are collected automatically.

3.3 Step 3 symbolic translation

After collecting those questionnaires, we need to express the symbolic information by translating them into a set of linguistic 2-tuple terms.

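Definition 1 (Symbolic Translation [Herrera and Martinez (2000)]): Let $S=\{s_0, s_1, \ldots, s_M\}$ be a linguistic term set and let $\beta\in[0, M]$ be a numerical value obtained by aggregating discrete linguistic information defined in $S$. Then the linguistic 2-tuple equivalent to $\beta$ is obtained with the following function:

$$\Delta:[0,M]\rightarrow S\times[-0.5,0.5),\qquad \Delta(\beta)=(s_i,\alpha),\quad i=\mathrm{round}(\beta),\quad \alpha=\beta-i \tag{2}$$

where round is the usual rounding operation and $\alpha\in[-0.5,0.5)$ is called the symbolic translation.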

Thus, a numerical value $\beta$ is related to a linguistic term $s_i$, and $\alpha$ is its distance to the index $i$, as shown in Fig. 2. According to Definition 1, the linguistic 2-tuple representing any linguistic term $s_i$ is $(s_i,0)$. At the end of this phase, all questionnaires have been translated into 2-tuple form.

Figure 2: Representation of the linguistic 2-tuple $(s_2,-0.3)$, which is 0.3 lower than the option “Low”, so that $\beta=2+\alpha=1.7$
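A minimal sketch of the symbolic translation and its inverse follows, using the value from Fig. 2 as input. The function names `delta` and `delta_inv` and the list `TERMS` are illustrative choices, and rounding halves upward is an assumption made so that the symbolic translation stays in $[-0.5, 0.5)$.

```python
import math

# Seven-term set used in the paper's examples: N, VL, L, M, H, VH, P.
TERMS = ["N", "VL", "L", "M", "H", "VH", "P"]
M = len(TERMS) - 1


def delta(beta):
    """Symbolic translation: map beta in [0, M] to a 2-tuple (index, alpha)."""
    if not 0 <= beta <= M:
        raise ValueError("beta must lie in [0, M]")
    # floor(beta + 0.5) rounds halves up, so alpha stays in [-0.5, 0.5).
    # Python's built-in round() rounds halves to even and is avoided here.
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(i, alpha):
    """Inverse translation: recover the numerical value i + alpha."""
    return i + alpha


i, alpha = delta(1.7)
print(TERMS[i], round(alpha, 2))  # L -0.3
```

A raw answer $s_i$ corresponds to `delta(i)`, which returns the index `i` with a symbolic translation of zero.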

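In addition, the following characteristics are required [Herrera and Martinez (2000)]:

(1) Comparable: Let $(s_j,\alpha_1)$ and $(s_k,\alpha_2)$ be two 2-tuples, each expressing symbolic information. They are compared as follows:
(a) if $j<k$, then $(s_j,\alpha_1)$ is smaller than $(s_k,\alpha_2)$;
(b) if $j=k$ and $\alpha_1=\alpha_2$, then $(s_j,\alpha_1)$ and $(s_k,\alpha_2)$ represent the same information;
(c) if $j=k$ and $\alpha_1<\alpha_2$, then $(s_j,\alpha_1)$ is smaller than $(s_k,\alpha_2)$;
(d) if $j=k$ and $\alpha_1>\alpha_2$, then $(s_j,\alpha_1)$ is bigger than $(s_k,\alpha_2)$.

(2) Inverse Operator: The function $\Delta$ over 2-tuples must have an inverse $\Delta^{-1}:S\times[-0.5, 0.5)\rightarrow[0,M]$, defined by $\Delta^{-1}(s_i,\alpha)=i+\alpha$, which converts a 2-tuple back into its numerical value.

(3) Aggregation Operator: An aggregation operator maps a set of linguistic 2-tuples to a single 2-tuple. Many aggregation operators [Merigo, Casanovas and Martinez (2010); Wei and Zhao (2012); Wu, Wu, Zhou et al. (2015); Wei (2011); Mo and Deng (2016)] have been proposed to aggregate information in different computation models.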
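The comparison rule in item (1) amounts to a lexicographic comparison of the term index and the symbolic translation. The sketch below writes the rule out explicitly; representing a 2-tuple as a Python tuple `(index, alpha)` and the name `compare` are assumptions made for illustration.

```python
def compare(a, b):
    """Return -1, 0 or 1 when 2-tuple a is smaller than, equal to or bigger than b."""
    (j, alpha1), (k, alpha2) = a, b
    if j != k:
        # Different terms: the term index alone decides.
        return -1 if j < k else 1
    if alpha1 == alpha2:
        return 0
    # Same term: the symbolic translation decides.
    return -1 if alpha1 < alpha2 else 1


results = [(3, 0.2), (2, 0.4), (3, -0.1)]
# Python compares tuples lexicographically, which matches the rule above,
# so the built-in sort orders 2-tuples from smallest to biggest.
ordered = sorted(results)
print(ordered)  # [(2, 0.4), (3, -0.1), (3, 0.2)]
```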

3.4 Step 4 aggregation

We then compute over the linguistic 2-tuples to aggregate the assessment of teaching quality. Different questions $q_k$ differ in how important they are for describing a knowledge point. The teacher should therefore assign each question $q_k$ an associated weight $w_k$ indicating how important $q_k$ is to the knowledge point $K$.

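Definition 2 (Arithmetic Mean [Herrera and Martinez (2000)]): Let $q=\{(s_1,\alpha_1),(s_2,\alpha_2),\ldots,(s_n,\alpha_n)\}$ be a set of linguistic 2-tuples. Then the 2-tuple arithmetic mean $\bar{q}$ is defined as follows:

$$\bar{q}=\Delta\left(\frac{1}{n}\sum_{i=1}^{n}\Delta^{-1}(s_i,\alpha_i)\right) \tag{3}$$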

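Definition 3 (Weighted Average Operator [Herrera and Martinez (2000)]): Let $q=\{(s_1,\alpha_1),(s_2,\alpha_2),\ldots,(s_n,\alpha_n)\}$ be a set of linguistic 2-tuples and $W=\{w_1, w_2, \ldots, w_n\}$ be their associated weights. Then the 2-tuple weighted average operator $\bar{q}^{w}$ is defined as follows:

$$\bar{q}^{w}=\Delta\left(\frac{\sum_{i=1}^{n} w_i\,\Delta^{-1}(s_i,\alpha_i)}{\sum_{i=1}^{n} w_i}\right) \tag{4}$$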

At the beginning of the evaluation, all students have the same crowd grade, which is defined in Step 5 and updated in the subsequent loops. Using the crowd grade as the associated weight of each student, we compute the weighted average for each question $q_k$ with the weighted average operator of Definition 3. After the collective performance values of all questions have been calculated, teachers can sort the questions according to the comparability of linguistic 2-tuples and find out what most needs to be improved in teaching. Finally, we calculate the degree of acquiring knowledge point $K$ by aggregating the results of all questions.
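The two aggregation passes described in this step can be sketched as follows: first over students, weighted by crowd grade, and then over questions, weighted by importance. The answer matrix, the question weights and all function names are hypothetical values chosen for illustration, not data from the paper.

```python
import math


def delta(beta):
    """Translate a numerical value into a 2-tuple (term index, alpha)."""
    i = math.floor(beta + 0.5)  # rounds halves up, so alpha is in [-0.5, 0.5)
    return i, beta - i


def weighted_average(tuples, weights):
    """2-tuple weighted average: Delta(sum(w_i * beta_i) / sum(w_i))."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    beta = sum(w * (i + alpha) for (i, alpha), w in zip(tuples, weights)) / total
    return delta(beta)


# Hypothetical data: 4 students (rows) answer 3 questions (columns) with term
# indices from S = {s0, ..., s6}; every raw answer s_i becomes the 2-tuple (i, 0).
answers = [
    [4, 3, 5],
    [3, 3, 4],
    [5, 2, 4],
    [4, 4, 6],
]
crowd_grades = [1.0, 1.0, 1.0, 1.0]  # initial crowd grade of every student
question_weights = [0.5, 0.3, 0.2]   # hypothetical weights w_k set by the teacher

# Aggregate each question over all students, weighted by crowd grade.
per_question = [
    weighted_average([(row[k], 0.0) for row in answers], crowd_grades)
    for k in range(len(question_weights))
]

# Tuples compare lexicographically, which matches the 2-tuple ordering,
# so the smallest entry marks the question that most needs attention.
weakest = min(range(len(per_question)), key=lambda k: per_question[k])

# Aggregate the questions into the achievement of the knowledge point.
achievement = weighted_average(per_question, question_weights)

print(per_question, weakest, achievement)
```

With equal crowd grades the first pass reduces to the arithmetic mean of Definition 2; the grades only start to matter after the first update.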

3.5 Step 5 update

In this phase, we update the crowd grade of each student. The crowd grade is updated at the end of each loop, so that students are encouraged to fill in the questionnaire more carefully.

Definition 4 (Distance [Gerogiannis, Rapti and Karageorgos et al. (2015)]): Let $(s_i,\alpha_i)$ and $(s_j,\alpha_j)$ be two linguistic 2-tuples in $q=\{(s_1,\alpha_1),(s_2,\alpha_2),\ldots,(s_n,\alpha_n)\}$. Then the distance between them is calculated as follows:

Definition 5 (Similarity Degree): Let $q=\{(s_1,\alpha_1),(s_2,\alpha_2),\ldots,(s_n,\alpha_n)\}$ and $\bar{q}$ be a set of linguistic 2-tuples and its arithmetic mean, respectively. Then the similarity degree of any linguistic 2-tuple $(s_i,\alpha_i)$ in $q$ is calculated as follows:

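Definition 6 (Accuracy Grade): Let $s((s_{jk},\alpha_{jk}),\bar{q}_k)$ be the similarity degree between the answer of student $j$ to question $q_k$ and the arithmetic mean $\bar{q}_k$ of all answers to that question. Then the accuracy grade $\alpha_j^{o}$ of student $j$ in loop $o$, used for calculating knowledge point $K_o$, is defined as follows: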

We assign each student $j$ an accuracy grade that describes how close he/she is to the average level. The closer the answers are to the mean, the bigger the accuracy grade. The crowd grade of a student is a comprehensive score defined in Definition 7.

Definition 7 (Crowd Grade): Let $\omega^{o}$ and $\alpha^{o}$ be the crowd grade and the accuracy grade, respectively, used to aggregate knowledge point $K_o$. Then the crowd grade $\omega^{o+1}$ used to calculate knowledge point $K_{o+1}$ is calculated as follows:

Here we merge $\omega^{o}$ and $\alpha^{o}$ into the crowd grade by giving a weight $\gamma\in[0, 1]$ to the teaching evaluation of students. The initial value $\omega^{0}$ of the crowd grade of each student is set to 1 in our scenario.
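The formulas of Definitions 4 to 7 are not reproduced in this text, so the sketch below only shows the shape of the update loop using stand-in formulas. Every formula in it is an assumption rather than the paper's definition: the distance is taken as the absolute difference of numerical values, the similarity as one minus that distance divided by $M$, the accuracy grade as the question-weighted mean of the similarities, and the new crowd grade as $(1-\gamma)\,\omega^{o}+\gamma\,\alpha^{o}$. The data and the name `update_crowd_grades` are hypothetical as well.

```python
def update_crowd_grades(answers, crowd_grades, question_weights, m, gamma=0.25):
    """Return the crowd grades for the next loop (stand-in formulas, see lead-in)."""
    n_students = len(answers)
    n_questions = len(question_weights)
    # Arithmetic mean of the numerical answers to each question.
    means = [sum(row[k] for row in answers) / n_students for k in range(n_questions)]
    weight_sum = sum(question_weights)
    updated = []
    for row, omega in zip(answers, crowd_grades):
        # ASSUMED similarity: 1 minus the distance to the mean, normalised by M.
        sims = [1 - abs(row[k] - means[k]) / m for k in range(n_questions)]
        # ASSUMED accuracy grade: question-weighted mean of the similarities.
        accuracy = sum(w * s for w, s in zip(question_weights, sims)) / weight_sum
        # ASSUMED merge: convex combination of old grade and accuracy grade.
        updated.append((1 - gamma) * omega + gamma * accuracy)
    return updated


# Hypothetical answers (term indices); the last student disagrees with the rest.
answers = [[4, 3], [4, 3], [4, 3], [0, 6]]
grades = update_crowd_grades(answers, [1.0] * 4, [0.5, 0.5], m=6)
print(grades)
```

Under these assumptions the student whose answers lie far from the mean ends the loop with the lowest crowd grade, which is the behaviour the update step is meant to produce.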

3.6 Step 6 improvement

After the update phase, the teacher receives the evaluation results generated in Step 4. The teacher can then refine the syllabus and design more effective teaching methods to strengthen weak knowledge points. At the same time, the achievement of a specific knowledge point can be calculated and used as a qualitative indicator of teaching quality. More importantly, the assessment comes directly from students, so it is more reliable and accurate than that from other experts. Moreover, it comes without delay, and the teacher can get the feedback at the end of each class.

4 Case study

In this section, we illustrate how to use the framework to evaluate teaching quality. For a knowledge point in the syllabus, the teacher should prepare some questions, such as “Are you interested in this part of the content?”, “Do you understand this part of the content?”, “Are you satisfied with this kind of teaching method?”, etc. The teacher should then assign each question an associated weight $w_k$ that reflects its contribution to the achievement degree. In this case, let the associated weight set be $W=\{0.2, 0.3, 0.15, 0.2, 0.15\}$. Optionally, the teacher can also decide in advance what kind of students will answer the questions. If not, all students give feedback by default. The questions are then attached to the associated slides, which can be shared with students in this framework, and students answer them when they are displayed on the students’ screens at the end of the teaching of that knowledge point.

We record all collected answers in the matrix $c$ shown in Eq. (9). According to Definition 1, any linguistic term $s_i$ can be translated to the linguistic 2-tuple $(s_i,0)$. We thus obtain the matrix $C$, where $C_{ij}$ is a 2-tuple expressing the answer of student $i$ ($x_i$) to question $j$, as shown in Tab. 1.

We then aggregate the answers of all students to each question $q_i$ by calculating a weighted average over column $i$ of the matrix $C$ using Eq. (4). Let the associated weights be $\Omega=\{\omega_1, \omega_2, \ldots, \omega_H\}$, where $\omega_i\in\Omega$ is the crowd grade of student $i$, which has the initial value 1 and is calculated by Eq. (8) in the following loops. Thus, we obtain the aggregated 2-tuple for each question $q_i$, as shown in Eq. (10).

The above results can then be sorted in increasing order, which shows that the teacher should pay more attention to question 5 and try to refine it in the following course. Moreover, the achievement of knowledge point $K_o$ can also be calculated by Eq. (4). Let the associated weights of the five questions be $W=\{0.2, 0.3, 0.15, 0.2, 0.15\}$; then the 2-tuple representing the achievement of knowledge point $K_o$ is $(M, 0.23)$. The aggregated result means that the teaching quality for knowledge point $K_o$ is at a medium level and that the teacher may need to put more effort into this point.
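As a small worked check of how this result is read, the 2-tuple $(M, 0.23)$ can be converted back to a number with the inverse function $\Delta^{-1}(s_i,\alpha)=i+\alpha$ from Section 3.3, taking $M=s_3$ from the seven-term set in the caption of Tab. 1:

$$\Delta^{-1}(s_3, 0.23)=3+0.23=3.23$$

On the scale from 0 to 6, the value lies between $s_3=\mathrm{M}$ and $s_4=\mathrm{H}$ and is closer to $\mathrm{M}$, which is why the achievement is read as medium.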

Table 1: Linguistic 2-tuple translated from performance evaluation of all selected students to a knowledge point Ko in S={s0=N, s1=VL, s2=L, s3=M, s4=H, s5=VH, s6=P}

Finally, let $\gamma$ be 0.25 as the default value, and we update the crowd grade of each student according to Eq. (6) and Eq. (7). In this case, we assume that all seven students answered the questions assigned to them. The updated crowd grades are then:

The results show that the crowd grade of student $x_5$ is lower than that of the others, so student $x_5$ will have less influence in the next loop, which evaluates another knowledge point.

5 Experiments

Table 2: The answers that twenty-two students ($x_1$ to $x_{22}$) gave to questions ($q_1$ to $q_{60}$), selected from S={s0=N, s1=VL, s2=L, s3=M, s4=H, s5=VH, s6=P} and associated with 12 knowledge points ($K_1$ to $K_{12}$)

To demonstrate the practicability of the proposed approach, we first developed a prototype with basic functions for evaluating teaching quality. We then gathered a series of questionnaires completed by twenty-two students in one class, which consist of sixty questions and cover twelve knowledge points of the Java Programming course. There are five questions for each knowledge point, giving sixty answers per student, as shown in Tab. 2.

We entered these answers into the implemented system manually and obtained the achievement of each knowledge point. Fig. 3 shows that the evaluation results of questions $q_{28}$ and $q_{25}$, for knowledge points $K_6$ and $K_5$ respectively, are not satisfactory, and the teacher should pay more attention to those two aspects in future teaching.

All twenty-two students have the same initial crowd grade of one, which is updated at the end of each loop. Fig. 4 shows four abnormal crowd grade curves, which belong to the four students $x_3$, $x_5$, $x_{11}$ and $x_{19}$. Comparing them with the other students in Tab. 2, we find that students $x_3$ and $x_{11}$ simply filled out all the questionnaires with the same answer P and that student $x_{11}$ seemed to give answers randomly, so their crowd grades get lower and lower. The crowd grade of student $x_{19}$ differs most from the others, and there is a clear rebound from the seventh loop. An analysis of the answers given by student $x_{19}$ shows that the answers in the first six questionnaires are random, which indicates that student $x_{19}$ showed no serious intent at the beginning.

Figure 3: Two evaluation results of knowledge points $K_5$ and $K_6$ generated by the implemented system from 264 questionnaires, each of which consists of five questions on the Java Programming course, filled out by 22 students of one class in 12 installments

Figure 4: Changes in crowd grades of 22 students with different evaluation loops

The remaining answers of student $x_{19}$, in the last six questionnaires, are close to the weighted average mean, which illustrates the change in the second half of the course. Teachers can use the analysis results in Fig. 4 to identify abnormal students and work with them in more detail.

Because the crowd grade of an abnormal student gets lower and lower if he/she does not change the behavior, the student’s influence on the assessment of knowledge points gradually decreases. To illustrate this feature, we ran another experiment on two modified data sets. One data set is obtained by removing the four abnormal students, which gives a real assessment without noise; the other is obtained by randomly removing four normal students, so that the noise is still present. The experimental results show that the deviation from the real value is bigger at the beginning but smaller during the second half of the term (see Fig. 5). The reason is that our approach needs some loops to identify the abnormal students and reduce their crowd grades. The results show that our approach has a strong tolerance for abnormal data.

Figure 5: Changes of achievements of 12 knowledge points with two sets of modified data

6 Conclusions

Participants in crowdsourcing are usually subject to uncertainty during the feedback process, and this uncertainty means that the results obtained from them may not be accurate. How to describe and compute this information is also a major issue. In this paper, we use linguistic 2-tuples to express the feedback from students and propose a crowdsourcing-based framework for evaluating teaching quality in a classroom. We focus on how to distinguish the differences among students, and we define a crowd grade that depends on the accuracy grade to weight the ratings from different students. In our approach, a student’s crowd grade is updated at the end of each feedback round, which can encourage students to answer the questions more seriously. As a result, teachers can get the evaluation in time, with its accuracy maintained.

Future work will focus on describing a student with more properties, so that teachers can send questions to specific students, which would make the answers more accurate. In that case, we need to develop a mechanism to make sure that every student has a similar opportunity to answer the questions.

Acknowledgment: This research is supported by the National Key Research and Development Program of China (No. 2017YFC1502203) and the Key Project of Sichuan Provincial Department of Education (No. 2017GZ0333).
