ComposIT: experiments and comparison with AI planners
In previous works [1][2], ComposIT was analyzed and compared with the winners of the Web Service Challenge 2008 (WSC'08). These experiments demonstrated that ComposIT obtains optimal solutions, outperforming the winners with respect to both the number of services and the composition length. The aim of the experiments presented here, however, is to compare the performance of ComposIT with two open-source, state-of-the-art composition engines, OWLS-Xplan and PORSCE-II, which use classical AI planning algorithms.
The experimentation focuses on the generation of semantically valid composite services, taking into account the semantic information about the inputs and outputs of services in the absence of preconditions and effects. To carry out this comparison, we developed a benchmarking tool that automates the evaluation of the different composition algorithms using an adapted version of the WSC'08 datasets. This page provides further details of the evaluation.
Datasets
To validate the three composition engines, we used three collections of datasets: the Exact-Matching, Semantic-Matching and WSC'08 datasets. The first two collections are based on the original WSC'08 datasets but, for validation purposes, contain only the services that belong to the solution. The characteristics of each dataset are shown below.
Semantic Matching datasets
Download the XML Semantic-Matching datasets semantic-matching.tar.gz
Dataset | #Serv | #Serv. Sol. | #Length | Avg. In./Out. | #Init. con. | #Goal con. |
---|---|---|---|---|---|---|
Semantic-Matching 01 | 2 | 2 | 2 | 1.0 / 1.0 | 1 | 1 |
Semantic-Matching 02 | 3 | 3 | 2 | 3.0 / 1.0 | 3 | 1 |
Semantic-Matching 03 | 10 | 10 | 3 | 4.03 / 3.08 | 3 | 2 |
Semantic-Matching 04 | 5 | 5 | 3 | 4.25 / 4.68 | 4 | 1 |
Semantic-Matching 05 | 40 | 40 | 23 | 5.07 / 2.44 | 3 | 1 |
Semantic-Matching 06 | 10 | 10 | 5 | 5.47 / 3.47 | 6 | 4 |
Semantic-Matching 07 | 20 | 20 | 8 | 3.03 / 5.11 | 2 | 3 |
Semantic-Matching 08 | 35 | 35 | 14 | 3.10 / 3.37 | 9 | 4 |
Semantic-Matching 09 | 20 | 20 | 12 | 3.07 / 4.40 | 8 | 1 |
Semantic-Matching 10 | 30 | 30 | 20 | 3.25 / 5.42 | 5 | 4 |
Exact Matching datasets
Download the XML Exact-Matching datasets exact-matching.tar.gz
Dataset | #Serv | #Serv. Sol. | #Length | Avg. In./Out. | #Init. con. | #Goal con. |
---|---|---|---|---|---|---|
Exact-Matching 01 | 2 | 2 | 2 | 1.0 / 1.0 | 1 | 2 |
Exact-Matching 02 | 3 | 3 | 2 | 3.0 / 20.5 | 3 | 40 |
Exact-Matching 03 | 10 | 10 | 3 | 4.03 / 16.53 | 3 | 100 |
Exact-Matching 04 | 5 | 5 | 3 | 4.25 / 21.05 | 4 | 77 |
Exact-Matching 05 | 40 | 40 | 23 | 5.07 / 14.20 | 3 | 407 |
Exact-Matching 06 | 10 | 10 | 5 | 5.47 / 24.09 | 6 | 157 |
Exact-Matching 07 | 20 | 20 | 8 | 3.03 / 26.39 | 2 | 261 |
Exact-Matching 08 | 35 | 35 | 14 | 3.10 / 23.21 | 9 | 507 |
Exact-Matching 09 | 20 | 20 | 12 | 3.07 / 24.71 | 8 | 219 |
Exact-Matching 10 | 30 | 30 | 20 | 3.25 / 33.81 | 5 | 473 |
Web Service Challenge 2008 datasets
Download the XML WSC'08 datasets wsc08.tar.gz
Dataset | #Serv | #Serv. Sol. | #Length | Avg. In./Out. | #Init. con. | #Goal con. |
---|---|---|---|---|---|---|
WSC'08 01 | 158 | 10 | 3 | 3.53 / 5.25 | 3 | 1 |
WSC'08 02 | 558 | 5 | 3 | 3.79 / 3.92 | 4 | 1 |
WSC'08 03 | 604 | 40 | 23 | 4.07 / 6.46 | 3 | 2 |
WSC'08 04 | 1041 | 10 | 5 | 4.23 / 5.47 | 6 | 1 |
WSC'08 05 | 1090 | 20 | 8 | 3.36 / 4.26 | 2 | 1 |
WSC'08 06 | 2198 | 35 | 14 | 6.00 / 4.31 | 9 | 4 |
WSC'08 07 | 4113 | 20 | 12 | 7.37 / 7.21 | 8 | 3 |
WSC'08 08 | 8119 | 30 | 20 | 5.44 / 6.54 | 5 | 4 |
These tables show: the number of services in each dataset (column #Serv); the number of services in the optimal solution (column #Serv. Sol.); the length of the shortest solution (column #Length); the average number of inputs and outputs (column Avg. In./Out.); the number of input concepts provided by the user (column #Init. con.); and the number of requested output concepts (column #Goal con.). The goal is to find, for each dataset, the optimal composite service (minimum number of services, minimum length) that satisfies the goal concepts using only the initial inputs provided.
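As an illustration of this validation criterion, the following sketch shows one way a candidate composition could be checked by forward-propagating the available concepts. It is a minimal, hypothetical example (class and method names are ours, not the benchmarking tool's), and it assumes exact matching of concept names.

```java
import java.util.*;

/** Hypothetical sketch: a service is reduced to its input/output concepts and a
 *  candidate composition is checked by forward-propagating available concepts. */
public class CompositionCheck {

    static class Service {
        final String name;
        final Set<String> inputs, outputs;
        Service(String name, Set<String> inputs, Set<String> outputs) {
            this.name = name; this.inputs = inputs; this.outputs = outputs;
        }
    }

    /** True if every service consumes only concepts that are already available
     *  and all goal concepts are produced at the end (exact matching; semantic
     *  matching would replace containsAll with a reasoner-based check). */
    static boolean isValid(List<Service> composition, Set<String> initial, Set<String> goal) {
        Set<String> available = new HashSet<>(initial);
        for (Service s : composition) {
            if (!available.containsAll(s.inputs)) return false; // an input is not yet satisfied
            available.addAll(s.outputs);
        }
        return available.containsAll(goal);
    }

    public static void main(String[] args) {
        Service s1 = new Service("S1", Collections.singleton("A"), Collections.singleton("B"));
        Service s2 = new Service("S2", Collections.singleton("B"), Collections.singleton("C"));
        // A -> S1 -> B -> S2 -> C satisfies the goal {C} from the initial inputs {A}
        System.out.println(isValid(Arrays.asList(s1, s2),
                Collections.singleton("A"), Collections.singleton("C"))); // prints true
    }
}
```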
The Exact-Matching datasets were generated by extending the outputs of each web service with all the superclasses of each output (semantic expansion); hence the average number of outputs is larger than in the other datasets. The semantic expansion transforms a semantic matching problem into an exact matching problem when exact and plug-in matches are used to perform the semantic matchmaking. This allows us to test composition algorithms that do not use semantic reasoners with the WSC'08 datasets. For example, suppose that a service S1 provides the instance “my_car”, which is an instance of the concept “sedan”, whereas another service S2 requires an input “car”. Given that “my_car” is a sedan and a sedan is a “car” (plug-in match), S2 can use it as an input. This behavior can be simulated by adding the superclasses of “my_car” to the outputs of the service: after the semantic expansion, service S1 returns the outputs “my_car”, “sedan”, “car” and “vehicle”.
The figure below illustrates the semantic expansion. In the first case, a semantic reasoner is required to match the output “my_car” to “car”. In the second case, the service returns the output “my_car” as well as all its superclasses (sedan, car, vehicle), so the output “car” from S1 matches the input “car” from S2 exactly.
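The sketch below illustrates the expansion itself on the example above. It is hypothetical code (not the tool used to build the datasets) and assumes the hierarchy is given as a map from each concept to its direct parent:

```java
import java.util.*;

/** Hypothetical sketch of the semantic expansion described above. */
public class SemanticExpansion {

    /** Returns the output concepts extended with all their (transitive) parents. */
    static Set<String> expandOutputs(Set<String> outputs, Map<String, String> parentOf) {
        Set<String> expanded = new HashSet<>(outputs);
        for (String concept : outputs) {
            String parent = parentOf.get(concept);
            while (parent != null && expanded.add(parent)) { // walk up the hierarchy
                parent = parentOf.get(parent);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Toy hierarchy from the example: my_car -> sedan -> car -> vehicle
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("my_car", "sedan");
        parentOf.put("sedan", "car");
        parentOf.put("car", "vehicle");

        // S1 originally returns only "my_car"; after expansion it also returns
        // "sedan", "car" and "vehicle"
        System.out.println(expandOutputs(Collections.singleton("my_car"), parentOf));
    }
}
```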
Evaluation
The performance of each composition algorithm was measured using a test platform developed for this purpose. We carried out three different experiments, one for each collection of datasets shown in the previous tables, to determine the quality of the solutions and the composition time. Each experiment consisted of five independent executions of each algorithm over each dataset. The number of services and the length of the composition, as well as the minimum, average and standard deviation of the time taken to solve the composition problem, are shown in the tables below. All the experiments were performed on a Windows XP 32-bit virtual machine (VirtualBox 4.1.22, 2 GB of RAM, 1 core) hosted on a PC equipped with an Intel Core 2 Quad Q9550 at 2.83 GHz, running Ubuntu 10.04 64-bit.
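The reported summary values are consistent with the sample standard deviation (n-1 denominator). As a sanity check, the small hypothetical snippet below reproduces the min/avg/SD of the first row of the first table (ComposIT on Exact-Matching 01):

```java
/** Hypothetical helper reproducing the summary statistics of the tables below. */
public class RunStats {
    public static void main(String[] args) {
        // Five ComposIT executions on Exact-Matching 01, in ms (first table below)
        double[] t = {238.20, 158.04, 153.63, 153.61, 145.07};

        double min = Double.MAX_VALUE, sum = 0;
        for (double v : t) { min = Math.min(min, v); sum += v; }
        double avg = sum / t.length;

        double sq = 0;
        for (double v : t) sq += (v - avg) * (v - avg);
        double sd = Math.sqrt(sq / (t.length - 1)); // sample standard deviation

        System.out.printf("min=%.2f avg=%.2f sd=%.2f%n", min, avg, sd);
        // Prints: min=145.07 avg=169.71 sd=38.57
    }
}
```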
1. Exact-Matching evaluation results
Dataset | Algorithm | #Serv | #Length | Ex. 1 (ms) | Ex. 2 (ms) | Ex. 3 (ms) | Ex. 4 (ms) | Ex. 5 (ms) | Min (ms) | Avg (ms) | SD (ms) |
---|---|---|---|---|---|---|---|---|---|---|---|
Exact-Matching 01 | ComposIT | 2 | 2 | 238.20 | 158.04 | 153.63 | 153.61 | 145.07 | 145.07 | 169.71 | 38.57 |
Exact-Matching 01 | Xplan | 2 | 2 | 306.65 | 348.32 | 328.53 | 359.50 | 305.37 | 305.37 | 329.67 | 24.29 |
Exact-Matching 01 | PORSCE-II | 2 | 2 | 3478.53 | 3717.59 | 3657.99 | 3524.25 | 3553.97 | 3478.53 | 3586.47 | 98.60 |
Exact-Matching 02 | ComposIT | 3 | 2 | 274.59 | 209.04 | 211.42 | 196.24 | 206.73 | 196.24 | 219.60 | 31.28 |
Exact-Matching 02 | Xplan | 3 | 3 | 416.11 | 490.68 | 391.30 | 476.10 | 422.66 | 391.30 | 439.37 | 42.17 |
Exact-Matching 02 | PORSCE-II | 9 | 9 | 5206.54 | 5283.59 | 5240.02 | 5482.74 | 5649.92 | 5206.54 | 5372.56 | 188.49 |
Exact-Matching 03 | ComposIT | 10 | 3 | 422.66 | 379.86 | 369.33 | 330.27 | 365.57 | 330.27 | 373.54 | 33.19 |
Exact-Matching 03 | Xplan | 10 | 10 | 815.73 | 642.90 | 616.80 | 595.58 | 622.55 | 595.58 | 658.71 | 89.38 |
Exact-Matching 03 | PORSCE-II | 22 | 22 | 16081.13 | 15721.06 | 15105.77 | 16452.69 | 14779.84 | 14779.84 | 15628.10 | 686.69 |
Exact-Matching 04 | ComposIT | 5 | 3 | 321.52 | 316.63 | 316.42 | 354.47 | 381.93 | 316.42 | 338.19 | 29.13 |
Exact-Matching 04 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 04 | PORSCE-II | 14 | 14 | 10889.68 | 10777.27 | 10595.46 | 10898.34 | 10767.12 | 10595.46 | 10785.57 | 122.58 |
Exact-Matching 05 | ComposIT | 40 | 23 | 7111.35 | 6676.08 | 6669.89 | 6775.57 | 6710.47 | 6669.89 | 6788.67 | 185.20 |
Exact-Matching 05 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 05 | PORSCE-II | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 06 | ComposIT | 10 | 5 | 463.92 | 468.82 | 603.30 | 484.27 | 505.79 | 463.92 | 505.22 | 57.20 |
Exact-Matching 06 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 06 | PORSCE-II | 27 | 27 | 24106.72 | 24340.28 | 23880.96 | 24732.57 | 24025.89 | 23880.96 | 24217.28 | 332.65 |
Exact-Matching 07 | ComposIT | 20 | 8 | 952.07 | 1103.62 | 951.09 | 1015.01 | 895.89 | 895.89 | 983.54 | 79.27 |
Exact-Matching 07 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 07 | PORSCE-II | 52 | 52 | 38036.12 | 37755.37 | 35577.92 | 36377.60 | 36306.59 | 35577.92 | 36810.72 | 1043.49 |
Exact-Matching 08 | ComposIT | 35 | 14 | 6071.00 | 6299.35 | 5859.83 | 6324.65 | 6174.44 | 5859.83 | 6145.85 | 189.58 |
Exact-Matching 08 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 08 | PORSCE-II | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 09 | ComposIT | 20 | 12 | 1079.93 | 1065.36 | 1096.51 | 1090.66 | 1100.17 | 1065.36 | 1086.53 | 14.09 |
Exact-Matching 09 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 09 | PORSCE-II | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 10 | ComposIT | 30 | 20 | 5236.27 | 5596.29 | 5352.37 | 5663.87 | 5263.09 | 5236.27 | 5422.38 | 195.88 |
Exact-Matching 10 | Xplan | - | - | - | - | - | - | - | - | - | - |
Exact-Matching 10 | PORSCE-II | - | - | - | - | - | - | - | - | - | - |
2. Semantic-Matching evaluation results
In this experiment, we evaluate the performance of ComposIT and PORSCE-II on the Semantic-Matching datasets. Note that OWLS-Xplan does not support semantic matchmaking; therefore, this algorithm has been excluded from this evaluation and also from the WSC'08 evaluation, which likewise requires semantic matchmaking.
PORSCE-II uses a threshold value to configure the maximum semantic concept distance allowed for a valid semantic match. This threshold is directly related to the performance of the algorithm: the higher the threshold, the slower the algorithm runs, but the better the quality of the solutions. Since it is hard to select an appropriate threshold without compromising performance too much, we selected the best value for each dataset based on a study over the Semantic-Matching datasets, in which we analyzed the performance of PORSCE-II with threshold values increasing from 1 to 19 (the maximum depth of all the ontologies).
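Conceptually, the threshold bounds how far apart two concepts may be in the ontology for an output to be accepted as an input. The sketch below is our own simplified interpretation of such distance-bounded matching (it is not PORSCE-II's code): a provided concept matches a required one if the required concept is reachable within at most threshold superclass steps.

```java
import java.util.*;

/** Simplified illustration of threshold-bounded matching (not PORSCE-II's
 *  actual implementation). */
public class ThresholdMatch {

    /** True if 'required' is reachable from 'provided' in at most 'threshold'
     *  superclass steps (0 steps = exact match, >0 = plug-in match). */
    static boolean matches(String provided, String required, int threshold,
                           Map<String, String> parentOf) {
        String current = provided;
        for (int distance = 0; distance <= threshold; distance++) {
            if (required.equals(current)) return true;
            if (current == null) return false;      // reached the top of the hierarchy
            current = parentOf.get(current);        // move one level up
        }
        return false;                               // concepts are too far apart
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("sedan", "car");
        parentOf.put("car", "vehicle");
        System.out.println(matches("sedan", "vehicle", 1, parentOf)); // false (distance 2)
        System.out.println(matches("sedan", "vehicle", 2, parentOf)); // true
    }
}
```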
Plug-in threshold | Time for Semantic-Matching 01 (ms) | % Datasets solved | Semantic-Matching datasets solved | Performance |
---|---|---|---|---|
1 | 6721.69 | 10% | 01 | 100.00% |
2 | 9735.35 | 30% | 01, 02, 03 | 69.04% |
3 | 13257.96 | 40% | 01, 02, 03, 04 | 50.70% |
4 | 15139.06 | 50% | 01, 02, 03, 04, 07 | 44.40% |
5 | 20059.98 | 50% | 01, 02, 03, 04, 07 | 33.51% |
6 | 20616.51 | 60% | 01, 02, 03, 04, 06, 07 | 32.60% |
7 | 24607.99 | 60% | 01, 02, 03, 04, 06, 07 | 27.32% |
8 | 27791.03 | 60% | 01, 02, 03, 04, 06, 07 | 24.19% |
9 | 31261.37 | 60% | 01, 02, 03, 04, 06, 07 | 21.50% |
10 | 33648.79 | 60% | 01, 02, 03, 04, 06, 07 | 19.98% |
11 | 34074.82 | 60% | 01, 02, 03, 04, 06, 07 | 19.73% |
12 | 40787.78 | 60% | 01, 02, 03, 04, 06, 07 | 16.48% |
13 | 40704.43 | 60% | 01, 02, 03, 04, 06, 07 | 16.51% |
14 | 44085.17 | 60% | 01, 02, 03, 04, 06, 07 | 15.25% |
15 | 47256.00 | 60% | 01, 02, 03, 04, 06, 07 | 14.22% |
16 | 50164.25 | 60% | 01, 02, 03, 04, 06, 07 | 13.40% |
17 | 51815.99 | 60% | 01, 02, 03, 04, 06, 07 | 12.97% |
18 | 56561.49 | 60% | 01, 02, 03, 04, 06, 07 | 11.88% |
19 | 58981.02 | 60% | 01, 02, 03, 04, 06, 07 | 11.40% |
The figure below shows the percentage of semantic problems solved by PORSCE-II with different values of the threshold, together with the performance, measured by comparing the time taken by the algorithm to solve Semantic-Matching 01 with each threshold. As can be seen, the computational cost increases (performance decreases) as the threshold is incremented. Thus, with a threshold of 10, the algorithm obtains a performance close to 20%, which means that it is about 1/0.20 = 5 times slower than with a threshold of 1. Based on these results, we selected the optimal threshold N for each dataset: N=1 for Semantic-Matching 01, N=2 for Semantic-Matching 02 and 03, N=3 for Semantic-Matching 04, N=6 for Semantic-Matching 06 and N=4 for Semantic-Matching 07. Note that ComposIT does not require any special configuration for the semantic datasets, as it computes matches at unlimited depth.
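The Performance column of the study is consistent with the ratio time(threshold 1) / time(threshold N) on Semantic-Matching 01; the minimal snippet below (hypothetical, values taken from the table above) reproduces the figure for threshold 10:

```java
/** Reproduces the Performance column of the threshold study (hypothetical snippet). */
public class ThresholdPerformance {
    public static void main(String[] args) {
        double timeThreshold1 = 6721.69;    // ms, Semantic-Matching 01 with threshold 1
        double timeThreshold10 = 33648.79;  // ms, Semantic-Matching 01 with threshold 10
        double performance = timeThreshold1 / timeThreshold10; // ~0.1998
        double slowdown = 1.0 / performance;                   // ~5x slower
        System.out.printf("performance=%.2f%% slowdown=%.2fx%n", 100 * performance, slowdown);
        // Prints: performance=19.98% slowdown=5.01x
    }
}
```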
Dataset | Algorithm | #Serv | #Length | Ex. 1 (ms) | Ex. 2 (ms) | Ex. 3 (ms) | Ex. 4 (ms) | Ex. 5 (ms) | Min (ms) | Avg (ms) | SD (ms) |
---|---|---|---|---|---|---|---|---|---|---|---|
Semantic-Matching 01 | ComposIT | 2 | 2 | 92.46 | 214.92 | 108.11 | 297.97 | 387.13 | 92.46 | 220.12 | 125.32 |
Semantic-Matching 01 | PORSCE-II (1) | 2 | 2 | 7487.74 | 6655.72 | 6537.85 | 6190.85 | 8326.61 | 6190.85 | 7039.75 | 862.66 |
Semantic-Matching 02 | ComposIT | 3 | 2 | 272.64 | 387.46 | 119.45 | 87.49 | 240.80 | 87.49 | 221.57 | 121.35 |
Semantic-Matching 02 | PORSCE-II (2) | 3 | 3 | 14355.59 | 14784.66 | 14467.15 | 13685.42 | 14779.62 | 13685.42 | 14414.49 | 449.48 |
Semantic-Matching 03 | ComposIT | 10 | 3 | 104.14 | 261.09 | 229.25 | 307.21 | 317.30 | 104.14 | 243.80 | 85.79 |
Semantic-Matching 03 | PORSCE-II (2) | 11 | 11 | 47524.98 | 49847.54 | 49750.68 | 45989.71 | 45359.04 | 45359.04 | 47694.39 | 2076.83 |
Semantic-Matching 04 | ComposIT | 5 | 3 | 287.12 | 213.46 | 97.21 | 112.70 | 101.86 | 97.21 | 162.47 | 84.48 |
Semantic-Matching 04 | PORSCE-II (3) | 7 | 7 | 29187.01 | 29593.40 | 29466.08 | 28759.73 | 31130.29 | 28759.73 | 29627.30 | 898.98 |
Semantic-Matching 05 | ComposIT | 40 | 23 | 684.00 | 694.36 | 682.87 | 272.57 | 282.76 | 272.57 | 523.31 | 224.32 |
Semantic-Matching 05 | PORSCE-II (16) | - | - | - | - | - | - | - | - | - | - |
Semantic-Matching 06 | ComposIT | 10 | 5 | 223.73 | 409.05 | 101.80 | 96.48 | 125.48 | 96.48 | 191.31 | 132.10 |
Semantic-Matching 06 | PORSCE-II (6) | 14 | 14 | 134258.53 | 114729.94 | 112889.12 | 114402.36 | 118113.16 | 112889.12 | 118878.62 | 8806.96 |
Semantic-Matching 07 | ComposIT | 20 | 8 | 332.64 | 402.18 | 178.90 | 132.44 | 342.98 | 132.44 | 277.83 | 115.80 |
Semantic-Matching 07 | PORSCE-II (4) | 35 | 35 | 154651.56 | 155128.96 | 153448.76 | 156402.12 | 153196.42 | 153196.42 | 154565.56 | 1305.71 |
Semantic-Matching 08 | ComposIT | 35 | 14 | 674.44 | 742.30 | 857.37 | 338.21 | 790.32 | 338.21 | 680.53 | 202.71 |
Semantic-Matching 08 | PORSCE-II (17) | - | - | - | - | - | - | - | - | - | - |
Semantic-Matching 09 | ComposIT | 20 | 12 | 445.83 | 295.22 | 151.83 | 180.21 | 684.96 | 151.83 | 351.61 | 219.36 |
Semantic-Matching 09 | PORSCE-II (15) | - | - | - | - | - | - | - | - | - | - |
Semantic-Matching 10 | ComposIT | 30 | 20 | 219.66 | 214.97 | 445.83 | 535.24 | 210.08 | 210.08 | 325.16 | 154.28 |
Semantic-Matching 10 | PORSCE-II (17) | - | - | - | - | - | - | - | - | - | - |
3. WSC'08 evaluation results
Dataset | Algorithm | #Serv | #Length | Ex. 1 (ms) | Ex. 2 (ms) | Ex. 3 (ms) | Ex. 4 (ms) | Ex. 5 (ms) | Min (ms) | Avg (ms) | SD (ms) |
---|---|---|---|---|---|---|---|---|---|---|---|
WSC'08 01 | ComposIT | 10 | 3 | 308.45 | 223.35 | 224.23 | 217.05 | 231.03 | 217.05 | 240.82 | 38.13 |
WSC'08 01 | PORSCE-II (2) | 11 | 11 | 700552.14 | 703166.16 | 706210.96 | 709269.96 | 688332.37 | 688332.37 | 701506.32 | 8056.46 |
WSC'08 02 | ComposIT | 5 | 3 | 286.18 | 247.79 | 236.50 | 231.01 | 308.12 | 231.01 | 261.92 | 33.63 |
WSC'08 02 | PORSCE-II (3) | 8 | 8 | 3633027.68 | 3635780.31 | 3494138.55 | 3518414.74 | 3501751.84 | 3494138.55 | 3556622.62 | 71551.69 |
WSC'08 03 | ComposIT | 40 | 23 | 1448.94 | 1417.31 | 1429.48 | 1413.16 | 1457.06 | 1413.16 | 1433.19 | 19.27 |
WSC'08 03 | PORSCE-II (16) | - | - | - | - | - | - | - | - | - | - |
WSC'08 04 | ComposIT | 10 | 5 | 427.54 | 434.77 | 434.53 | 451.68 | 440.62 | 427.54 | 437.83 | 9.02 |
WSC'08 04 | PORSCE-II (6) | 16 | 16 | 13594552.10 | 13746460.11 | 13260928.67 | 13153932.78 | 13152705.66 | 13152705.66 | 13381715.86 | 272606.85 |
WSC'08 05 | ComposIT | 20 | 8 | 1052.68 | 1018.33 | 995.83 | 1019.80 | 1011.26 | 995.83 | 1019.58 | 20.80 |
WSC'08 05 | PORSCE-II (4) | 45 | 45 | 9569454.60 | 9642162.17 | 9461697.18 | 9900376.49 | 9735598.82 | 9461697.18 | 9661857.85 | 166822.85 |
WSC'08 06 | ComposIT | 42 | 7 | 3611.03 | 3587.10 | 3525.84 | 3535.71 | 3436.85 | 3436.85 | 3539.31 | 67.31 |
WSC'08 06 | PORSCE-II (17) | - | - | - | - | - | - | - | - | - | - |
WSC'08 07 | ComposIT | 20 | 12 | 4164.99 | 4043.20 | 4083.13 | 4062.16 | 4095.46 | 4043.20 | 4089.79 | 46.54 |
WSC'08 07 | PORSCE-II (15) | - | - | - | - | - | - | - | - | - | - |
WSC'08 08 | ComposIT | 30 | 20 | 5849.57 | 5751.64 | 5962.46 | 5740.37 | 5710.24 | 5710.24 | 5802.86 | 103.39 |
WSC'08 08 | PORSCE-II (17) | - | - | - | - | - | - | - | - | - | - |
You can download the validation logs for each algorithm here:
Usage
We provide three different Java binaries, one for each composition algorithm. In order to launch a test, you must have the Java JDK 6+ installed. The latest Java JDK is available here. Once installed, you can run the test platform in GUI mode or in console mode. To launch the GUI interface, simply type:
java -jar algorithm.jar
where algorithm.jar is one of the available algorithm binaries:
You can also launch a test in the background from the command line, with the following syntax:
java -jar algorithm.jar algorithm_name dataset_path http_server_port
The examples below launch a test for each algorithm with the Exact-Matching 01 dataset, using port 80 for the HTTP server that provides the services and redirecting the output to a file 01.txt:
ComposIT:
java -Xmx1024M -Xms512M -jar CompositAlgorithm.jar "ComposIT" "...\Datasets\Exact-matching\01" 80 > 01.txt
PORSCE-II:
java -Xmx1024M -Xms512M -jar algorithm.jar "PORSCE-II" "...\Datasets\Exact-matching\01" 80 > 01.txt
OWLS-Xplan:
java -Xmx1024M -Xms512M -jar OWLSXplanAlgorithm.jar "OWL-S Xplan 2.0" "...\Datasets\Exact-matching\01" 80 > 01.txt
Disclaimer
All the contents of this website are owned by the Centro de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela (USC), except the original versions of the WSC'08 datasets, OWLS-Xplan 2.0 and PORSCE-II, which are owned by their respective authors. The software available here is for private, non-commercial use and is offered solely as a proof of concept for validation purposes.