5 Replies Latest reply on May 17, 2019 8:55 AM by 1006501

    grouping time

    New User


      I have just started using the QSAR Toolbox v4.3.1 and am trying to run some chemicals.

      However, for the past few days grouping has been extremely slow, and I have no idea what the problem is.

      I searched for related problems (about slow processing) and confirmed that the profiling options are normal.

      I have attached some pictures to explain.




      I'm trying to determine the mutagenicity of this chemical. The number of chemicals, 189,559 (shown in the picture below), seems far too large to me, and the grouping never finishes.

      If you have any solutions or guidelines for this, even a little help would be appreciated.

      Thank you!

        • Re: grouping time
          1006501 New User

          Dear Dr. Park Jinhee,


          My name is Darina Yordanova and I would be happy to assist you.


          All available chemicals in QSAR Toolbox are located in the "Data" module.

          The Data module is split into two parts: databases and inventories (Figure 1).

          Figure 1

          1) The databases are compilations of chemicals with experimental data.

          2) The inventories are compilations of chemicals without experimental data.


          To make the search for analogues fast, all databases are pre-cached with the results of all available profiling schemes. This does not hold for the inventories, which are not pre-profiled.

          Analogues are searched for in the selected databases/inventories, so in your case the inventories are probably selected as well. Please uncheck them to make sure that only the cached profiling results are used.
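          The effect of pre-caching described above can be illustrated with a small conceptual sketch. All names here (Source, toy_profiler, find_analogues) are hypothetical and are not part of the QSAR Toolbox; the "profiler" is a toy rule standing in for a real, expensive profiling run. Chemicals from a pre-profiled source are looked up, while chemicals from a non-profiled source must be profiled on the fly, which is what makes searches that include inventories slow.

```python
from dataclasses import dataclass, field

# Conceptual sketch only: Source, toy_profiler and find_analogues are hypothetical
# names used for illustration; they are NOT part of the QSAR Toolbox.


@dataclass
class Source:
    name: str
    chemicals: list
    # Pre-computed profiling results (chemical -> category).
    # Databases come with this filled in; inventories leave it empty.
    cached_profiles: dict = field(default_factory=dict)


def toy_profiler(smiles):
    """Stand-in for an expensive profiler run (toy rule: any 'O' -> 'alcohol')."""
    return "alcohol" if "O" in smiles else "other"


def find_analogues(target_category, sources):
    """Return (analogues, number of chemicals that had to be profiled on the fly)."""
    analogues, profiled_on_the_fly = [], 0
    for src in sources:
        for chem in src.chemicals:
            if chem in src.cached_profiles:
                category = src.cached_profiles[chem]  # fast lookup of a cached result
            else:
                category = toy_profiler(chem)  # slow path for non-cached sources
                profiled_on_the_fly += 1
            if category == target_category:
                analogues.append(chem)
    return analogues, profiled_on_the_fly


database = Source("database", ["CCO", "CCC"], {"CCO": "alcohol", "CCC": "other"})
inventory = Source("inventory", ["CO", "CCCC", "CCCO"])

print(find_analogues("alcohol", [database]))             # (['CCO'], 0)
print(find_analogues("alcohol", [database, inventory]))  # (['CCO', 'CO', 'CCCO'], 3)
```

          With only the database selected, nothing is profiled on the fly; adding the inventory forces a profiler run for every inventory chemical, which scales badly when the inventory holds many structures.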

          Furthermore, if you are interested only in genotoxicity endpoints, you can search for chemicals only in the databases that contain such data. To do this, select the row corresponding to Genetic toxicity (1), group the databases according to "Data for selected endpoint" (2), and then select all green databases (these are the databases containing genotoxicity data) (3) (Figure 2).

          Figure 2


          Kind regards,


            • Re: grouping time
              New User

              Dear Darina Yordanova,


              Thank you so much for the kind reply!

              I have finally solved the problem I had been struggling with!

              But I still have many questions.

              As I mentioned before, mutagenicity is what I really want to predict.

              I have attached the OECD Toolbox procedure I used to predict a chemical (CAS No. 75-76-3).

              I'm fully aware that you're very busy, but if you have time, could you check the file for me?

              It would be really appreciated!


              Best regards,


                • Re: grouping time
                  1006501 New User

                  Dear Jinhee,

                  I checked the steps you have done and they are correct. My only comment concerns the subcategorization steps: it is better to apply them after entering data gap filling (i.e., after clicking the Read-across button), so that you can see how the chemicals/data points are located on the graph.

                  Regarding the questions that arose during the workflow:

                  1) Slide 4: "Are those different endpoints that I already chose?" – The endpoint selected in the Input module helps the user by highlighting the databases/profilers relevant to the endpoint of interest. However, some databases contain data for more than one endpoint. The “Read data” dialog therefore lets the user collect either all data available in the selected databases or only the data for a specific endpoint. In your case, the available data come from the ECHA CHEM database, which contains data for various endpoints.

                  2) Slide 5: Selecting a profiler for searching analogues – The “Organic functional groups” profiler, like the other structure-based profilers (such as “Organic functional groups, Norbert Haider”, “US-EPA New Chemical Categories” and “Aquatic toxicity classification by ECOSAR”), is suitable for searching analogues. The goal is to form a broad group of analogues, which is subsequently subcategorized to find the most similar ones.

                  3) Slide 6: “Read data?” popped up a second time after I pressed OK. Why? – Because once the analogues are found, their data also need to be collected. Again, you can collect either all data available for these chemicals in the selected databases or only the data for the defined endpoint.

                  4) Slide 7: The huge number of chemicals/data points is due to the ECHA CHEM database, which is selected in the “Data” module. This database is not highlighted (i.e., it does not contain data for the endpoint as defined), so you can deselect it and work only with the highlighted ones.

                  5) Slides 8-10: Selecting profilers for subcategorization – Subcategorization aims to remove the chemicals that differ from your target. It usually starts with mechanism-based profilers (e.g., DNA alerts for AMES by OASIS) to eliminate chemicals acting by different mechanisms. In a second step, structure-based profilers (like Organic functional groups, Chemical elements, Structural similarity) can be used. In this way, only the analogues that are both mechanistically and structurally similar to your target chemical remain.
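                  The two-stage subcategorization described in point 5 can be pictured as successive filters. This is a conceptual sketch only, not the Toolbox API: each chemical carries a dictionary of hypothetical profiler outcomes ("mechanistic_alert", "functional_group" and the values are made-up placeholders), and each step keeps only the analogues whose outcome matches the target's.

```python
# Conceptual sketch, not the Toolbox API: each chemical carries a dict of
# hypothetical profiler outcomes. Subcategorization keeps only analogues whose
# outcome matches the target's, applying the mechanism-based profiler first
# and the structure-based profiler afterwards.


def subcategorize(target, analogues, profilers):
    kept = list(analogues)
    for profiler in profilers:
        wanted = target["profiles"].get(profiler)
        # Drop every analogue whose outcome differs from the target's for this profiler
        kept = [a for a in kept if a["profiles"].get(profiler) == wanted]
        print(f"after {profiler}: {[a['name'] for a in kept]}")
    return kept


target = {"name": "target", "profiles": {"mechanistic_alert": "none", "functional_group": "X"}}
analogues = [
    {"name": "A", "profiles": {"mechanistic_alert": "none", "functional_group": "X"}},
    {"name": "B", "profiles": {"mechanistic_alert": "alert_1", "functional_group": "X"}},
    {"name": "C", "profiles": {"mechanistic_alert": "none", "functional_group": "Y"}},
]

# Mechanism-based step first, then a structure-based step
final = subcategorize(target, analogues, ["mechanistic_alert", "functional_group"])
```

                  Running it prints the survivors after each step: B is removed by the mechanistic filter, C by the structural one, leaving only A as both mechanistically and structurally similar to the target.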

                  You can find an illustrated scheme on how to build categories in the FAQ section of the Toolbox helpdesk:


                  Hope this is of help.


                  Kind regards,


                    • Re: grouping time
                      New User

                      Dear Darina,


                      I really appreciate your efforts to help me with these problems!


                      I have some follow-up questions.


                      You mentioned that:

                      " about the subcategorization steps, for which it is better to be applied after entering data gap filling (i.e. after click on Read-across button) in order to see how the chemicals/data points are located on the graph "


                      So I took this advice and tried to produce the results.

                      Below is the process:





                      (same chemical, 75-76-3)


                      [Screenshot 2: ECHA CHEM]

                      This time I deselected "ECHA CHEM" and clicked Gather,


                      but a pop-up appeared: "There is no experimental data available for the chemicals of interest."

                      => Can this be an obstacle to predicting this chemical?

                          If there is no experimental data, does that make the prediction less reliable?



                      Anyway, I kept going.

                      [Screenshot 3: Define]


                      Category definition step -> clicked Define -> 22 chemicals found


                      [Screenshot 4]


                      Selected all endpoints -> 223 data points and 21 chemicals found -> Organic functional groups -> Define



                      [Screenshot 5]

                      At this step, I decided to click Read-across before the subcategorization step.


                      Clicked Read-across -> Accept prediction ->

                      [Screenshot 6]

                      Subcategorized with two profilers:

                      - DNA binding by OASIS

                      - in vitro mutagenicity (Ames test)



                      Finally, I produced the results for 75-76-3, and they are a little different from the previous prediction report!

                      As you can see, the tables have changed:


                      [Tables (1) and (2)]




                      The data matrices have also changed:








                      The two results seem quite different, and I'm not sure which prediction is the better one (even though both predict a "negative" value).


                      I'm sorry for asking so many questions,


                      but I can see that you are an expert in this field.


                      Your advice really helps me move forward with this program!


                      Again, thank you so much!


                      Best regards,


                        • Re: grouping time
                          1006501 New User

                          Dear Jinhee,

                          1. When you click “Gather”, the system just checks for available data for your target chemical in the selected databases. Many chemicals have not been tested, so no experimental data are available for them. This is a common situation and it is not an obstacle to predicting the chemical.
                          2. The subcategorization steps should be done within the data gap filling module (Fig. 1). I am sorry if this was not clear.




                          When you click the “Accept prediction” button, the prediction for your target chemical is based on the current analogues. The prediction report includes all information about the workflow (primary grouping, subcategorizations, domain, etc.), finishing with the prediction. Subcategorizations applied after accepting the prediction are not taken into account and do not appear in the report. That is why your two reports (and the data matrices) differ.


                          3. In general, a prediction based on analogues with consistent results (all negative or all positive) is better than one based on conflicting results (some analogues positive, some negative).
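                          The consistency criterion in point 3 can be expressed as a simple agreement score. This is a simplified illustration, not the Toolbox's own read-across algorithm: summarize_analogues is a hypothetical helper that takes the analogues' experimental calls and returns the majority call together with the fraction of analogues that agree with it, so 1.0 means fully consistent data and lower values mean conflicting data.

```python
from collections import Counter

# Simplified illustration, NOT the Toolbox's read-across algorithm: summarize
# the analogues' experimental calls as a majority call plus the fraction of
# analogues that agree with it (1.0 means fully consistent data).


def summarize_analogues(results):
    if not results:
        raise ValueError("no analogue data")
    # most_common(1) gives the most frequent call; ties go to the call seen first
    call, n_agree = Counter(results).most_common(1)[0]
    return call, n_agree / len(results)


print(summarize_analogues(["negative"] * 5))                                  # ('negative', 1.0)
print(summarize_analogues(["negative", "negative", "negative", "positive"]))  # ('negative', 0.75)
```

                          Both sets yield a "negative" call, but the first one is supported by all analogues while the second has a conflicting data point, which is why the first would be the more convincing prediction.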


                          Kind regards,