-
Re: Successful importation of datasets
83059 Dec 2, 2010 3:40 PM (in response to Geoffrey Hynes)
Excellent. Good to see that it works. Will add your clarifications to the manual.
thanks
Bob
-
Re: Successful importation of datasets
601256 Dec 7, 2010 12:43 AM (in response to 83059)
Bob/Geoff,
I think there are still some issues on this one.
In general it is relatively easy to get a dataset to import, but getting it to exactly the correct endpoint path, especially for long paths, is still tricky. For example, although Geoff was able to import using the entire string "Environmental Fate and Transport#Biodegradation#Biodegradation in water: screening tests#% Degradation", I have been unable to get similar results with "Ecotoxicological Information#Aquatic Toxicity#Mortality#EC50#48 h#Animalia#Arthropoda(Invertebrates)#Branchiopoda(branchiopods)#Daphnia magna".
I can get a successful import by using just the defined portion (Ecotoxicological Information#Aquatic Toxicity), but I have then been unable to define the rest of the fields using the metadata tags. This is partly because it is not clear which tags refer to which points on the endpoint path, and partly because of the seemingly inconsistent behaviour of the import process, as described below.
I have tried to follow the step-by-step instructions in the revised pdf for horizontal import of the Ecotoxicological example. When assigning metadata tags, for example Duration Unit as in the example on page 30 of the revised import wizard pdf, I cannot assign more than one column to the metadata tag "duration".
In the example given in the pdf, the duration tag can have the value "Mean value/Scale value" or "Unit" depending on the radio buttons selected. I cannot get these tags to appear when I select the appropriate button, so I can only define one field for duration. This means that when the data loads, it loads into an endpoint path that has "undefined duration" as part of its path. It may be something to do with the "Set tree hierarchy" feature mentioned in the pdf, but this does not appear to be very clearly explained.
In any event, it appears that this is a problem even within the databases included in the Toolbox. I have noted some data entries which have an "undefined" element in their endpoint path when I suspect they should appear in a fully defined path. If this is the case, there may be significant amounts of data which cannot be used for read-across because their endpoint paths have not been defined properly or the data in the original spreadsheet is not in exactly the correct format. The import would still appear OK, as the "Import Successful" message does not necessarily mean that the data has landed in the intended endpoint path. I may be completely wrong about all of this, but I'm struggling for explanations for my own failure to import the data into the "correct" endpoint path. I have attached an extract of the data I'm trying to import, so if anyone can give me some tips I'd be most grateful.
Nick
-
tbimporttest.xls 10.0 K
-
Re: Successful importation of datasets
588921 Dec 7, 2010 3:36 PM (in response to 601256)
Hi Nick,
I've written a response. It has some pictures in it so I've attached it as a Doc file.
Regards,
Georgi, LMC Team
-
Re: Successful importation of datasets
Geoffrey Hynes Dec 7, 2010 3:48 PM (in response to 588921)
Hi Nick,
There do still seem to be some fundamental issues with the database importation wizard, which I thought were linked to the exact matching of the tree path. However, after Nick's comments I have gone in and looked at a specific endpoint which I know has limited data.
For cyclophosphamide (CAS 50-18-0), the mouse lymphoma test is classified under the following tree path, which contains an undefined element.
Human health hazards#Genetic Toxicity#in vitro#Undefined Test type#Gene mutation
Whereas, for 2-aminoanthracene (CAS 613-13-8), the following correct tree path is supplied.
Human health hazards#Genetic Toxicity#in vitro#mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells
However, using this correct tree path, copied directly from the TB, does not seem to guarantee that the data will import as expected.
This procedure worked for:
Environmental Fate and Transport#Biodegradation#Biodegradation in water: screening tests#% Degradation
But not for:
Human health hazards#Genetic Toxicity#in vitro#mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells
As this was literally copied directly from the TB, pasted into Excel and then imported straight back in, there seems to be an issue with the TB. So I’m now not sure if I’m any further forward.
Hi Georgi,
I will review your additional information and will hopefully be successful.
Best Regards, Geoff...
-
Re: Successful importation of datasets
601256 Dec 7, 2010 6:15 PM (in response to 588921)
Georgi,
Thanks for the additional information. Can I take your points in turn:
1. The problem with Duration. I fully understand the significance of the difference between "define new Region" and "Metadata". However, your comment:
"Note that “Is value” has some particular behavior. It reacts when clicked but does not update properly when other column is set. As a rule of thumb when you select another column you should assume that it does not properly show its “is value” status and explicitly check/uncheck it."
does answer my point about assigning a metadata tag to multiple columns, though "particular behaviour" is a quaint way of putting it. Using your check/uncheck method works, thanks. However, this fix seems to fail if all compounds are removed from the active window and a new set is loaded. The program has to be restarted for the fix to work again.
2. The endpoint tree. Again, I've read the wizard pdf and I'm aware of how the data fields are made up from defined regions and metadata. My point was not which metadata fields are displayed in the Toolbox but rather how to get the data loaded into the "correct" endpoint path. For example, how do the metadata tags code for "Kingdom", "Phylum", etc.? If only the species metadata tag is used, the data will be displayed as "Unknown Kingdom", "Unknown Phylum", etc.
You say:
"The Animalia#Arthropoda(Invertebrates)#Branchiopoda(branchiopods) part is a separate feature in which Kingdom#Phylum#Class information is inserted before the field Test organisms (species)."
I'm not sure what this means. I've tried using the "Daphnia magna" part as my species metatag but, not surprisingly, the data ends up in an "undefined kingdom/undefined phylum/undefined class" path. I've also tried incorporating the kingdom/phylum/class columns within my "Species" metatag, but with the same result. Since you have used my small example data set in your reply, maybe you could tell me whether you successfully imported that data to the path "Ecotoxicological Information#Aquatic Toxicity#Mortality#LC50#48 h#Animalia#Arthropoda(Invertebrates)#Branchiopoda(branchiopods)#Daphnia magna"?
3. Your comment about "consistent visual experience" misses the point. If data from the same test/species/duration/etc. are not defined and metatagged consistently across all databases, then a user forming groups for read-across will not have access to all the available data, because some of it will contain one or more "undefined" fields, as indicated by Geoff in his recent post. This will diminish the value of the Toolbox as a predictive aid, since you need as much data in a category (group) as possible to improve the probability of a correct prediction. I suspect that quite a few data points from the databases provided with the Toolbox are not properly assigned to their "correct" endpoint path.
Nick
-
Re: Successful importation of datasets
588921 Dec 8, 2010 8:59 AM (in response to 601256)
Nick,
1. I guess this is a bug. I will check this and make sure it is fixed in the next release.
2. "I've tried using the "Daphnia magna" part as my species metatag but,"
You should tag the "Daphnia magna" column as "Test organisms (species)". The Toolbox engine will then fill in the Kingdom#Phylum#Class information.
I did successfully import your example file with no problems. I've attached a screenshot with the designations I've used.
3. You are right. Right now the Toolbox offers the flexibility to import any data to any metadata field. You could import Daphnia magna to a field called Duration for instance which will then look off when the dynamic tree is built. Additional restrictions might be in order but I do not have additional information at the moment.
Georgi
-
designations.JPG 150.5 K
-
Re: Successful importation of datasets
601256 Dec 8, 2010 11:30 AM (in response to 588921)
Georgi,
Many thanks for the rapid reply. SUCCESS at last.
I do think it is unfortunate, though, that the metatag label "Species" is not the one required to label the species column, and that "Test organisms (species)" is the correct one. I think a full list of the metadata tags and the context in which each is used would be very useful.
Thanks also for the comments regarding the implementation of metadata in the Toolbox itself. I'm sure it would be a monumental task to check it all but, for instance, I have found some examples where Ames tests using S.typhimurium TA100 appear in the tree path as "Undefined Test organisms (species)". Also, 878 data points from the OASIS Genotox database appear under "Human health hazards#Genetic Toxicity#in vitro#in vitro mammalian chromosome aberration test#Chromosome aberration#Undefined Test organisms (species)#without S9" but, according to the exported endpoint data, appear to be from Chinese Hamster lung cells.
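As a rough way to check exported endpoint data for this kind of problem, the sketch below flags any path segment that starts with "Undefined" or "Unknown". It assumes only that paths are '#'-separated strings as quoted in this thread; the function name is hypothetical and nothing here calls the Toolbox itself.

```python
def undefined_segments(path):
    """Return the segments of a '#'-separated endpoint path that look unassigned.

    Assumption: unassigned levels are labelled with a leading
    "Undefined" or "Unknown", as in the paths quoted in this thread.
    """
    segments = [seg.strip() for seg in path.split("#")]
    return [seg for seg in segments
            if seg.lower().startswith(("undefined", "unknown"))]


# Paths copied from the posts above.
chromosome_aberration = (
    "Human health hazards#Genetic Toxicity#in vitro#"
    "in vitro mammalian chromosome aberration test#Chromosome aberration#"
    "Undefined Test organisms (species)#without S9"
)
print(undefined_segments(chromosome_aberration))
```

Running this over a column of exported paths would give a quick count of data points sitting under partly undefined paths.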
-
Re: Successful importation of datasets
Geoffrey Hynes Dec 8, 2010 11:40 AM (in response to 588921)
Hi Georgi,
The complexity seems to have increased substantially when importing proprietary databases. This was very simple but effective in version 1 of the Toolbox.
I understand your comments and hence Nick's success, although I haven't tried this yet myself.
However, can I ask why the tree path does not import correctly when it is copied from the Toolbox and pasted directly into Excel (i.e. Human health hazards#Genetic Toxicity#in vitro#mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells)?
Cheers,
Geoff...
-
Re: Successful importation of datasets
601256 Dec 8, 2010 12:25 PM (in response to Geoffrey Hynes)
Geoff,
Pre-empting Georgi's reply, I presume the answer is that only the fields "Human health hazards" and "Genetic Toxicity" are recognised as legitimate primary fields in the database, whereas the remainder (#in vitro#mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells) is only recognised if defined by the metadata tags.
Nick
-
Re: Successful importation of datasets
Geoffrey Hynes Dec 8, 2010 12:36 PM (in response to 601256)
Hi Nick,
I agree; however, I'm interested to know why the data in example 1 goes in correctly but the data in example 2 doesn't.
From Georgi's comments, the metadata (highlighted tree path) should need to be defined for both.
1). Environmental Fate and Transport#Biodegradation#Biodegradation in water: screening tests#% Degradation
2). Human health hazards#Genetic Toxicity#in vitro#mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells
I'm assuming that you have now separated your database out into individual Excel cells instead of keeping it all as one long string in a single cell?
If that's the case, I may revert to my original set-up and try this again selecting all the metadata.
Cheers,
Geoff...
-
Re: Successful importation of datasets
588921 Dec 8, 2010 1:59 PM (in response to Geoffrey Hynes)
Geoffrey,
The import works on a leaf node of the predefined tree (the 1st path) and not on a dynamic path (the 2nd one). You can see which is which by pressing the Ctrl key: this underlines the predefined part of the tree (see attached file).
If you want to see what defines the Dynamic part you can click on the Human Health Hazards#Genetic Toxicity and you will see what metadata fields are used to define the hierarchy.
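The predefined/dynamic split can be pictured with a minimal sketch. It assumes a hand-written list of predefined paths containing only the three mentioned in this thread (the real predefined tree lives in the Toolbox and is larger); the names PREDEFINED_PATHS and split_endpoint_path are hypothetical.

```python
# Assumption: only the predefined paths named in this thread are listed here.
PREDEFINED_PATHS = [
    "Environmental Fate and Transport#Biodegradation#"
    "Biodegradation in water: screening tests#% Degradation",
    "Ecotoxicological Information#Aquatic Toxicity",
    "Human health hazards#Genetic Toxicity",
]


def split_endpoint_path(path, predefined_paths=PREDEFINED_PATHS):
    """Split a '#'-separated path into (predefined part, dynamic segments).

    The predefined part is the longest listed path that is a prefix of
    `path`; whatever follows must come from metadata tags instead.
    """
    segments = [seg.strip() for seg in path.split("#")]
    best = []
    for candidate in predefined_paths:
        cand = [seg.strip() for seg in candidate.split("#")]
        if segments[:len(cand)] == cand and len(cand) > len(best):
            best = cand
    return "#".join(best), segments[len(best):]


predefined, dynamic = split_endpoint_path(
    "Human health hazards#Genetic Toxicity#in vitro#"
    "mammalian cell gene mutation assay#Gene mutation#Mouse Lymphoma cells"
)
print(predefined)  # the part to put in the endpoint path column
print(dynamic)     # the levels that metadata columns have to supply
```

Under these assumptions the 1st path comes back with no dynamic segments, which is why pasting it whole works, while the 2nd path leaves four levels that have to be supplied as metadata.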
Georgi
-
PredefinedAndDynamic.JPG 14.1 K
-
OnlyPredefined.JPG 24.5 K
-
Re: Successful importation of datasets
601256 Dec 8, 2010 2:46 PM (in response to Geoffrey Hynes)
Hi Geoff,
I've completely deleted the "Mortality#LC50#48 h#Animalia#Arthropoda(Invertebrates)#Branchiopoda(branchiopods)" columns from my spreadsheet. All that is needed is the column containing the predefined region, "Endpoint path" (in my case it's "Ecotoxicological Information#Aquatic Toxicity"), and the species column (Daphnia magna). So long as my species column is metatagged as "Test organisms (species)", the Toolbox fills in the rest. Of course I still need the columns for duration, units etc.
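A minimal sketch of that trimmed-down layout follows. The endpoint path and species values come from this thread; every other column name and all cell values are hypothetical placeholders, and the sheet is written as CSV text purely for illustration (the file attached earlier in the thread was an .xls).

```python
import csv
import io

# "Endpoint path" and the species tag follow the thread; the remaining
# column names are assumptions, not the wizard's required headers.
COLUMNS = ["CAS", "Endpoint path", "Test organisms (species)",
           "Duration", "Duration unit", "Value", "Unit"]


def build_import_sheet(rows):
    """Serialise rows (dicts keyed by COLUMNS) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_rows = [{
    "CAS": "000-00-0",                      # placeholder, not a real substance
    "Endpoint path": "Ecotoxicological Information#Aquatic Toxicity",
    "Test organisms (species)": "Daphnia magna",
    "Duration": "48",
    "Duration unit": "h",
    "Value": "1.0",                         # placeholder value
    "Unit": "mg/L",                         # placeholder unit
}]

print(build_import_sheet(example_rows))
```

The point of the layout is that the endpoint path cell holds only the predefined part, and the kingdom/phylum/class levels are left out entirely.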
Nick
-
Re: Successful importation of datasets
Geoffrey Hynes Dec 8, 2010 3:11 PM (in response to 601256)
Hi Nick/Georgi,
Success.
Using the Ctrl key to see the predefined tree path for each parameter helps, as per your previous emails (I'd forgotten about this). I then defined the dynamic tree path in individual cells, and as long as these are the same as in the additional information guide Georgi sent, everything links in nicely.
I still think this is overly complicated compared to version 1 of the TB, but it now works.
I was beginning to think I'd need separate databases for the 4 predefined areas, but not now, which is a major bonus.
After a lot of work, cheers all,
Geoff…
-
Re: Successful importation of datasets
601256 Dec 8, 2010 3:49 PM (in response to Geoffrey Hynes)
Geoff,
I think the old Schwarzenegger line "I'll be back" might be more appropriate.
Hope to talk again.
Nick
-
Re: Successful importation of datasets
Geoffrey Hynes Dec 8, 2010 4:28 PM (in response to 601256)
Cheers Nick, your comments made it much easier for me, appreciated.
I'm on the advanced course in Barcelona next week, so that may give rise to many more questions, such as the profilers v structural similarity thoughts, as I'm going to try to have a quick discussion with Prof. Mekenyan about this.
Any updates to this or anything that may be helpful, I'll post the following week or after Christmas.
Have a good Christmas break,
Geoff...
-