We were encouraged by the findings of our previous work, which showed that the designed sequences do not lead to promiscuous relationships and always identify their parent or other members of the same structural fold as templates. We therefore deployed these sequences in databases of natural sequence families to recognize the structure of protein families of yet unknown structure.
Prior to this application, we assessed the approach on a dataset of fold-associated families, and its performance on statistical measures was encouraging. Associations were filtered using metrics such as normalized fold frequency and query HMM profile coverage, in addition to strict E-values.
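As an illustration of how such a filter might combine the three metrics, the following is a minimal sketch only, not the procedure used in this work. The `Hit` record, the `assign_folds` function, all threshold values, and the definition of normalized fold frequency used here (the fraction of a family's retained hits that point to its most frequent fold) are assumptions introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    """One hypothetical search hit linking a query family to a fold."""
    family: str            # query family of unknown structure
    fold: str              # fold of the matched template
    evalue: float          # E-value of the match
    query_coverage: float  # fraction of the query HMM profile aligned (0..1)


def assign_folds(hits, max_evalue=1e-3, min_coverage=0.6, min_fold_freq=0.8):
    """Return {family: fold} for families passing all three filters.

    The thresholds are illustrative placeholders, not values from the study.
    """
    # Step 1: keep only hits with a strict E-value and sufficient coverage.
    folds_by_family = {}
    for hit in hits:
        if hit.evalue <= max_evalue and hit.query_coverage >= min_coverage:
            folds_by_family.setdefault(hit.family, []).append(hit.fold)

    # Step 2: accept a family only if one fold dominates its retained hits.
    assignments = {}
    for family, folds in folds_by_family.items():
        top_fold, count = Counter(folds).most_common(1)[0]
        if count / len(folds) >= min_fold_freq:
            assignments[family] = top_fold
    return assignments
```

Requiring one fold to dominate the retained hits is one way to guard against promiscuous associations: a family whose hits are split between several folds is left unassigned rather than given the best-scoring one.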
Iterative sequence searches propagated through the designed sequences led to fold associations for 1372 protein families of yet unknown structure. For 20 structure-unknown families, the subsequent release provided a structure. We cross-validated our assignments against these structures and found them to be correct for 18 of the 20 families.