My bibliography
Wróblewski J., Kowalski M., 2014. On Complexity of Effective Data Granulation in Databases.
We present the problem of splitting objects from a finite set into heterogeneous groups of equal or almost equal cardinalities. The problem resembles the classic problem of data clustering, but additional constraints on the groups' sizes and a global measure of clustering quality make well-known clustering algorithms inapplicable. We believe that such a problem occurs in many important areas of computer science. We consider its context in the area of databases and reference a concrete body of research: the Infobright RDBMS. We have noticed that the problem emerges naturally in the use and design of Infobright's technology. In this paper we define the problem and present some specific applications. The main contribution of the article is a proof that the problem is NP-hard.
Wróblewski J., Stawicki S., 2014. SQL-Based KDD with Infobright's RDBMS: Attributes, Reducts, Trees.
We present an example of a framework for the KDD process implemented as SQL queries and procedures, consisting of new attribute construction, reduct finding and decision tree induction. We focus especially on attribute reduction, which is one of the most important tasks in data mining, especially for high-dimensional data sets. In domains such as natural language processing, signal processing and microarray analysis, it is crucial to reduce many thousands of attributes to a small set of the most meaningful ones. The main technical contribution of this paper is a complete framework for calculating short reducts using SQL queries on data stored in a relational database system, without any need for external tools generating or modifying their syntax. A case study of large, high-dimensional and imbalanced real-world data is presented. The paper also recalls some other examples of SQL-based data mining implementations, in areas such as feature extraction and decision tree induction. The experimental results are based on Infobright's analytic database system, whose performance characteristics fit the needs of the presented algorithms well.
Ślęzak D., Synak P., Wojna A., Wróblewski J., 2013. Two Database Related Interpretations of Rough Approximations: Data Organization and Query Execution.
We present an analytic data processing technology derived from the principles of rough sets and granular computing. We show how the idea of approximate computations on granulated data has evolved toward a complete product supporting standard analytic database operations and their extensions. We refer to our previous works, where our query execution algorithms were described in terms of iteratively computed rough approximations. We explain how to interpret our data organization methods in terms of classical rough set notions such as reducts and generalized decisions.
Ślęzak D., Synak P., Borkowski J., Wróblewski J., Toppin G., 2012. A Rough-Columnar RDBMS Engine – A Case Study of Correlated Subqueries.
Columnar databases provide a number of benefits with regard to both data storage (e.g. data compression) and data processing (e.g. optimized data access, parallelized decompression, lazy materialization of intermediate results). Their characteristics are particularly advantageous for exploratory sessions and ad hoc analytics. The principles of columnar stores can also be combined with pipelined and iterative processing, leading toward modern analytic engines able to handle large, rapidly growing data sets. In this paper, we show how to further enrich such a framework by employing metadata layers aimed at minimizing the need for data access. In particular, we discuss the current implementation and the future roadmap for correlated subqueries in Infobright's RDBMS, where all the above-mentioned architectural features interact with each other in order to improve query execution.
Ślęzak D., Synak P., Wróblewski J., Borkowski J., Toppin G., 2012. Rough Optimizations of Complex Expressions in Infobright's RDBMS.
We discuss the importance of analytic SQL statements with complex expressions in business intelligence and knowledge discovery applications. We report recent improvements in the execution of complex expressions in Infobright's RDBMS, which is based on the paradigms of columnar databases and adaptive rough computations over the granulated metadata layer.
Ślęzak D., Synak P., Toppin G., Wróblewski J., Borkowski J., 2012. Rough SQL - Semantics and Execution.
We introduce the rough query, a new approach to defining and computing SQL approximations. Rough query results are reported by means of simple metadata of the attributes in an information system that would be the result of a standard select statement. The proposed approach is already available as an SQL extension in both the Infobright Community and Enterprise Editions. Rough queries are computed practically in real time based on the statistical information layer, which had previously been used only for standard query optimization.
Ślęzak D., Synak P., Wróblewski J., Toppin G., 2010. Infobright – Analytic Database Engine using Rough Sets and Granular Computing.
We discuss the usage of the paradigms of rough sets and granular computing in the core components of Infobright's database engine. With data stored in the form of compressed blocks of attribute values, our query execution methods utilize compact information about those blocks' contents instead of brute-force data decompression. The paper contains examples of algorithms implemented with the aim of minimizing the need to access the compressed data in such operations as filtering, joining and aggregating.
Ślęzak D., Kowalski M., Eastwood V., Wróblewski J., 2009. Methods and systems for database organization.
Ślęzak D., Wróblewski J., Eastwood V., Synak P., 2008. Rough Sets in Data Warehousing.
The theory of rough sets, based on the universal framework of information systems, provides a powerful model for representing patterns and dependencies both in databases and in data mining. On the one hand, although there are numerous rough set applications to data mining and knowledge discovery, the usage of rough sets inside database engines is still quite uncharted territory. On the other hand, this situation is not so exceptional, given that even the most well-known paradigms of machine learning, soft computing, artificial intelligence and approximate reasoning are still waiting for more recognition in database research, with huge potential in areas such as physical data model tuning or adaptive query optimization.
Ślęzak D., Wróblewski J., Eastwood V., Synak P., 2008. Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries.
Brighthouse is a column-oriented data warehouse with an automatically tuned, ultra-small-overhead metadata layer called Knowledge Grid, which is used as an alternative to classical indexes. The advantages of column-oriented data storage, as well as of data compression, have already been well documented, especially in the context of analytic, decision support querying. This paper demonstrates the additional benefits Knowledge Grid brings to compressed, column-oriented databases. In particular, we explain how it assists in query optimization and execution by minimizing the need for data reads and data decompression.
Apanowicz C., Eastwood V., Ślęzak D., Synak P., Wojna A., Wojnarski M., Wróblewski J., 2008. Method and System for Data Compression in a Relational Database.
Apanowicz C., Eastwood V., Ślęzak D., Synak P., Wojna A., Wojnarski M., Wróblewski J., 2008. Method and System for Storing, Organizing and Processing Data in a Relational Database.
Ślęzak D., Wróblewski J., 2007. Roughfication of Numeric Decision Tables: The Case Study of Gene Expression Data.
We extend the standard rough set-based approach to deal with a huge number of numeric attributes versus a small number of available objects. We transform the training data using a novel kind of non-parametric discretization, called roughfication (in contrast to fuzzification, known from fuzzy logic). Given roughfied data, we apply standard rough set attribute reduction and then classify the testing data by voting among the obtained decision rules. Roughfication enables searching for reducts and rules in tables with the original number of attributes and a far larger number of objects. It does not require expert knowledge or any kind of parameter tuning or learning. We illustrate it by the analysis of gene expression data, where the number of genes (attributes) is enormously large with respect to the number of experiments (objects).
Ślęzak D., Wróblewski J., 2006. Rough Discretization of Gene Expression Data.
We adapt the rough set-based approach to deal with gene expression data, where the problem is a huge number of genes (attributes) a ∈ A versus a small number of experiments (objects) u ∈ U. We perform gene reduction using standard rough set methodology based on approximate decision reducts, applied to specially prepared data. We use rough discretization: every pair of objects (x, y) ∈ U×U yields a new object, which takes the value ">= a(x)" if and only if a(y) >= a(x), and "< a(x)" otherwise, over the original gene attributes a. In this way: 1) we work with a desired, larger number of objects, improving the credibility of the obtained reducts; 2) we produce more decision rules, which vote during classification of new observations; 3) we avoid the issue of discretizing real-valued attributes, which is difficult and leads to unpredictable results for any data set having many more attributes than objects. We illustrate our method by an analysis of gene expression data related to breast cancer.
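The pair-based transformation described in the abstract is easy to sketch. The snippet below is an illustrative reconstruction, not the paper's implementation; the function name and toy data are made up.

```python
def rough_discretize(rows):
    """Rough discretization: every ordered pair of objects (x, y) yields a
    new binary object recording, for each attribute a, whether a(y) >= a(x)."""
    return [[1 if ay >= ax else 0 for ax, ay in zip(x, y)]
            for x in rows for y in rows]

# 3 objects with 2 numeric attributes give 3 * 3 = 9 binary objects.
rows = [[0.1, 2.0], [0.3, 1.5], [0.2, 2.5]]
print(len(rough_discretize(rows)))  # 9
```

Note how the number of objects grows quadratically while the number of attributes stays the same, which is exactly the trade-off the abstract exploits.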
Szczuka M., Wróblewski J., 2006. A Rough-Neural Approach to Classifier Networks.
We discuss the notion of hierarchical concept (classifier) schemes. Processes of construction, tuning and learning of hierarchical structures of concepts (granules of knowledge) are presented. The proposed solution consists of a generalised structure of feedforward neural-like network approximating the intermediate concepts, similarly to traditional neurocomputing approaches. The theoretical work is supported by a more practical part, where the implementation and experimental verification of the presented ideas are discussed. We provide examples of compound concepts corresponding to Bayesian and rule-based classifiers, and give some intuition concerning their processing through the network.
Rahman M.M., Ślęzak D., Wróblewski J., 2005. Parallel Island Model for Attribute Reduction.
We develop a framework for parallel computation of optimal rough set decision reducts from data. We adapt the island model for evolutionary computing. The idea is to optimize reducts within separate populations (islands) and enable the best reduct-chromosomes to migrate among islands. Experiments show that the proposed method speeds up calculations and also often provides better-quality results compared to the genetic algorithms previously applied to attribute reduction.
Wróblewski J., 2005. Pairwise Cores in Information Systems.
A core in an information system is a set of attributes globally necessary to distinguish objects from different decision classes (i.e. the intersection of all reducts of the information system). A notion of a pairwise core (2-core), which naturally extends the definition of a core to the case of pairs of attributes, is presented. Some useful properties related to the graph representation of pairwise cores are discussed. The paper also presents a practical application of the notion of a 2-core. It is known that a core (if it exists) may be used to improve reduct finding methods, since there exist polynomial algorithms for core construction. The same may be proven for a 2-core, which may also be used to estimate the minimal reduct size.
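The polynomial-time core test mentioned above can be sketched as follows (an illustrative reconstruction under the classical definition, not the paper's code): an attribute belongs to the core exactly when some pair of objects from different decision classes is discerned by that attribute alone.

```python
def core(rows, decisions):
    """Return the set of core attribute indices of a decision table."""
    n_attrs = len(rows[0])
    result = set()
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if decisions[i] == decisions[j]:
                continue  # only pairs from different decision classes matter
            diff = [a for a in range(n_attrs) if rows[i][a] != rows[j][a]]
            if len(diff) == 1:
                # diff[0] is the only attribute discerning this pair,
                # so it appears in every reduct, i.e. in the core
                result.add(diff[0])
    return result

print(core([(0, 0), (0, 1), (1, 1)], ["A", "B", "A"]))  # {0, 1}
```

The 2-core of the paper extends this idea to pairs of attributes that jointly discern some object pair that no other attribute discerns.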
Szczuka M., Ślęzak D., Wróblewski J., 2004. Feedforward concept networks.
The paper presents an approach to the construction of hierarchical structures of data-based concepts (granules), extending the idea of feedforward neural networks. The operations of processing concept information and changing the concept specification through the network layers are discussed. Examples of the concepts and their connections are provided with respect to a case study of learning hierarchical rule-based classifiers from data. The proposed methods are related to the foundations of granular and rough-neural computing.
Szczuka M., Ślęzak D., Wróblewski J., 2004. Harnessing classifier networks - towards hierarchical concept construction.
The process of construction and tuning of classifier networks is discussed. The idea of relating the basic inputs to the target classification concepts via internal layers of intermediate concepts is explored. Intuitions and relationships to other approaches, as well as illustrative examples, are provided.
Wieczorkowska A., Wróblewski J., 2004. Octave-Error Proof Timbre-Independent Estimation of Fundamental Frequency of Musical Sounds.
Estimation of fundamental frequency (so-called pitch tracking) can be performed using various methods. However, all these algorithms are susceptible to errors, especially octave errors. In order to avoid these errors, pitch trackers are usually adjusted to particular musical instruments. Therefore, a problem arises when one wants to extract the fundamental frequency independently of the timbre. Our goal was to elaborate a method of fundamental frequency extraction which works correctly for any timbre. We propose a multi-algorithm approach, where fundamental frequency estimation is based on results coming both from a range of frequency tracking methods and from additional parameters of the sound. We also propose frequency tracking based on direct analysis of the signal and its spectrum. We explain the structure of our estimator and the results obtained for various musical instruments and sound articulation techniques.
Szczuka M., Ślęzak D., Wróblewski J., 2003. Constructing Extensions of Bayesian Classifiers with use of Normalizing Neural Networks.
We introduce a new neural network model that generalizes the principles of the Naive Bayes classification method. It is trained using a backpropagation-like algorithm, with the purpose of obtaining an optimal combination of several classifiers. Experimental results are presented.
Bazan J., Skowron A., Ślęzak D., Wróblewski J., 2003. Searching for the complex decision reducts: The case study of the survival analysis.
A generalization of the fundamental rough set discernibility tools, aimed at searching for relevant patterns for complex decisions, is discussed. As an example application, we consider the post-surgery survival analysis problem for head and neck cancer cases. The goal is to express dissimilarity between different survival tendencies by means of clinical information. This requires handling decision values in the form of plots representing the Kaplan-Meier product estimates for groups of patients.
Ślęzak D., Szczuka M., Wróblewski J., 2003. Neural Network Architecture for Synthesis of the Probabilistic Rule Based Classifiers.
We introduce the notion of a probabilistic neural network (PNN), in which the propagated signals take the form of finite probability distributions. An appropriately tuned PNN can be applied as a compound voting measure when classifying new cases on the basis of approximate decision reducts extracted from the training data. We provide a general scheme of such a classification process, as well as some theoretical issues concerning PNN learning.
Ślęzak D., Wróblewski J., 2003. Order based genetic algorithms for the search of approximate entropy reducts.
We use entropy to extend the rough set-based notion of a reduct. We show that the order-based genetic algorithms applied to the search for classical decision reducts can be used in exactly the same way for extracting optimal approximate entropy reducts from data.
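The order-based search underlying this line of work can be illustrated with a small sketch (assumed names and toy data; in the paper a genetic algorithm evolves the permutation, which is omitted here): given a permutation of attributes, scan it and drop every attribute whose removal does not increase the conditional entropy of the decision given the remaining attributes.

```python
import math
from collections import Counter, defaultdict

def cond_entropy(rows, decisions, attrs):
    """H(d | attrs): entropy of the decision within each indiscernibility
    class induced by the chosen attributes, weighted by class size."""
    groups = defaultdict(list)
    for row, d in zip(rows, decisions):
        groups[tuple(row[a] for a in attrs)].append(d)
    n = len(rows)
    h = 0.0
    for ds in groups.values():
        for c in Counter(ds).values():
            p = c / len(ds)
            h -= (len(ds) / n) * p * math.log2(p)
    return h

def order_based_reduct(rows, decisions, order):
    """Drop attributes in the order given by the permutation, keeping the
    conditional entropy at its value for the full attribute set."""
    keep = set(order)
    target = cond_entropy(rows, decisions, sorted(keep))
    for a in order:
        trial = sorted(keep - {a})
        if trial and cond_entropy(rows, decisions, trial) <= target + 1e-12:
            keep.discard(a)  # attribute a is redundant given the rest
    return keep

# Toy table: attribute 0 determines the decision; 1 and 2 are redundant.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
print(order_based_reduct(rows, [0, 0, 1, 1], [2, 1, 0]))  # {0}
```

Different permutations yield different reducts, which is why a genetic algorithm over permutations is a natural way to search for short or optimal ones.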
Wieczorkowska A., Wróblewski J., Ślęzak D., Synak P., 2003. Problems with Automatic Classification of Musical Sounds.
Convenient searching of multimedia databases requires well-annotated data. Labeling sound data with information like pitch or timbre must be done through sound analysis. In this paper, we deal with the problem of automatic classification of a musical instrument on the basis of its sound. Although there are algorithms for extracting basic sound descriptors, correct identification of an instrument still poses a problem. We describe difficulties encountered when classifying the woodwinds, brass and strings of a contemporary orchestra. We discuss the most difficult cases and explain why these sounds cause problems. The conclusions are drawn and presented in a brief summary closing the paper.
Wieczorkowska A., Wróblewski J., Ślęzak D., Synak P., 2003. Application of temporal descriptors to musical instrument sound recognition.
Automatic content extraction from multimedia files has recently been extensively explored. However, automatic content description of musical sounds has not been broadly investigated and still needs intensive research. In this paper, we investigate how to optimize sound representation for musical instrument recognition purposes. We propose to trace trends in the evolution of the values of MPEG-7 descriptors over time, as well as their combinations. The described process is a typical example of a KDD application, consisting of data preparation, feature extraction and decision model construction. A discussion of the efficiency of the applied classifiers illustrates the potential for further progress in the optimization of sound representation. We believe that further research in this area will provide a background for automatic multimedia content description.
Wróblewski J., 2003. Adaptive aspects of combining approximation spaces.
The paper addresses issues concerning the problem of constructing an optimal classification algorithm. The notion of a parameterized approximation space is used to model the process of classifier construction. The process can be viewed as a hierarchical search for the optimal information granulation to fit a concept described by empirical data. The problem of combining several parameterized information granules (given by classification algorithms) to obtain a global data description is described. Some solutions based on adaptive methods are presented.
Bazan J., Osmólski A., Skowron A., Ślęzak D., Szczuka M., Wróblewski J., 2002. Rough set approach to the survival analysis.
The application of rough set-based tools to post-surgery survival analysis is discussed. A decision problem is defined over data related to head and neck cancer cases, for two types of medical surgery. The task is to express the differences between the expected results of these surgeries and to search for rules discerning different survival tendencies. The rough set framework is combined with the Kaplan-Meier product estimation and Cox's proportional hazards modeling.
Bazan J., Szczuka M., Wróblewski J., 2002. A New Version of Rough Set Exploration System.
We introduce a new version of the Rough Set Exploration System - a software tool featuring a library of methods and a graphical user interface supporting a variety of rough-set-based computations. The methods, features and capabilities of the implemented software are discussed and illustrated with a case study in data analysis.
Ślęzak D., Wróblewski J., 2002. Approximate bayesian network classifiers.
A Bayesian network (BN) is a directed acyclic graph encoding probabilistic independence statements between variables. A BN with the decision attribute as a root can be applied to the classification of new cases, by synthesis of the conditional probabilities propagated along the edges. We consider approximate BNs, which almost preserve the entropy of a decision table. They usually have fewer edges than classical BNs. They enable modeling and extending the well-known Naive Bayes approach. Experiments show that classifiers based on approximate BNs can be very efficient.
Ślęzak D., Wróblewski J., 2002. Order-based genetic algorithms for extraction of approximate bayesian networks from data.
Approximate Bayesian nets (a-BNs) almost preserve the information entropy of data tables and encode knowledge about approximate dependencies between features. The advantages of applying a-BNs to real-life data analysis are discussed. A genetic algorithm-based framework for extracting a-BNs from data is proposed.
Ślęzak D., Synak P., Wróblewski J., 2002. Rough Set Based Feature Extraction for Medical Data.
Rough-set-based KDD tools are applied to the analysis of the PKDD'2001 Discovery Challenge thrombosis data set. We focus on the feature extraction phase, aiming at finding new attributes which enable better classification of new cases.
Ślęzak D., Synak P., Wieczorkowska A., Wróblewski J., 2002. KDD-based approach to musical instrument sound recognition.
Automatic content extraction from multimedia files is a hot topic nowadays. The Moving Picture Experts Group is developing the MPEG-7 standard, which aims to define a unified interface for multimedia content description, including audio data. The audio description in MPEG-7 comprises features that can be useful for any content-based search of sound files. In this paper, we investigate how to optimize sound representation for musical instrument recognition purposes. We propose to trace trends in the evolution of the values of MPEG-7 descriptors over time, as well as their combinations. The described process is a typical example of a KDD application, consisting of data preparation, feature extraction and decision model construction. A discussion of the efficiency of the applied classifiers illustrates the potential for further progress in the optimization of sound representation. We believe that further research in this area will provide a background for automatic multimedia content description.
Wróblewski J., 2001. Adaptacyjne metody klasyfikacji obiektów [Adaptive methods of object classification].
Bazan J., Nguyen H.S., Nguyen S.H., Synak P., Wróblewski J., 2000. Rough Set Algorithms in Classification Problem.
In the paper we present some algorithms, based on rough set theory, that can be used for the problem of classifying new cases. Most of the algorithms were implemented and included in the ROSETTA system. We present several methods for the computation of decision rules based on reducts. We discuss the problem of real-valued attribute discretization for increasing the performance of the algorithms and the quality of the decision rules. Finally, we deal with the problem of resolving conflicts between decision rules classifying a new case into different categories (classes).
Demczuk M., Janecki J., Wardzińska A.B., Marczykowska T., Skomerska A., Wróblewski J., 2000. Cardiological database in a community hospital - 10 years of experience.
A multi-ward community hospital providing care in a rural area and located in a relatively small town creates some specific possibilities in the management of medical data. After an appropriate amount of time, every inhabitant of the area who has health problems will have a file in such a hospital. Thus, the routine observational data collected in such a hospital may be a very rich source of valuable and unique information, useful both when an adequate territory-oriented medical policy is discussed and when more general evidence-based medical scientific aims are considered. The aim of this paper is to discuss, on the basis of our ten-year experience, some specific problems we found and to present our solutions.
Ślęzak D., Wróblewski J., 2000. Application of normalized decision measures to the new case classification.
The optimization of rough set-based classification models with respect to a parameterized balance between a model's complexity and its confidence is discussed. For this purpose, the notion of a parameterized approximate inconsistent decision reduct is used. Experimental extraction of the considered models from real-life data is described.
Wróblewski J., 2000. Ensembles of classifiers based on approximate reducts.
The problem of improving rough set-based expert systems by modifying the notion of a reduct is discussed. A notion of an approximate reduct is introduced, as well as some proposed quality measures for such reducts. A complete classifying system based on approximate reducts is presented and discussed. It is proved that the problem of finding an optimal set of classifying agents based on approximate reducts is NP-hard; a genetic algorithm is used to find a suboptimal set. Experimental results show that the classifying system is effective and relatively fast.
Wróblewski J., 2000. Analyzing relational databases using rough set based methods.
One of the most important problems in KDD applications is the size of real-world databases. In practical problems, data may contain millions of records in many data tables bound by relations. On the other hand, most theoretical works and practical applications concentrate on relatively small data sets collected in single tables. The paper proposes a new approach to the problem of analysis of relational databases. The proposed methodology works in an adaptive way: if a considered data table is not sufficient to create a satisfactory set of rules, new attributes (generated by arithmetical operations or based on database relations) are added to the information system.
Ślęzak D., Wróblewski J., 1999. Classification Algorithms Based on Linear Combinations of Features.
We provide theoretical and algorithmic tools for finding new features which enable better classification of new cases. Such features are searched for as linear combinations of continuous-valued conditions. Regardless of the choice of the classification algorithm itself, this approach provides a compression of the information concerning dependencies between the conditional and decision features. The presented results show that properly derived combinations of attributes, treated as new elements of the condition set, may significantly improve the performance of well-known classification algorithms, such as k-NN and rough set-based approaches.
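The core idea can be sketched in a few lines (illustrative names and toy data, not the paper's method of choosing the weights): a weighted sum of two conditions can separate classes that neither condition separates alone.

```python
def add_linear_feature(rows, weights):
    """Append the linear combination w·x as a new condition attribute."""
    return [row + (sum(w * v for w, v in zip(weights, row)),) for row in rows]

# Neither coordinate alone separates the two classes below,
# but the combination x1 - x2 does (negative vs positive).
rows = [(1.0, 2.0), (2.0, 1.0), (0.5, 1.5), (1.5, 0.5)]
extended = add_linear_feature(rows, (1.0, -1.0))
print([r[-1] for r in extended])  # [-1.0, 1.0, -1.0, 1.0]
```

A downstream classifier such as k-NN or a rough set rule inducer then treats the new column like any other condition attribute.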
Wróblewski J., 1998. A Parallel Algorithm for Knowledge Discovery System.
A rough set-based knowledge discovery system is presented. The system is based on the decomposition of large data tables into smaller ones in such a way that an approximation of the global decision can be obtained from the local ones (related to these smaller tables). A set of locally generated rules can be collected to achieve a sufficient approximation of the global decision algorithm. A parallel (distributed) system for data analysis using a local reduct-based rule generator is described and discussed. Experimental results on large data tables are presented.
Wróblewski J., 1998. Covering with reducts - a fast algorithm for rule generation.
In the rough set approach to knowledge discovery problems, a set of rules is generated from training data using the notion of a reduct. Because the problem of finding short reducts is NP-hard, several approximation techniques have to be used. A covering approach to the problem of generating rules based on an information system is presented in this article. A new, efficient algorithm for finding local reducts for each object in a data table is described, as well as its parallelization and some optimization notes. The problem of working with tolerances in our algorithm is discussed. Some experimental results generated on large data tables (related to real applications) are presented.
Wróblewski J., 1998. Genetic algorithms in decomposition and classification problem.
The advantages of both genetic and heuristic algorithms can be exploited by hybrid algorithms: nondeterministic, problem-oriented heuristics controlled by a genetic algorithm. In this work some examples of hybrid systems are presented, as well as the theory of the order-based genetic algorithms used in these systems. The article includes two examples of hybrid algorithms supporting rough set methods in knowledge discovery: a system for short reduct generation and a method for finding templates.
Ivkov M., Korjik V., Wróblewski J., 1997. On generalized greedy algorithms in broadcast key distribution schemes.
In earlier work, a key distribution scheme for broadcast encryption based on 2-designs was proposed. The connectivity of this scheme is the maximal number of messages that should be broadcast by the center to the non-compromised users to generate a common key. This value was upper-bounded in that work with the use of a greedy algorithm. We propose a generalized greedy algorithm and some randomized modifications of it (a hybrid genetic-greedy algorithm among them) to improve the connectivity of broadcast key distribution schemes. We present the results of simulations for different 2-design matrices to demonstrate this property and to measure computation times when the different algorithms are used. A few methods for the row covering problem in the special case of reduced 2-design matrices are presented. Experimental results show that the relatively fast randomized weighted greedy algorithm (RWGA) is efficient for this problem. In some real-time applications the suboptimal but faster WGA method can be used. All these methods give better results than simple random covering. A method using a genetic algorithm (order-based GA) gives results similar to RWGA, but is significantly slower.
Nguyen S. H., Skowron A., Synak P., Wróblewski J., 1997. Knowledge Discovery in Databases: Rough Set Approach.
The primary goal of data mining and knowledge discovery is to find patterns (laws, models) of real-world objects (states, processes, phenomena) from gathered imperfect data. One of the problems is to find descriptions of each of the decision classes. This can be done using a variety of data mining or machine learning methods. We propose some methods for extracting from data the elementary blocks from which these descriptions are constructed. These elementary blocks are sets of similar objects defined by so-called templates or similarity relations. We focus on a rough set approach to the problem of searching for elementary blocks included (or almost included) in, and together covering, a given decision class. We also discuss another method, searching for global elementary blocks, i.e. blocks creating a partition of the object domain into sub-domains, which is needed when the size of the investigated table is too large to apply existing standard methods. After decomposition of a large object domain into sub-domains, local decision rules are found for all the obtained sub-domains. They are then used to classify new, unseen objects. In this paper we concentrate on effective methods of searching for different kinds of templates and on methods for their aggregation. We also present a general searching scheme for the approximate description of decision classes, as well as an improved method for short reduct generation.
Polkowski L., Skowron A., Synak P., Wróblewski J., 1996. Searching of Approximate Description of Decision Classes.
We discuss a searching method for the synthesis of approximate descriptions of decision classes in large data tables (decision tables). The method consists of the following stages: searching for basic templates, which are then used as elementary building blocks for the decision class descriptions; template grouping, as pre-processing for generalisation and contraction; and generalisation and contraction operations performed on the results of grouping. The main goal of the method is to synthesise an approximate description of the decision classes. The control in the searching method is focused on attempts to reduce uncertainty in the approximate description of decision classes. On the other hand, uncertainty in the temporarily synthesised descriptions of decision classes is the main "driving force" of the searching method. In the paper we concentrate on the different methods for template generation and on a discussion of the performed computer experiments. We also present a general searching scheme for the approximate description of decision classes, and we point out the relationship of our approach to the rough mereological approach to the synthesis of complex objects. The presented method can be adapted for the synthesis of adaptive decision algorithms, which is the goal of our ongoing research project.
Wróblewski J., 1996. Algorytmy genetyczne - ćwiczenia [Genetic Algorithms: Exercises]. |
Wróblewski J., 1996. Theoretical Foundations of Order-Based Genetic Algorithms. | |
A lot of research on genetic algorithm theory concentrates on the classical, binary case. However, there are many other useful types of genetic algorithms (GA), e.g. tree-based ones (genetic programming) or order-based ones. This paper shows that many of the classical results can be transferred to order-based GAs. The analysis includes the Schema Theorem and Markov chain modelling of order-based GAs. |
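For context, the classical binary-case result referred to above is Holland's Schema Theorem; in its standard textbook form it bounds the expected number of instances $m(H, t)$ of a schema $H$ in generation $t$:

```latex
E\left[m(H,\,t+1)\right] \;\ge\; m(H,\,t)\,\frac{f(H,\,t)}{\bar{f}(t)}
\left[1 \;-\; p_c\,\frac{\delta(H)}{l-1} \;-\; o(H)\,p_m\right]
```

Here $f(H, t)$ is the mean fitness of the schema's instances, $\bar{f}(t)$ the population mean fitness, $\delta(H)$ the defining length, $o(H)$ the order, $l$ the chromosome length, and $p_c$, $p_m$ the crossover and mutation probabilities. The paper's claim is that analogues of this bound, and of Markov chain convergence models, carry over when chromosomes are permutations rather than bit strings.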
Wróblewski J., 1996. Rozwiązywanie niektórych problemów NP-trudnych za pomocą algorytmów genetycznych: podstawy teoretyczne i zastosowania [Solving Certain NP-hard Problems Using Genetic Algorithms: Theoretical Foundations and Applications]. | |
Note: Most of the original results of this master's thesis (written in Polish) were published in the Fundamenta Informaticae vol. 28 (1996) paper - see above. |
Nguyen S. H., Nguyen T. T., Polkowski L., Skowron A., Synak P., Wróblewski J., 1996. Decision Rules for Large Data Tables. | |
Data mining and knowledge discovery in large databases have in recent years become hot topics not only for researchers but also for many major companies in the information technology domain. In this paper we present an approach to the problem based on rough set methods. The main aim is to show how rough set and rough mereology theories can be effectively used to extract knowledge from large databases. |
Wróblewski J., 1995. Finding minimal reducts using genetic algorithms. | |
An application of three types of genetic algorithm to finding short reducts is presented. The first method, a classical genetic algorithm with individuals represented by bit strings, proved to be fast but sometimes fails to find the global optimum. The second and third methods are based on permutational coding and "greedy" algorithms. Their results are much better, but the computation time increases. |
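The first of the three variants, a bit-string GA whose fitness rewards consistency and penalises reduct size, can be sketched roughly as follows. All parameters, names and the toy table are illustrative, not taken from the paper; the population is seeded with the full attribute set so that a consistent chromosome always exists.

```python
import random

def ga_reduct(rows, attrs, dec, pop_size=30, gens=60, p_mut=0.05, seed=0):
    """Toy bit-string GA for short-reduct search (assumes >= 2 attributes).
    A chromosome has one bit per attribute; fitness is 0 for
    inconsistent subsets and grows as the subset shrinks."""
    rng = random.Random(seed)
    n = len(attrs)

    def consistent(mask):
        sel = [a for a, bit in zip(attrs, mask) if bit]
        seen = {}
        for row in rows:
            key = tuple(row[a] for a in sel)
            if seen.setdefault(key, row[dec]) != row[dec]:
                return False
        return True

    def fitness(mask):
        if not any(mask) or not consistent(mask):
            return 0.0
        return 1.0 + (n - sum(mask)) / n  # shorter reducts score higher

    # Seed with the full attribute set so a consistent mask always exists.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(gens):
        elite = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        pop = list(elite)                      # truncation selection
        while len(pop) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # bit-flip mutation
            pop.append(child)
    best = max(pop, key=fitness)
    return [a for a, bit in zip(attrs, best) if bit]

table = [
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
]
print(ga_reduct(table, ["a", "b", "c"], "d"))
```

The failure mode the abstract mentions shows up when the fitness landscape has deceptive local optima: truncation selection can fixate the population before mutation reaches the globally shortest subset.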
Wróblewski J., 1995. Genetic approach to scalar quantisation. | |
The scalar quantisation problem can be formulated as follows: for a probability distribution H(z), z from [a, b], find n representatives and n+1 breakpoints such that the mean-square error of quantisation is minimal. The classical algorithm for optimal quantisation (the centroid algorithm) was developed in 1960 by J. Max, but its results are very sensitive to the initial state and are sometimes far from the global optimum. The existing methods for obtaining good initial states are useless when we have no explicit formula for H, or when the distribution H is discrete. The Genetic Algorithm (GA) is a non-deterministic search method based on the Darwinian principle of natural selection. It can successfully find the global optimum by combining two processes: a local search in the neighbourhood of good solutions and, with a low probability, a global search in distant areas. GA finds an individual s from the state space S that maximises the fitness value F(s), for a given positive function F. First, a population of N random objects from S is selected. Then the population is changed T times by the selection process (individuals with low values of F are replaced by those with higher values of F) and by the genetic operators (mutation: a small, random change of an individual; crossing-over: an exchange of "genetic material" between two individuals). The best individual of the last population is taken as the result of the optimisation. In the GA approach to the quantisation problem, every individual (a scalar quantiser) is represented as an ordered set of representatives: floating-point values in ascending order. The initial population is created randomly with the probability distribution H. The fitness value of each individual is defined as a function of its mean-square error. Mutation affects one position of each individual with probability Pm; the new value is selected randomly with a uniform distribution. The crossing-over operator is constructed so that the order of the values in a chromosome is not disturbed: a random cross point k is selected, and the descendant of two random individuals from the population is combined from k elements of individual 1 and n-k+1 elements of individual 2. The centroid algorithm is applied to the whole or a part of the population as an additional technique.
This time-consuming optimisation is done once, after a few generations (usually after 20% of the total number of generations). In the "promising objects" method, we attempt to predict the final result of the classical centroid algorithm after a few iterations. This method can be used to improve the properties of the genetic algorithm (GA-PO): the fitness function is based not on the error value of the individual itself, but on its error after several (usually 2 or 4) iterations of the centroid algorithm. The result of GA-PO (the best individual) is then optimised by the centroid algorithm. The "promising objects" method can also be used separately as a new search technique (PO): we randomly choose a set of individuals, select the "most promising" of them, and use the centroid algorithm to bring it to a local minimum. A few grey-scale images were used as the source of data for the experiments. All the algorithms were tested on discrete distributions of 256 values, which were to be reduced to n = 16 levels. The experimental results show that the GA-PO method is more effective than the standard centroid algorithm and (for most of the tested images) than the centroid algorithm and a plain GA running for the same time, especially when the distribution H is irregular. This method (in its discrete version) can be used for scalar image quantisation (e.g. from an 8-bit grey scale to a 4-bit one) when accuracy is more important than time complexity. |
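The centroid iteration referred to throughout is the standard Lloyd-Max algorithm; for a discrete distribution it can be sketched roughly as follows (illustrative names; the sketch assumes n_levels >= 2 and positive weights).

```python
def lloyd_max(hist, n_levels, iters=50):
    """Lloyd-Max (centroid) quantiser for a discrete distribution
    hist[z] = weight of value z. Alternates two steps: breakpoints
    become midpoints of neighbouring representatives, and each
    representative becomes the centroid of its cell."""
    values = sorted(hist)
    # Initial representatives spread over the observed range; as the
    # abstract notes, the result is sensitive to this choice.
    reps = [values[i * (len(values) - 1) // (n_levels - 1)]
            for i in range(n_levels)]
    for _ in range(iters):
        cuts = [(reps[i] + reps[i + 1]) / 2 for i in range(n_levels - 1)]
        cells = [[] for _ in range(n_levels)]
        for z in values:
            cells[sum(z > c for c in cuts)].append(z)
        reps = [sum(z * hist[z] for z in cell) / sum(hist[z] for z in cell)
                if cell else reps[k]
                for k, cell in enumerate(cells)]
    return reps

# Two well-separated clusters collapse to their centroids.
print(lloyd_max({0: 1, 1: 1, 10: 1, 11: 1}, 2))  # [0.5, 10.5]
```

In the GA-PO scheme described above, an individual would be ranked not by its raw mean-square error but by the error obtained after a couple of such iterations.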