DSpace Repository

Scalable Sequential Pattern Mining for Biological Sequences


dc.contributor.author Wang Ke
dc.contributor.author Xu Yabo
dc.contributor.author Yu Jeffrey Xu
dc.date.accessioned 2018-01-22T17:24:23Z
dc.date.available 2018-01-22T17:24:23Z
dc.date.issued 2004
dc.identifier.uri http://hdl.handle.net/123456789/6913
dc.description.abstract Biosequences typically have a small alphabet, a long length, and patterns containing gaps (i.e., "don't care") of arbitrary size. Mining frequent patterns in such sequences faces a different type of explosion than mining transaction sequences, the setting primarily motivated by market-basket analysis. In this paper, we study how this explosion affects classic sequential pattern mining and present a scalable two-phase algorithm to deal with it. The Segment Phase first searches for short patterns containing no gaps, called segments. This phase is efficient. The Pattern Phase then searches for long patterns containing multiple segments separated by variable-length gaps. This phase is time-consuming. The purpose of the two phases is to exploit the information obtained in the first phase to speed up pattern growth and matching and to prune the search space in the second phase. We evaluate this approach on synthetic and real-life data sets.
dc.format application/pdf
dc.title Scalable Sequential Pattern Mining for Biological Sequences
dc.type generic
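
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the definitions, so the sketch assumes that support means the number of sequences containing a pattern, that gaps may have any size including zero, and that segments are bounded by hypothetical min_len and max_len parameters. All function names are illustrative, and the pattern growth shown is a naive enumeration without the pruning the paper refers to.

```python
from collections import defaultdict


def frequent_segments(sequences, min_len, max_len, min_support):
    """Segment Phase sketch: gapless substrings found in >= min_support sequences."""
    containing = defaultdict(set)  # segment -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                containing[seq[start:start + length]].add(idx)
    return {seg: len(ids) for seg, ids in containing.items()
            if len(ids) >= min_support}


def matches(pattern, seq):
    """True if the segments of `pattern` occur in `seq` in order, without
    overlapping, separated by gaps of any size (assumption: zero allowed)."""
    pos = 0
    for segment in pattern:
        found = seq.find(segment, pos)  # leftmost match leaves the most room
        if found < 0:
            return False
        pos = found + len(segment)
    return True


def pattern_support(pattern, sequences):
    """Number of sequences that contain the multi-segment pattern."""
    return sum(matches(pattern, seq) for seq in sequences)


def grow_patterns(sequences, segments, min_support, max_segments):
    """Pattern Phase sketch: extend patterns one frequent segment at a time."""
    frontier = [(seg,) for seg in segments]
    result = {}
    for _ in range(max_segments - 1):
        next_frontier = []
        for pattern in frontier:
            for seg in segments:
                candidate = pattern + (seg,)
                sup = pattern_support(candidate, sequences)
                if sup >= min_support:  # only frequent patterns are extended
                    result[candidate] = sup
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    seqs = ["ACGTTTACG", "ACGAACG", "TTTT"]
    segs = frequent_segments(seqs, min_len=3, max_len=3, min_support=2)
    print(segs)
    print(grow_patterns(seqs, segs, min_support=2, max_segments=2))
```

Building multi-segment candidates only from segments that are already frequent is what lets the first phase shrink the second phase's search space: any pattern containing an infrequent segment cannot itself be frequent.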

