def find_aligning_region(fullseq,subseq,extending=False):
    '''
    Locate every occurrence of a sub-sequence (subseq) within a full sequence (fullseq)
    and return a list with the start and end position of each occurrence.
    Overlapping occurrences are reported separately unless extending=True.
    # Input
    fullseq : Full sequence (string)
    subseq : Sub-sequence (string). It must not be longer than fullseq.
    extending (default : False): subseq may be the unit of a repeat region in fullseq.
        In that case it can be more useful to return the longest aligning site,
        obtained by merging hits whose start positions are consecutive (extending=True).
    # Return
    list : [[start,end],[start2,end2],...,[startN,endN]], 0-indexed.
        Each 'end' equals 'start'+len(subseq)-1 (or the end of the last merged hit when extending=True).
    If subseq is not included in fullseq, [['N/A','N/A']] is returned.
    '''
    # Return early when there is no match at all
    if subseq not in fullseq:
        return [['N/A','N/A']]
    # Find every start position, including overlapping ones
    full_length,sub_length=len(fullseq),len(subseq)
    idx=[[start,start+sub_length-1] for start in range(0,full_length-sub_length+1) if fullseq[start:start+sub_length]==subseq]
    if not extending:
        return idx
    # Merge runs of hits whose start positions differ by exactly 1,
    # including a run that reaches the last hit
    extended_idx=[list(idx[0])]
    for prev,curr in zip(idx,idx[1:]):
        if curr[0]-prev[0]==1:
            # Same run: push the end of the current region forward
            extended_idx[-1][1]=curr[1]
        else:
            # Gap between hits: start a new region
            extended_idx.append(list(curr))
    return extended_idx
## extending=False option
find_aligning_region(fullseq='AAATTGGAAAAAGAAA',subseq='AAA',extending=False)
#[[0, 2], [7, 9], [8, 10], [9, 11], [13, 15]]
## extending=True option
find_aligning_region(fullseq='AAATTGGAAAAAGAAA',subseq='AAA',extending=True)
#[[0, 2], [7, 11], [13, 15]]
## subseq is not included in fullseq
find_aligning_region(fullseq='AAATTGGAAAAAGAAA',subseq='C',extending=True)
#[['N/A', 'N/A']]
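
The same search can also be sketched with the `re` module. This is only an illustrative variant, not the function above: the helper names `find_overlapping_hits` and `merge_hits` are hypothetical, and the merge rule is an assumption that differs from `extending=True`. It joins any hits that overlap or touch, whereas `find_aligning_region` only joins hits whose start positions differ by 1. A zero-width lookahead is used so that overlapping hits are reported.

```python
import re


def find_overlapping_hits(fullseq, subseq):
    """Return [start, end] (0-based, inclusive) for every hit, overlaps included."""
    if not subseq:
        return []
    # A zero-width lookahead makes the regex engine report overlapping hits;
    # re.escape keeps characters such as '.' or '*' literal.
    pattern = re.compile(f"(?={re.escape(subseq)})")
    width = len(subseq)
    return [[m.start(), m.start() + width - 1] for m in pattern.finditer(fullseq)]


def merge_hits(hits):
    """Merge hits that overlap or touch; `hits` must be sorted by start."""
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the current region
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Gap: open a new region (a fresh list, so `hits` is not modified)
            merged.append([start, end])
    return merged


hits = find_overlapping_hits("AAATTGGAAAAAGAAA", "AAA")
print(hits)              # [[0, 2], [7, 9], [8, 10], [9, 11], [13, 15]]
print(merge_hits(hits))  # [[0, 2], [7, 11], [13, 15]]
print(merge_hits(find_overlapping_hits("ATATATAT", "ATAT")))  # [[0, 7]]
```

With this merge rule a tandem repeat such as 'ATAT' in 'ATATATAT' collapses into a single region, because its hits start 2 positions apart and still overlap. Note that this sketch returns an empty list, not [['N/A','N/A']], when nothing matches.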