
[E::faidx_adjust_position] The sequence not found; samtools/bcftools mpileup problem #1015

Closed
zillurbmb51 opened this issue Feb 28, 2019 · 6 comments

zillurbmb51 commented Feb 28, 2019

Hello there,
I am using samtools mpileup for SNP calling. Whenever I run samtools mpileup -uf pfal.fa bbm.sorted.bam | bcftools call -c > bbm.vcf, or any other mpileup command, I get [E::faidx_adjust_position] The sequence "Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18 | length=640851 | SO=chromosome" not found for every position. I have indexed both the FASTA and the BAM file. I have also tried bcftools mpileup, but it reports the same error. My FASTA and BAM headers are the same (attached). Any help?
Best regards
Zillur
[Attached screenshot: screen shot 2019-02-28 at 3 35 38 pm]

lh3 (Member) commented Feb 28, 2019

In FASTA, anything following a space is a comment, so the sequence name is Pf3D7_01_v3, etc. It's BBMap's fault for including those long names in the SAM header.
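To illustrate the naming rule described above, here is a minimal Python sketch: the sequence name is the text after ">" up to the first whitespace, and the rest of the header line is ignored. The helper name fasta_sequence_name is hypothetical; this is not samtools' own code, just the same convention applied to the header from the original report.

```python
import re


def fasta_sequence_name(header_line):
    """Return the sequence name from a FASTA header line (hypothetical helper).

    The name is everything after '>' up to the first whitespace character;
    the rest of the line is treated as a free-text comment.
    """
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % header_line)
    # \S* always matches (possibly empty), so group(0) is always available.
    return re.match(r"\S*", header_line[1:]).group(0)


header = (">Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | "
          "version=2015-06-18 | length=640851 | SO=chromosome")

name_in_fasta_index = fasta_sequence_name(header)  # the short name the FASTA index uses
name_in_bam_header = header[1:]                    # the whole line, as kept in the @SQ SN field

print(name_in_fasta_index)                        # Pf3D7_01_v3
print(name_in_bam_header == name_in_fasta_index)  # False, hence "sequence ... not found"
```

The two names only differ after the first space, which is why the headers look "the same" at a glance.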

PlatonB commented Sep 29, 2019

My case: I took random data from SRA for training purposes, obtained SAM and FASTQ files with SRA Tools, and converted the SAM to BAM with SAMtools.

The first 10 lines of the FASTQ:

@SRR10033112.1 1 length=180
TCGCCGTTAAGTTCGGAGACGACCGCGTTCCACACTGTGGTGAAGCCTGAACCGGGGTCATCGGTCAACGACGTATCTCCCTGGTTCTCGCGAGAACCAGGGAGATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGA
+SRR10033112.1 1 length=180
DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHHIIIIIIIIIIIIIIIIIIIIIIIEHHIIIIIIIIIHDDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHHIIIIIIIIGIIIIIIIIIIEHHIIIIHIIIIIIIIIIIGHIIIIIIIIIIIII
@SRR10033112.2 2 length=202
CCTTAGGGTCGCCGTTAAGTTCGGAGACGACCGCGTTCCACACTGTGGTGAAGCCTGAACCGGGGTCATCGGTCAACGACGTATCTCCCTGGTTCTCGTTAGCTCGACCCGGAACCAAGACCCGGAACTAACGAGAACCAGGGAGATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCG
+SRR10033112.2 2 length=202
DDDDCIIIHIIIIIIIIIHHHIHHIHIIIIIIIIIIIHIIIIIHIIHEHHHHHIIHHHHEHHDIHHCHHIIIIGHEHGCHDHHHIIHHEFHGHIIHIIHIHDDDDCIIIIIIIIIIIGHHHIHIHIHIGHHIEHIIIIIIIIHEEGHGHHHIIHGHIFIHIIIII1FHC<HDGHEFHDCFHIIIGHHCHH?CGHI=EC..C<
@SRR10033112.3 3 length=202
CTGGGTCCGTCGTCAACCTTAGGGTCGCCGTTAAGTTCGGAGACGACCGCGTTCCACACTGTGGTGAAGCCTGAACCGGGGTCATCGGTCAACGACGTATCCTGGGTCCGGAACTAACGAGAACCAGGGAGATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAAC

The first 10 lines of the SAM:

@HD	VN:1.2	SO:coordinate
@SQ	SN:AP012340.1	LN:4392353
@RG	ID:default
27	99	AP012340.1	1	70	101M	=	126	226	TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTGATGC	DDDDDIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHI/EHIIIIIIIIIIIIIIIHIIIIHIIIHIIIIIIHIGHIIIIICEHH	NH:i:1	NM:i:0
12	99	AP012340.1	1	70	101M	=	46	146	TTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTGATGC	DDDDDIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIHIIIIIIIIIIIIIIIIIIIIIIIIIIH@FHIIIIII	NH:i:1	NM:i:0
1086920	83	AP012340.1	1	70	3S98M	=	4392280	4392353	TCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTGA	GHIIIIIIHIIIIHIIIIIIHIIIIHIIIIIHIIIIIIIIGIIIIIIHIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIHIHIIIIIIIIIIIIIIIDDDBD	NH:i:1	NM:i:0
14	99	AP012340.1	1	70	4S97M	=	56	156	GTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAGCAGTG	DDDDDIIIIIIHIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIHHGHIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIGIIHIIIHIIH?	NH:i:1	NM:i:0
9	99	AP012340.1	1	70	9S92M	=	28	128	GATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAG	DDDCDIIIIIIHIIIHHHIHHIIIHIIIIIIIIIIIIIIIHIIIHIGHIHHHIHIHHHIIIIIIIIIIIHIIIIIHHIIHIIGHHIIGIIGIIIIIIIHIH	NH:i:1	NM:i:0
3	83	AP012340.1	1	70	9S92M	=	1	92	GATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGACCCAG	HIIHCIHFHHIIHHEIIHHCDHCHEHHIIHIGIIIIIHIIIHHHIIIIHDHIHFIIIIHIIHIIHHIIIIIIIIIIIIIHIIIIIIHIIIIIIIIICDDDD	NH:i:1	NM:i:0
1086926	83	AP012340.1	1	70	13S88M	=	4392294	4392353	GGGAGATACGTCGTTGACCGATGACCCCGGTTCAGGCTTCACCACAGTGTGGAACGCGGTCGTCTCCGAACTTAACGGCGACCCTAAGGTTGACGACGGAC	HIGIIIIHDHHHHGIIIIIIIIIHIIIIIHIIIHIHIIIIIIIIIIIGIIIIIIIHIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDDDD	NH:i:1	NM:i:0

I tried to produce a BCF using the command from the official BCFtools tutorial, and a huge number of identical lines were printed:

[E::faidx_adjust_position] The sequence "AP012340.1" not found
[E::faidx_adjust_position] The sequence "AP012340.1" not found
[E::faidx_adjust_position] The sequence "AP012340.1" not found
[E::faidx_adjust_position] The sequence "AP012340.1" not found
[E::faidx_adjust_position] The sequence "AP012340.1" not found
<...>

@daviesrob (Member)

@PlatonB What command line did you use? And what was in your reference fasta file?

PlatonB commented Oct 1, 2019

@daviesrob

What command line did you use?

bcftools mpileup -Ou -f path_to/SRR10033112.fa path_to/SRR10033112.bam | bcftools call -mv -Ob -o path_to/SRR10033112.bcf

And what was in your reference fasta file?

I quoted part of the FASTQ above.

@daviesrob (Member)

That's not the right fasta file. You need the one that was used to align the data and has the AP012340.1 reference sequence in it.
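The mismatch can be checked before running mpileup: every SN: name on the BAM's @SQ header lines must appear in the first column of the reference's .fai index (the file samtools faidx writes). Below is a minimal Python sketch, assuming you have saved the output of samtools view -H and the .fai file as text. The helper names and the example .fai line are hypothetical, chosen to mirror a FASTA built from the reads rather than from the reference genome.

```python
from typing import List, Set


def sq_names(sam_header_text: str) -> List[str]:
    """Collect the SN: value of every @SQ line in a SAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ\t"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_names(fai_text: str) -> Set[str]:
    """Collect sequence names (first tab-separated column) from a .fai index."""
    return {line.split("\t", 1)[0] for line in fai_text.splitlines() if line.strip()}


def missing_references(sam_header_text: str, fai_text: str) -> List[str]:
    """Return @SQ names that the reference FASTA index does not contain."""
    available = fai_names(fai_text)
    return [name for name in sq_names(sam_header_text) if name not in available]


# Header lines taken from the SAM quoted above.
bam_header = "@HD\tVN:1.2\tSO:coordinate\n@SQ\tSN:AP012340.1\tLN:4392353\n@RG\tID:default\n"

# Hypothetical .fai line for a FASTA converted from the reads: its sequences are
# named after reads (SRR10033112.1, ...), not after the reference genome.
reads_fai = "SRR10033112.1\t180\t28\t180\t181\n"

print(missing_references(bam_header, reads_fai))  # ['AP012340.1']
```

An empty result means every reference named in the BAM header can be found in the FASTA passed to -f.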

csmiller commented May 6, 2020

Quoting @lh3: "In fasta, anything following a space is comment, so the sequence name is Pf3D7_01_v3 etc. It's BBmap's fault to include those long names in the SAM header."

You can use trimreaddescriptions=t in BBMap so that only the sequence names are written to the output SAM, which avoids this error downstream with bcftools.
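For a BAM that was already produced with the long names, one possible workaround (not proposed in this thread) is to trim the @SQ SN: values in the header and apply the new header with samtools reheader. Because BAM records refer to reference sequences by index rather than by name, the alignment records should not need rewriting. A minimal Python sketch of the trimming step, with a hypothetical function name, assuming the header text comes from samtools view -H:

```python
import re


def trim_sq_names(sam_header_text: str) -> str:
    """Truncate every @SQ SN: value at its first whitespace (hypothetical helper).

    Other header lines are passed through unchanged. Raises ValueError if
    trimming would make two @SQ names identical.
    """
    seen = set()
    out_lines = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ\t"):
            fields = line.split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    # Keep only the part before the first whitespace, like the FASTA index does.
                    short = re.match(r"\S*", field[3:]).group(0)
                    if short in seen:
                        raise ValueError("duplicate @SQ name after trimming: %r" % short)
                    seen.add(short)
                    fields[i] = "SN:" + short
            line = "\t".join(fields)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"


old_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:Pf3D7_01_v3 | organism=Plasmodium_falciparum_3D7 | version=2015-06-18 "
    "| length=640851 | SO=chromosome\tLN:640851\n"
)
# Prints the header with the @SQ line reduced to SN:Pf3D7_01_v3 and LN:640851.
print(trim_sq_names(old_header), end="")
```

Assuming the trimmed header is written to a file such as new_header.sam, it could then be applied with samtools reheader new_header.sam bbm.sorted.bam > bbm.trimmed.bam, followed by samtools index on the new BAM. Re-running BBMap with trimreaddescriptions=t remains the cleaner fix.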
