FlexiTex: Enhancing Texture Generation
via Visual Guidance

Anonymous Submission

Abstract

Recent texture generation methods achieve impressive results due to the powerful generative prior they leverage from large-scale text-to-image diffusion models. However, abstract text prompts convey little textural or global shape information, so these methods often produce blurry or inconsistent patterns. To tackle this, we present FlexiTex, which embeds rich information via visual guidance to generate high-quality textures. The core of FlexiTex is the Visual Guidance Enhancement module, which incorporates more specific information from visual guidance to reduce ambiguity in the text prompt and preserve high-frequency details. To further enhance the visual guidance, we introduce a Direction-Aware Adaptation module that automatically designs direction prompts based on different camera poses, avoiding the Janus problem and maintaining global semantic consistency. Benefiting from the visual guidance, FlexiTex produces quantitatively and qualitatively sound results, demonstrating its potential to advance texture generation for real-world applications.




Text-to-Texture

Input Mesh

Latent-Paint

TexPainter

Text2Tex

Paint3D

SyncMVD

Ours

a female wearing a robe with her hands crossed in front of her chest
a small, cartoon-style house
a green tank with rich details
an Eastern dragon
an ancient cartoon-style carriage with wooden wheels
green tyrannosaurus
a man wearing a long coat and hat


Image-to-Texture

Reference

Input Mesh

TEXTure

PGC-3D

Paint3D

Ours



Diverse Texture Synthesis

We use 10 text prompts and 10 different image prompts here and achieve diverse results on the same meshes.


Framework

The framework of FlexiTex. Given a mesh with corrected orientation and text/image prompts, the Visual Guidance Enhancement module extracts visual features to provide informative guidance during the denoising steps. The Direction-Aware Adaptation module then injects direction prompts according to camera poses, employing diffusion models that are more geometrically sensitive. Through iterative texture warping and rasterization, our method generates high-fidelity textures with both rich detail and multi-view consistency.
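As an illustration of how a direction prompt could be derived from a camera pose, the following is a minimal sketch, not the FlexiTex implementation. The function names `direction_prompt` and `with_direction`, the view labels, and the angle thresholds are all assumptions made for this example; the page does not specify how the Direction-Aware Adaptation module chooses or injects its prompts.

```python
def direction_prompt(azimuth_deg: float, elevation_deg: float) -> str:
    """Map a camera pose to a coarse view label.

    Hypothetical rule: the thresholds (60 degrees for top/bottom,
    45-degree sectors around front/back) are illustrative assumptions.
    """
    # Steep elevations are treated as top or bottom views first.
    if elevation_deg > 60.0:
        return "top view"
    if elevation_deg < -60.0:
        return "bottom view"

    # Wrap the azimuth into [0, 360) so negative angles are handled.
    az = azimuth_deg % 360.0
    if az < 45.0 or az >= 315.0:
        return "front view"
    if 135.0 <= az < 225.0:
        return "back view"
    return "side view"


def with_direction(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    """Append the view label to a text prompt (assumed injection scheme)."""
    return f"{prompt}, {direction_prompt(azimuth_deg, elevation_deg)}"


if __name__ == "__main__":
    # One prompt per camera around the object, at eye level.
    for az in (0, 90, 180, 270):
        print(with_direction("an Eastern dragon", az, 0))
```

Conditioning each view on its own label is one plausible way to discourage a front-facing face from being painted on the back of the mesh, which is the multi-face (Janus) failure the module is described as avoiding.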



Ablation Results on Visual Guidance Enhancement (VGE)

With the Visual Guidance Enhancement module, we preserve more high-frequency details and avoid variance degradation.

Simple Prompt

small cartoon house
white flower
metal bracelet
retro pistol

+ Refined Prompt

a small cartoon-style house featuring a white window frame and a rustic wooden door
a white flower with intricate details, featuring delicate yellow stamens, soft green leaves, and subtle texture on petals, all rendered in realistic colors and materials
a ring with two layers of loop buckles, featuring a silver-colored metal body with intricate details on the sides and top
a detailed replica of a gun, featuring a metallic silver handle with intricate engravings and a wooden grip with a rich brown color

+ VGE


Simple Prompt

modern high-rise building
green dinosaur
businessman

+ Refined Prompt

a modern-style high-rise building with multiple floors and windows featuring warm-toned brick exterior, and vibrant greenery on rooftops
a vibrant green dinosaur with scaly skin, sharp teeth, and a long, curved tail, with darker green accents on its arms and legs
a white male wearing a charcoal gray suit with a subtle sheen, paired with black leather shoes and a crisp white shirt with thin stripes. He wears black sunglasses

+ VGE




Ablation Results on Direction-Aware Adaptation (DAA)

With the DAA module, we improve geometric awareness in texture generation and alleviate the multi-face problem.

w/o DAA

w/ DAA

w/o DAA

w/ DAA